eembc / energyrunner

The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark.

No compatible devices detected, manual handshake works #18

Closed dmorn closed 3 years ago

dmorn commented 3 years ago

Hi, I'm trying to benchmark a Zynq-7000 (Xilinx) based board. The runner runs on one of the ARM cores, accelerated by the FPGA. The (incomplete) project is at https://github.com/jecoz/mlperf-tiny-ebaz4205.

The benchmark framework does not detect the device, though I can manually handshake as described in the Readme.

I can connect using cu with a baud rate of 115200.

% sudo cu -l /dev/cu.usbserial-14120 -s 115200
Connected.
name%name%name%name%name%name%name%name%m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready
m-name-ebaz-4205-[xlnx]
m-ready

stty on the board side reports:

stty -F /dev/ttyPS0 -a
speed 115200 baud;stty: /dev/ttyPS0
 line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 hupcl -cstopb cread clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff -iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho

The runner has the stock configuration

default-timeout-ms=5000
dut-baud=115200
dut-boot-mv=3000
emon-drop-thresh-pct=0.1
root=/Users/danielmorandini
timestamp-hold-us=50
umount-on-error=true
use-crlf=false
use-visa=true
n6705-set-vio=true
disable-mute=false

The host device is a MacBook Pro running macOS Big Sur 11.4. I connect to the board using a serial cable from Adafruit (https://www.adafruit.com/product/954), which requires the installation of a driver from Prolific (http://www.prolific.com.tw/US/ShowProduct.aspx?p_id=229&pcid=41).

The tool finds the serial device, but reports "Serial /dev/tty.usbserial-14120 failed name check, skipping."

I notice now that I was testing against the /dev/cu.* family and not /dev/tty.*. With the latter, cu does not work either! Any hints?

petertorelli commented 3 years ago

I believe it is because you modified EE code, e.g. the name string outside the brackets. Messages that start with m-... are special, in that they are expected to fit a certain regex pattern. The response of the name query should be m-name-dut-[...]. Since the dut was replaced with ebaz-4205 it fails. Only TH code can be modified, not EE_ functions/vars/defines. Restore api/internally_implemented.h EE_DEVICE_NAME to 'dut' and this should work.
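For illustration only, a name check of the kind described above could look like the sketch below. The exact regex the runner uses is not shown in this thread, so NAME_PATTERN and name_response_ok are hypothetical; the only thing taken from the comment is that the response must have the form m-name-dut-[...].

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pattern: the fixed prefix "m-name-dut-" followed by a
 * bracketed device name. A ']' outside a bracket expression is an ordinary
 * character in POSIX extended regexes, so the closing one is not escaped. */
static const char *NAME_PATTERN = "^m-name-dut-\\[[^]]*]$";

/* Returns true when `line` (without a trailing newline) matches NAME_PATTERN. */
bool name_response_ok(const char *line) {
    regex_t re;
    if (regcomp(&re, NAME_PATTERN, REG_EXTENDED | REG_NOSUB) != 0) {
        return false; /* pattern failed to compile */
    }
    bool ok = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

Under this assumed pattern, "m-name-dut-[xlnx]" is accepted, while "m-name-ebaz-4205-[xlnx]" is rejected because the text outside the brackets was changed.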

dmorn commented 3 years ago

Hi @petertorelli, thanks for the response. I restored the file as suggested, and the output is now m-name-dut-[xlnx]. The device is still not recognised, though. I just double-checked, and the /dev/tty.* file works just fine if I use screen instead of cu.

Now, though, the serial communication does not work properly:

% sudo screen /dev/tty.usbserial-14120 115200
name%
e-[Unknown command: 

�name]
m-ready
m-name-dut-[xlnx]
m-ready
m-name-dut-[xlnx]
m-ready

Looks like a mismatched baud rate but apparently I'm setting 115200 on both sides 🧐

petertorelli commented 3 years ago

Two things:

  1. The UART buffer has garbage in it, it should be flushed. If your device does not boot on serial open (like an Arduino), flush the buffer using screen like you have done. Then try starting the framework again.

  2. Why are you using sudo? If you cannot open the device as a normal user, then the framework cannot. I do not advise running the framework as sudo. That could be another reason.
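Both points can also be checked programmatically. The sketch below is an assumption-laden alternative to flushing with screen: it opens the port as the current user (so a permission problem shows up as a failed open) and discards queued bytes with the POSIX tcflush call. The function name open_and_flush is hypothetical, and this is not how the framework itself opens the port.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Opens `path` read/write without becoming its controlling terminal, then
 * discards any bytes queued in either direction.
 * Returns the open descriptor, or -1 on failure (errno is set). */
int open_and_flush(const char *path) {
    /* O_NONBLOCK keeps open() from waiting for a carrier signal. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0) {
        return -1; /* e.g. EACCES if the user lacks permission */
    }
    if (tcflush(fd, TCIOFLUSH) != 0) {
        close(fd); /* not a terminal, or the flush failed */
        return -1;
    }
    return fd;
}
```

If open_and_flush fails with EACCES for a normal user, the framework would presumably hit the same permission problem.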

dmorn commented 3 years ago

> The UART buffer has garbage in it, it should be flushed. If your device does not boot on serial open (like an Arduino), flush the buffer using screen like you have done. Then try starting the framework again.

Done. It looks like the buffer contains garbage only after I run the framework. From that point on, even screen receives garbage. On the other hand, if I just play with it through screen (without the framework), the communication looks stable.

> Why are you using sudo? If you cannot open the device as a normal user, then the framework cannot. I do not advise running the framework as sudo. That could be another reason.

It was just a copy/paste issue: I have to use sudo with cu, but screen works just fine without it.

dmorn commented 3 years ago

I bet this is a UART misconfiguration. To fix it I need to know which configuration the framework expects (stop bits, parity, etc.), and I'm not finding this information anywhere!

dmorn commented 3 years ago

Check this out @petertorelli:

./runner-48cb466 /dev/ttyPS0
debug: th_getchar: n
debug: th_getchar: a
debug: th_getchar: m
debug: th_getchar: e
debug: th_getchar: %
debug: th_getchar: 

debug: th_getchar: n
debug: th_getchar: a
debug: th_getchar: m
debug: th_getchar: e
debug: th_getchar: %
debug: th_getchar: 

Nothing is sent until a \n character arrives. Only the very first message is processed correctly; afterwards every command has a \n prepended (left over from the previous message). This is why the name%name%name% test passed!

My serial device was in canonical (cooked) mode. Putting it into raw mode (termios(3) cfmakeraw) fixed it: device detected :tada:. I suggest adding this bit to the documentation!
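A minimal sketch of that fix, assuming the board runs Linux and the firmware already holds an open file descriptor for the UART (for example /dev/ttyPS0). The function name set_raw_115200_8n1 is hypothetical, and the 115200 8N1 settings are taken from the dut-baud value and the 8N1 remark in this thread. Note that cfmakeraw is a widely available BSD/glibc extension, not strict POSIX.

```c
#define _DEFAULT_SOURCE 1 /* glibc: expose cfmakeraw under strict -std= modes */
#include <termios.h>

/* Puts the terminal behind `fd` into raw mode at 115200 baud, 8N1.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int set_raw_115200_8n1(int fd) {
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) {
        return -1; /* not a terminal, or bad descriptor */
    }
    /* Disables canonical mode, echo, signals and output processing;
     * selects 8 data bits and no parity. */
    cfmakeraw(&tio);
    tio.c_cflag &= ~(tcflag_t)CSTOPB; /* one stop bit */
    tio.c_cflag |= CLOCAL | CREAD;    /* ignore modem lines, enable receiver */
    tio.c_cc[VMIN] = 1;               /* read() returns as soon as 1 byte arrives */
    tio.c_cc[VTIME] = 0;              /* no inter-byte timeout */
    if (cfsetispeed(&tio, B115200) != 0 || cfsetospeed(&tio, B115200) != 0) {
        return -1;
    }
    return tcsetattr(fd, TCSANOW, &tio);
}
```

With canonical mode off, each byte reaches the reader as it arrives, so a command ending in % no longer waits for a newline.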

Thanks, dan

petertorelli commented 3 years ago

@jecoz

Looks like it was the behavior of the terminal emulator: it wasn't sending anything down until the user hit enter "\n", which then causes a syntax error.

The percent sign is the terminator character in the parse loop.

From api/internally_implemented.cpp:

void ee_serial_callback(char c) {
  if (c == EE_CMD_TERMINATOR) {
    g_cmd_buf[g_cmd_pos] = (char)0;
    th_command_ready(g_cmd_buf);
    g_cmd_pos = 0;
  } else {
    g_cmd_buf[g_cmd_pos] = c;
    g_cmd_pos = g_cmd_pos >= EE_CMD_SIZE ? EE_CMD_SIZE : g_cmd_pos + 1;
  }
}

From api/internally_implemented.h:

#define EE_CMD_SIZE 80u
#define EE_CMD_DELIMITER " "
#define EE_CMD_TERMINATOR '%'
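To see why a stray newline breaks the handshake, here is a small host-side simulation of the same accumulate-until-terminator logic. It is a sketch, not the framework's code: feed_char mirrors ee_serial_callback, while record_command, feed_string and the buffer size of CMD_SIZE + 1 are assumptions made so the example is self-contained.

```c
#include <stddef.h>
#include <string.h>

#define CMD_SIZE 80u       /* mirrors EE_CMD_SIZE */
#define CMD_TERMINATOR '%' /* mirrors EE_CMD_TERMINATOR */

static char g_buf[CMD_SIZE + 1];      /* assumed size: room for the NUL */
static size_t g_pos = 0;
static char g_last_cmd[CMD_SIZE + 1]; /* hypothetical: last dispatched command */

/* Stand-in for th_command_ready(): remembers what was dispatched. */
static void record_command(const char *cmd) {
    strncpy(g_last_cmd, cmd, CMD_SIZE);
    g_last_cmd[CMD_SIZE] = '\0';
}

/* Same logic as ee_serial_callback: every byte except '%' is stored. */
static void feed_char(char c) {
    if (c == CMD_TERMINATOR) {
        g_buf[g_pos] = '\0';
        record_command(g_buf);
        g_pos = 0;
    } else {
        g_buf[g_pos] = c;
        g_pos = g_pos >= CMD_SIZE ? CMD_SIZE : g_pos + 1;
    }
}

/* Feeds a whole string and returns the last command that was dispatched. */
const char *feed_string(const char *s) {
    for (; *s != '\0'; ++s) {
        feed_char(*s);
    }
    return g_last_cmd;
}
```

Feeding "name%" dispatches the command "name". Feeding "\nname%" afterwards dispatches "\nname", because the newline is stored like any other byte, and that string no longer matches a known command.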

Some documentation was removed from the code when it was re-written. I'll add a new section to the GitHub for the runner to explain what happens during the scan. Also, 8N1 is so often the default we're starting to take it for granted!

I've not come across cooked vs. raw mode on macOS. Thanks for the tip, I need to document that.

petertorelli commented 3 years ago

Also, this is explained to some extent at the bottom of the README.md:

https://github.com/eembc/energyrunner/blob/main/README.md#debugging-device-auto-detection