eembc / energyrunner

The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark.

Send/Ack timeout error on Arduino Nano BLE 33 #26

Closed by HaochenZ11 1 year ago

HaochenZ11 commented 2 years ago

Background: I'm trying to run TinyML benchmarking on an Arduino Nano BLE 33 device in Performance Mode, but I keep hitting a timeout error during the data loading process.

I've adapted the code for the reference submissions to work with Arduino and I'm able to flash and run the program directly and send commands via Arduino Serial Monitor. However, I'm having issues using the EEMBC UI to run any sort of inference. For example, with the person_detection (vww) model configured as below:

image

I'm consistently hitting the error sequencer: e-[Send/Ack timeout] during the data transfer process (below). I've gotten this error with multiple models, on multiple host machines, using different cables, etc., and it seems the time until I hit this error differs depending on the model.

image

In addition, the data transfer seems very slow in general: it's taking around 5 seconds for each individual db command (sending 64 chars). I have tried adjusting the "default-timeout-ms" param in the .eembc.ini file to a larger number, but it doesn't seem to make a difference. It seems the issue is with serial communication between the EEMBC framework and the Arduino. Any idea how to fix this?

Thanks

petertorelli commented 2 years ago

Hello,

When this happens it is almost always a problem with the DUT Rx buffers overflowing and the device crashing. When the send/ack times out, what is the DUT doing? I would start debugging there. I suspect there is a buffer overflow that is not being handled, or the Rx is being interrupted by some other task and not resuming properly. If the DUT stops responding after the send/ack (e.g., it needs to be reset), then that would hint at a hard fault. If the DUT does respond to the "name" command after the timeout error without being rebooted, then the Rx buffer is overflowing but not triggering a completion interrupt to respond with "m-ready". Hard to diagnose from afar, but hopefully this inspires you.

Peter
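To make the overflow scenario described above observable rather than silent, the receive path can count the bytes it has to discard. The following is only a minimal sketch under stated assumptions: `rx_ring_t`, `RX_CAPACITY`, and the three functions are hypothetical names, not part of the EEMBC firmware or the Arduino core, and the locking a real interrupt handler would need is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; it should exceed the longest command line sent. */
#define RX_CAPACITY 128u

typedef struct {
    uint8_t  data[RX_CAPACITY];
    size_t   head;     /* next write position (producer, e.g. Rx handler) */
    size_t   tail;     /* next read position (consumer, e.g. main loop)   */
    size_t   count;    /* bytes currently stored                          */
    uint32_t dropped;  /* bytes discarded because the buffer was full     */
} rx_ring_t;

/* Reset the buffer to empty and clear the drop counter. */
void rx_ring_init(rx_ring_t *r)
{
    r->head = 0;
    r->tail = 0;
    r->count = 0;
    r->dropped = 0;
}

/* Store one byte. When the buffer is full the byte is discarded and the
 * drop counter is incremented, so the overflow can be reported later. */
bool rx_ring_put(rx_ring_t *r, uint8_t byte)
{
    if (r->count == RX_CAPACITY) {
        r->dropped++;
        return false;
    }
    r->data[r->head] = byte;
    r->head = (r->head + 1u) % RX_CAPACITY;
    r->count++;
    return true;
}

/* Fetch the oldest byte. Returns false when the buffer is empty. */
bool rx_ring_get(rx_ring_t *r, uint8_t *out)
{
    if (r->count == 0) {
        return false;
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) % RX_CAPACITY;
    r->count--;
    return true;
}
```

If `dropped` is non-zero after a failed transfer while the DUT still answers commands, that would point to an overflow rather than a hard fault. On real hardware the fields shared between the interrupt handler and the main loop would also need `volatile` or a critical section, which this sketch omits.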

HaochenZ11 commented 2 years ago

Hi Peter, thanks for the reply. Previously, when the send/ack was timing out, the DUT did not crash and was still able to respond, although the EEMBC UI would freeze. Currently, I'm able to get past the data loading stage and run inferences, although a lot of the data is missing (not received).

For every 64 chars sent by the host, only the last 4 are received and stored for some commands, while for others all 64 are stored. Which commands have all their data received and which do not seems to vary randomly. For the kws model, for example, 490 elements are expected, but the input db has only 40, 70, 100, or 130 elements at inference time. The number of input elements received varies between those values from run to run. So far, flushing the serial Rx buffer between commands and increasing the buffer size have not made a difference.
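The reported counts can be checked against a simple model. This is a sketch under assumptions that the thread does not confirm: each element is one byte, two hex chars encode one byte (so 64 chars carry 32 bytes and "the last 4" are 2 bytes), and the final, shorter command always arrives complete. `stored_bytes` is a hypothetical helper written for this check only.

```c
#include <stddef.h>

/* Bytes that end up stored when `intact` of the full-length db commands
 * arrive complete and every other full-length command keeps only
 * `kept_bytes`. Assumes the final, shorter command arrives complete. */
size_t stored_bytes(size_t total_bytes, size_t bytes_per_cmd,
                    size_t kept_bytes, size_t intact)
{
    size_t full_cmds = total_bytes / bytes_per_cmd; /* 490 / 32 = 15 */
    size_t tail      = total_bytes % bytes_per_cmd; /* 490 % 32 = 10 */

    if (intact > full_cmds) {
        intact = full_cmds;
    }
    return tail + intact * bytes_per_cmd + (full_cmds - intact) * kept_bytes;
}
```

With `total_bytes = 490`, `bytes_per_cmd = 32`, and `kept_bytes = 2`, the function returns 40, 70, 100, and 130 for 0, 1, 2, and 3 intact commands. Those are the four counts reported above, so under these assumptions the numbers fit most commands being truncated to their last two bytes in each run. This is a consistency check, not a diagnosis.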

petertorelli commented 2 years ago

If the Tx/Rx pins reflect the serial port from the USB, then I would start by probing these and trying to find a case where there is a loss.

You can load bytes directly to the buffer using these commands:

    th_printf("db SUBCMD    : Manipulate a generic byte buffer\r\n");
    th_printf("  load N     : Allocate N bytes and set load counter\r\n");
    th_printf("  db HH[HH]* : Load 8-bit hex byte(s) until N bytes\r\n");

e.g.:

    dut db load 256
    dut db 000102030405060708090a0b0c0d0e0f
    dut db 000102030405060708090a0b0c0d0e0f
    dut db 000102030405060708090a0b0c0d0e0f
    dut db 000102030405060708090a0b0c0d0e0f
    :

And see if you can force a failure. If the Rx line of the Arduino shows all the bytes arriving, but the Arduino didn't receive them, then that would be a firmware issue. If bytes are missing during the transfer from the Host, then that would be a UI issue.
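As an illustration of the test described above, a known pattern makes the first lost byte easy to locate. The sketch below is hedged: `pattern_byte`, `format_db_line`, and `first_mismatch` are hypothetical helpers, and how the stored bytes are read back from the DUT (for example, a debug print added to the firmware) is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Expected byte at offset i for the repeating 00..0f test pattern. */
static uint8_t pattern_byte(size_t i)
{
    return (uint8_t)(i % 16u);
}

/* Write one "dut db <hex>" line covering pattern bytes [offset, offset + n)
 * into `out`. Returns the number of characters written (excluding the
 * terminator), or -1 if `out` is too small. */
int format_db_line(char *out, size_t out_size, size_t offset, size_t n)
{
    const char prefix[] = "dut db ";
    size_t need = (sizeof(prefix) - 1u) + 2u * n + 1u;

    if (out_size < need) {
        return -1;
    }
    int len = snprintf(out, out_size, "%s", prefix);
    for (size_t i = 0; i < n; i++) {
        len += snprintf(out + len, out_size - (size_t)len, "%02x",
                        (unsigned)pattern_byte(offset + i));
    }
    return len;
}

/* Index of the first byte that differs from the pattern. If every compared
 * byte matches, returns the smaller of received_len and expected_len, so a
 * short buffer is reported at the point where the data ends. */
size_t first_mismatch(const uint8_t *received, size_t received_len,
                      size_t expected_len)
{
    size_t n = received_len < expected_len ? received_len : expected_len;

    for (size_t i = 0; i < n; i++) {
        if (received[i] != pattern_byte(i)) {
            return i;
        }
    }
    return n;
}
```

Calling `format_db_line(buf, sizeof buf, 0, 16)` produces the same line as the example commands. A result from `first_mismatch` that is smaller than the expected length marks the offset where data was lost or corrupted, which can then be matched against what the Rx pin showed.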

Also, the UI freezing is a concern as well. If you are still able to talk to the DUT through the UI, what part froze?

petertorelli commented 2 years ago

Also, I have a Nano (not BLE), so if you want to email me your firmware I'd be interested to see if I can reproduce it (assuming it works on the non-BLE board).