arduino / ArduinoCore-mbed

347 stars 202 forks source link

Nano 33 BLE unreliable USB communication #129

Open tttapa opened 3 years ago

tttapa commented 3 years ago

Hi,

I'm working on a USB MIDI library using the PluggableUSBModule class. During my tests I've had sporadic failures due to the received USB packets reporting an incorrect length (the number returned by read_finish() is often incorrect). I've printed the packet lengths I get in the USB callback on the Arduino and compared them to the lengths of the packets sent by my computer (using Wireshark), and the received sizes are often incorrect. It's most noticeable at the end of a transmission, where the packet size is less than the maximum size. The strange thing is that the second to last packet often reports the same length as the last packet.

For example, when sending 196 bytes in total, the packet sizes are 64, 64, 64, 4, but the reported sizes on the Arduino are sometimes 64, 64, 4, 4.

The standard Arduino USB Serial port seems to exhibit similar issues. When using simple loop-back code (see below) and sending long strings in the serial monitor, the response is truncated at seemingly random points and sometimes characters are missing in the middle as well.

void setup() {
  Serial.begin(115200);
}

void loop() {
  int r = Serial.read();
  if (r >= 0)
    Serial.write((uint8_t)r);
}

Challenge:

01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

Responses (scroll to the right):

0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234578901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678

I'm using version 1.3.1 of this core on an official Arduino Nano 33 BLE, Arduino IDE 1.8.13 on Ubuntu 20.04 5.4.0-65-lowlatency #73-Ubuntu SMP PREEMPT.

I suspect this is some kind of timing issue or a data race in the low-level USB code. I've verified my own code by simulating it in a multi-threaded desktop C++ program and running it using ThreadSanitizer. I'm reasonably confident that it is correct and race-free, and I've tried just wrapping everything in my code that even remotely interacts with variables used in USB callbacks in a critical section to make sure it doesn't race with my code, but the problem persists.

If it's useful, I'll clean up my code and push it to GitHub.

I did some digging into the ArduinoCore-mbed source files and I realize that the root cause probably lies buried somewhere in the Nordic-specific Mbed OS USB code, but since it also affects the Arduino Core, I decided to first open an issue here. I hope someone who's familiar with Arduino Mbed port could maybe point me to the right resources.

I searched for issues in the ARM Mbed OS repository, and found these two: https://github.com/ARMmbed/mbed-os/issues/11912 & https://github.com/ARMmbed/mbed-os/issues/10862, but although they do mention problems with critical sections in the USB code, I don't know how relevant they are to this issue.

I've been stuck on this problem for about a week now, and I don't know how to proceed. At this point, it seems impossible to reliably receive multiple packets worth of data over USB, neither using Serial, nor using MIDI.

tttapa commented 3 years ago

I've uploaded my code to https://github.com/tttapa/Arduino-mbed-PluggableUSBMIDI-experiment. The README contains a high-level overview of the code, the steps needed to reproduce the issue, and the output of the program (both the failure and success scenarios).

Most of it is just PluggableUSB boilerplate to set up the endpoints, initialize the descriptors, etc.
The relevant code that actually reads the USB data is here: https://github.com/tttapa/Arduino-mbed-PluggableUSBMIDI-experiment/blob/cd499c9b46647723ca6c0b3aa0358e27100e7a61/PluggableUSBMIDI.cpp#L127-L187

liningpan commented 3 years ago

It's not a wise idea to send large data packets with the Serial API. The serial api provided by Nordic has a buffer size of 32. You should break them into smaller packets. SerialTransfer might be a better option. I also implement a library that can transparently send large binary packets based on the SerialTransfer library. However, it's neither published as a library nor has a user friendly API. I attached a link here https://gitlab.com/senior-design-three-musketeers/building-blocks/-/tree/master/test-suite/IntegrationTest/src/serial.

tttapa commented 3 years ago

Hi @liningpan, thanks for the reply.
My previous messages are about the USB CDC “Serial” port, not the UART. It uses 64-byte packets and buffers, but this should be transparent: if you send more than 64 bytes, it should simply block, not drop the extra bytes. Similarly when receiving data, the data should stay in the buffer until a user calls Serial.read(). As long as there's unread data in all buffers, no new USB RX request is made, and the host waits before sending more data until the Arduino code reads from the buffer and at least one buffer of 64 bytes can be used for the next request.

tttapa commented 3 years ago

Another Arduino user with the same problem: https://forum.arduino.cc/index.php?topic=729683

facchinm commented 3 years ago

@tttapa @liningpan can you try this patch?

commit d81487081d242ad9bd3434cdf544d93c684cd456
Author: Paolo Calao <paolo.calao@gmail.com>
Date:   Tue Mar 16 15:27:28 2021 +0100

    PluggableUSBSerial - fix race condition on rx_buffer

diff --git a/cores/arduino/USB/PluggableUSBSerial.h b/cores/arduino/USB/PluggableUSBSerial.h
index a6191cb3..5518d9e3 100644
--- a/cores/arduino/USB/PluggableUSBSerial.h
+++ b/cores/arduino/USB/PluggableUSBSerial.h
@@ -221,18 +221,27 @@ public:
     }

     int available(void) {
+        USBCDC::lock();
         onInterrupt();
-        return rx_buffer.available();
+        auto ret = rx_buffer.available();
+        USBCDC::unlock();
+        return ret;
     }

     int peek(void) {
+        USBCDC::lock();
         onInterrupt();
-        return rx_buffer.peek();
+        auto ret = rx_buffer.peek();
+        USBCDC::unlock();
+        return ret;
     }

     int read(void) {
+        USBCDC::lock();
         onInterrupt();
-        return rx_buffer.read_char();
+        auto ret = rx_buffer.read_char();
+        USBCDC::unlock();
+        return ret;
     }

     void flush(void) {}

We encountered a similar problem recently and we're investigating it

tttapa commented 3 years ago

Unfortunately, this doesn't seem to fix the issue. I tried doing everything in a critical section guarded by the USB lock in my MIDI class as well and I had the same problem. I really think this is a bug in a lower layer of the USB stack.

Using your patch and the same challenge as before, I now get:

01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567 

(I don't mean that the fix didn't help, of course, it does indeed seem to fix a race condition, and after applying the patch, I no longer get issues where single digits get lost, such as the 0123457890 - missing 6 - in my first message.)

steph643 commented 2 years ago

Any news on this? I might have been bitten as well. Serial was working correctly, until I set up a USB->UART bridge with high throughput. Every now and then it seems there are dropped bytes.

steph643 commented 2 years ago

@facchinm , would you please tell us the status of this issue? Is it acknowledged? Fixed? Waiting for feedback from Nordic? How can we help?

If this issue is confirmed, I might need to purchase another board. Does somebody know if the same issue occurs on the Nano 33 IoT? The Nano 33 BLE Sense?

progmars commented 1 year ago

What has changed with the recent merge? Was the "PluggableUSBSerial - fix race condition on rx_buffer" included? Does it mean that bytes will not be missing from the middle of the payload, but there's still the problem left with the bytes sometimes missing after reaching 64 bytes, or was it fixed too?

Would it be possible to implement some kind of a workaround (even if it means losing performance)? Something like additional buffering and then feeding it to the Nordic USB by 64 bytes, waiting for... something, not sure what... to ensure that USB has sent all the previous 64 bytes of data?

tttapa commented 1 year ago

I only encountered issues receiving data, so you don't really have any control over how full the buffer is.

adamgreen commented 1 year ago

@facchinm, I am seeing a similar issue with the USB CDC driver on the Nano 33 BLE. I hit it in ThreadDebug when I attempt to use GDB to update the contents of the CPU registers. This results in a packet that should contain 405 bytes but only 341 bytes are actually received by MRI/ThreadDebug, a deficit of 64 bytes. The same test runs correctly on the Portenta H7.

I will attempt to write and share a simpler repro case next week.

adamgreen commented 1 year ago

I was able to write a simpler repro than all of ThreadDebug/MRI and GDB. The sources for the Mac and Arduino side can be found on https://github.com/adamgreen/repro-129.

This test sends a larger chunk of data which corresponds to what GDB sends when it attempts to set the CPU registers. The code on the Arduino receives the GDB packet and then returns the number of bytes which match the expected GDB packet. The code running on the Mac displays the actual and expected number of matched bytes. When run against the Arduino Nano33 BLE, I see output like this:

~/depots/repro-129$ ./repro-129 /dev/tty.usbmodem11401 
Opening connection to /dev/tty.usbmodem11401...
Sending...
Receiving...
Count - actual: 17  expected: 405)
Sending...
Receiving...
Count - actual: 1  expected: 405)
Sending...
Receiving...
Count - actual: 1  expected: 405)
Sending...
Receiving...
Count - actual: 1  expected: 405)
Sending...
Receiving...
Count - actual: 1  expected: 405)
Sending...
Receiving...
Count - actual: 1  expected: 405)
Sending...
Receiving...
Count - actual: 16  expected: 405)
Sending...
Receiving...
Count - actual: 1  expected: 405)
Sending...
Receiving...
Count - actual: 16  expected: 405)
Sending...
Receiving...
Count - actual: 8  expected: 405)
Sending...
Receiving...
error: Failed while reading bytes from serial port.

It can be seen from the above output that the whole 405 bytes of the packet never match and after awhile it gets enough out of sync that the Arduino doesn't return the matched byte count as expected.

If I run the same code on the Arduino Portenta H7, the test runs for all 1000 iterations, with each iteration getting all 405 bytes as expected:

Sending...
Receiving...
Count - actual: 405  expected: 405)

I was curious if this is a bug in the Arduino high level USBSerial code or if it is deeper down in the mbed or Nordic driver. I wrote a different test where ascending byte values were sent across the wire and also added an assert to the USBSerial code which would halt if it didn't receive the bytes in the expected order. This assert fires which means that the data is corrupted by the time it gets up to the Arduino USBSerial class and it isn't USBSerial's rx_buffer that is getting corrupted. It would probably take me quite some time to come up to speed enough on the USB peripheral on the nRF52840 to debug what might be happening here. It appears that USB packets are being dropped or a newer packet is overwriting the one currently being processed higher up in the USB stack by the mbed/Arduino code.

dahanc commented 6 months ago

I'm seeing the same issue with PluggableUSBModule. I have a bulk out endpoint, and the host is sending my device around 200 bytes. For example, 206 bytes, split as packets of 64, 64, 64, 14. Sometimes it works, but other times, read_finish() returns 64, 64, 14, 14 bytes. On the third read, when read_finish returns 14 instead of 64, I see that the full 64 bytes were written to the buffer I passed to read_start().

I'm using version 4.1.1 of the Mbed Nano board package.