espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

USB is randomly disconnected from ESP32-S3 #10000

Open sblantipodi opened 3 months ago

sblantipodi commented 3 months ago

Board

Lolin ESP32-S3

Device Description

Plain Lolin ESP32-S3 board

Hardware Configuration

No GPIO used

Version

v3.0.1

IDE Name

PlatformIO

Operating System

Windows 11

Flash frequency

240MHz

PSRAM enabled

no

Upload speed

115200

Description

The sketch below hangs from time to time. The S3's USB connection drops, and you have to power-cycle the ESP device to get USB working again.

Sketch

byte pre[CONFIG_PREFIX_LENGTH];

Serial.begin(115200);
Serial.setRxBufferSize(1500);
size_t prefixLength = Serial.readBytes((byte *) pre, 1500);

Debug Message

-

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

me-no-dev commented 3 months ago

requires minimal sketch that we can compile to reproduce.

SuGlider commented 3 months ago

What are the USB CDC settings: Hardware Serial JTAG or OTG TinyUSB?

How is it detecting that USB has disconnected?

Could it be a USB Cable/plug problem instead?

When Log Output Level is debug, do you see any messages in the UART0?

sblantipodi commented 3 months ago

requires minimal sketch that we can compile to reproduce.

it's difficult to give you a minimal sketch since there is a PC part also needed that sends the data to the ESP.

What are the USB CDC settings: Hardware Serial JTAG or OTG TinyUSB?

The problem happens with both Hardware CDC and TinyUSB.

How is it detecting that USB has disconnected?

I can hear the Windows sound that plays when a USB device is disconnected. After the sound, the device is no longer listed in the Windows Device Manager.

Could it be a USB Cable/plug problem instead?

I tried a lot of cables and USB ports, so I doubt it. I have a firmware that is used by a lot of users, and all of them report the same problem with different PCs and, obviously, different cables.

When Log Output Level is debug, do you see any messages in the UART0?

no
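
For anyone who wants a timestamped record of the disconnect instead of relying on the Windows sound, one option is to poll the host's port list. This is a minimal sketch, assuming pyserial is installed on the PC; `list_device_names` and `watch_port` are hypothetical helper names, not part of this issue's code or of pyserial itself.

```python
import time


def list_device_names():
    """Return the names of the serial ports the OS currently exposes.

    Assumes pyserial is installed; it is imported lazily so the polling
    logic below can be exercised without it.
    """
    from serial.tools import list_ports
    return {p.device for p in list_ports.comports()}


def watch_port(name, polls, interval=0.5, list_ports_fn=list_device_names,
               sleep_fn=time.sleep, clock=time.time):
    """Poll `polls` times and return a list of (timestamp, event) tuples.

    An event is "disconnected" when `name` vanishes from the port list and
    "connected" when it shows up again. The port is assumed to be present
    when polling starts. Pass `name` exactly as the OS reports it
    (for example "COM5" on Windows).
    """
    events = []
    present = True
    for _ in range(polls):
        now_present = name in list_ports_fn()
        if now_present != present:
            event = "connected" if now_present else "disconnected"
            events.append((clock(), event))
            present = now_present
        sleep_fn(interval)
    return events
```

Calling `watch_port("COM5", polls=7200)` would watch the port for about an hour at the default 0.5 s interval and return the times at which it disappeared or came back.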

SuGlider commented 2 months ago

@sblantipodi - I have tested it with Arduino 3.0.3, using an ESP32-S3 + HW Serial JTAG USB port. It worked fine for about 3 hours receiving data from a Python script.

Sketch:

// using S3 devKit - RGB LED will indicate that CDC has been
// opened by the Python script.
// If S3 resets, Python script will hang/fail and LED will be kept RED

// Serial is the USB port - Enable CDC on Boot!
// Serial0 is the UART0 - Console -- Serial Monitor

void setup() {
  neopixelWrite(RGB_BUILTIN, RGB_BRIGHTNESS, 0, 0);  // Red
  Serial.begin();
  Serial.setRxBufferSize(1500);
  Serial.setTimeout(10); // reduces the time waiting for receiving bytes
  Serial0.begin(115200);
  Serial0.setDebugOutput(true);
  while (!Serial) delay(100);
  neopixelWrite(RGB_BUILTIN, 0, RGB_BRIGHTNESS, 0);  // Green
  Serial0.println("Starting... run the Python Script.");
  delay(2000);
}

#define CONFIG_PREFIX_LENGTH 1500
void loop() {
  byte pre[CONFIG_PREFIX_LENGTH];

  size_t prefixLength = Serial.readBytes((byte *) pre, 1500);
  if (prefixLength > 0) {
    Serial0.println(prefixLength);
  }
}

Python running on a Windows 11 computer:

import sys

import serial

print("CDC App test for issue 10000")

try:
    # CDC same as SERIAL_8N1 - Arduino equivalent
    # Change 'com15' to whatever your USB CDC serial device name is (win/linux)
    # timeout=None means that it will work as a blocking read()
    # write() is blocking by default
    CDC = serial.Serial(port='com15', baudrate=115200, parity=serial.PARITY_NONE,
                        stopbits=serial.STOPBITS_ONE, bytesize=serial.EIGHTBITS,
                        timeout=None)
except serial.SerialException:
    # Catch only port-open failures instead of using a bare except
    print("COM15: port is busy or unavailable")
    sys.exit(1)

# Configure Serial Out and In as necessary using UART and/or CDC with respective config in the sketch

count = 1
chars_100 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV"

while 1:
    # writes 100 bytes in blocking mode
    for x in range(count):
        CDC.write(chars_100.encode('utf_8'))
    print("#"+str(count * 100)+" bytes sent!")
    count = count + 1
    if (count > 50):
        count = 1

Findings:

No issue after running it for more than 3 hours, non-stop. Throughput is about 45,000 bytes per second (about 360 kbit/s). Problems appeared only when the Windows power plan (Control Panel) had a timeout set for blanking the screen or for suspending activity. In that case, the S3 doesn't reset, but the PC stops transmitting.

To make it work without failures, I had to set both (blanking the screen and suspending activity) to "Never". On a Linux computer that is not necessary, as the Python script runs in the background forever.
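
The throughput figure can be checked with simple arithmetic. The sketch below derives how many bytes one full pass of the script's counter sends and converts the reported byte rate to kilobits per second; the function names are illustrative only.

```python
def bytes_per_cycle(max_count=50, chunk=100):
    """Bytes sent in one full pass of the script's counter (1..max_count).

    Iteration k writes k chunks of `chunk` bytes, so a pass sends
    chunk * (1 + 2 + ... + max_count) bytes.
    """
    return chunk * max_count * (max_count + 1) // 2


def to_kbit_per_s(bytes_per_s):
    """Convert a byte rate to kilobits per second (1 kbit = 1000 bits)."""
    return bytes_per_s * 8 / 1000


print(bytes_per_cycle())        # 127500
print(to_kbit_per_s(45_000))    # 360.0
```

At 45,000 bytes per second, one 127,500-byte pass takes a little under 3 seconds.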

SuGlider commented 2 months ago

@sblantipodi - The sketch and script from https://github.com/espressif/arduino-esp32/issues/10000#issuecomment-2266328360 have been running for almost 24 hours. No issue so far.

Both the PC and the S3 are communicating over USB. No disconnection. I think that the issue isn't on the Arduino/USB side. It may be on the Windows/Linux application side.

sblantipodi commented 2 months ago

@sblantipodi - The sketch and script from https://github.com/espressif/arduino-esp32/issues/10000#issuecomment-2266328360 have been running for almost 24 hours. No issue so far.

Both the PC and the S3 are communicating over USB. No disconnection. I think that the issue isn't on the Arduino/USB side. It may be on the Windows/Linux application side.

@SuGlider that sketch lacks the WiFi connectivity part. Please just connect it to WiFi and test it again. I'll do the same and report back. I really appreciate what you are doing. Thanks.

SuGlider commented 2 months ago

Please just connect it to WiFi and test it again.

No issues again.

// using S3 devKit - RGB LED will indicate that CDC has been
// opened by the Python script.
// If S3 resets, Python script will hang/fail and LED will be kept RED

// Serial is the USB port - Enable CDC on Boot!
// Serial0 is the UART0 - Console -- Serial Monitor

#include <WiFi.h>
#include <WiFiMulti.h>
#include <HTTPClient.h>

WiFiMulti wifiMulti;

void setup() {
  neopixelWrite(RGB_BUILTIN, RGB_BRIGHTNESS, 0, 0);  // Red

  Serial.begin();
  Serial.setRxBufferSize(1500);
  Serial.setTimeout(10); // reduces the time waiting for receiving bytes

  wifiMulti.addAP("SSID", "PWD");
  Serial0.begin(115200);
  Serial0.setDebugOutput(true);

  Serial0.println("Connecting Wifi...");
  Serial0.println();
  while (wifiMulti.run() != WL_CONNECTED) {
    Serial0.print(".");
    delay(100);
  }
  Serial0.println();

  if (wifiMulti.run() == WL_CONNECTED) {
    Serial0.println("");
    Serial0.println("WiFi connected");
    Serial0.println("IP address: ");
    Serial0.println(WiFi.localIP());
  }

  HTTPClient http;
  // testing connection...
  Serial0.print("[HTTP] begin...\n");
  http.begin("http://google.com/index.html");  //HTTP

  Serial0.print("[HTTP] GET...\n");
  // start connection and send HTTP header
  int httpCode = http.GET();

  // httpCode will be negative on error
  if (httpCode > 0) {
    // HTTP header has been send and Server response header has been handled
    Serial0.printf("[HTTP] GET... code: %d\n", httpCode);

    // file found at server
    if (httpCode == HTTP_CODE_OK) {
      String payload = http.getString();
      Serial0.println(payload);
    }
  } else {
    Serial0.printf("[HTTP] GET... failed, error: %s\n", http.errorToString(httpCode).c_str());
  }

  http.end();

  Serial0.println("Starting... run the Python Script.");
  while (!Serial) delay(100);
  neopixelWrite(RGB_BUILTIN, 0, RGB_BRIGHTNESS, 0);  // Green
}

#define CONFIG_PREFIX_LENGTH 1500
uint32_t count = 0;
void loop() {
  byte pre[CONFIG_PREFIX_LENGTH];
  if (wifiMulti.run() != WL_CONNECTED) {
    Serial0.println(".|");
    return;
  }
  size_t prefixLength = Serial.readBytes((byte *) pre, 1500);
  if (prefixLength > 0) {
    Serial0.println(prefixLength);
  }
}

SuGlider commented 2 months ago

@sblantipodi - I think that the problem may be in the application... some problem with lack of RAM? A failed malloc() or no space for a String to allocate memory....

sblantipodi commented 2 months ago

@SuGlider can I ask you why you use Serial for read and Serial0 for write? I can't read Serial0, how am I supposed to read Serial0?

The sketch that crashes on me, uses Serial for both read/write.

SuGlider commented 2 months ago

USB Serial can be read by the Python script, if necessary. The Windows COM port is opened by this script. I use Serial0 as the console and open it with the Arduino Serial Monitor.

With regard to reading and writing from USB Serial, there is a caveat that has to do with the TinyUSB / HW Serial JTAG driver and the related tasks used to populate/consume the RX/TX buffers. The Arduino sketch runs at a very low task priority. The USB tasks run at a very high priority. This may cause some problems.

SuGlider commented 2 months ago

I can't read Serial0, how am I supposed to read Serial0?

I see that your board is a Lolin ESP32-S3. It has no USB-UART chip on it.

It is possible to read the UART by using an external UART-USB converter, based on chips like the CP2102, CH340, etc. It may also be possible to use another ESP32 board that has such a converter, either by using its chip directly or by running a sketch that reads/forwards UART1 to UART0...

sblantipodi commented 2 months ago

@SuGlider thanks for the answer, I really, really appreciate it.

I see that your board is a Lolin ESP32-S3. It has no USB-UART chip on it.

I have a lot of boards from various manufacturers, and very few of them have both USB and UART.

I know of very few boards that have both a USB and a UART chip, and those boards are more "development boards" than real ones... I mean, what's the point of having both USB and UART on the same board?

I think that my problem is caused by the fact that I write and read on Serial (USB).

Will Espressif ever fix this problem? Having two different interfaces for read and write is not really a solution to this :) Is there a workaround to read and write using USB serial without making the driver crash?

sblantipodi commented 2 months ago

I confirm that if I don't use the same Serial for read and write, it doesn't crash. If I use the same Serial for both read and write, the USB driver crashes, the device is disconnected from the PC, the ESP hangs, and there is no way to recover it other than manually rebooting it.

SuGlider commented 2 months ago

Thanks @sblantipodi for the confirmation. If possible, just confirm that the crash happens in both USB Modes: TinyUSB and HWSerial JTAG. It can be configured using the Arduino IDE menu, but it is necessary to build each mode and possibly upload it using BOOT+RESET buttons in order to put the S3 into download mode.

sblantipodi commented 2 months ago

Thanks @sblantipodi for the confirmation. If possible, just confirm that the crash happens in both USB Modes: TinyUSB and HWSerial JTAG. It can be configured using the Arduino IDE menu, but it is necessary to build each mode and possibly upload it using BOOT+RESET buttons in order to put the S3 into download mode.

Thank you for your time @SuGlider, I appreciate it. Yes, I have tested it in both TinyUSB and HWSerial JTAG mode. Sometimes the crash happens in the first 5 minutes, sometimes it takes longer but there is no way to make it stable.

Same problem on different boards like UE TinyS3 and Adafruit ESP32-S3 Feather. The problem does not happen on the standard ESP32 using the CH340 chip.

I tried creating different tasks for read/write, I tried giving the tasks different priorities, and I tried pinning them to different cores, but nothing solved or improved the problem.

The only way to make it stable is to stop writing to serial, but that clearly isn't a solution :) The less I write, the longer it takes to crash, but if I write anything, a crash will happen sooner or later. Under normal conditions firmware can recover after a crash, but in this case that is impossible: once the crash happens, only a manual reboot of the device makes it work again.

SuGlider commented 2 months ago

I see. I need a test case that I can use to investigate the issue. Let me know if you have a sketch/Python pair that I could use to reproduce it. In the meantime, I'll try to create this testing code and post it here later.

sblantipodi commented 2 months ago

@SuGlider here is something that may help you reproduce the problem.

Sketch


#include "Arduino.h"

#include <WiFi.h>
#include <WiFiMulti.h>
#include <HTTPClient.h>

WiFiMulti wifiMulti;

void setup() {
  neopixelWrite(RGB_BUILTIN, RGB_BRIGHTNESS, 0, 0);  // Red

  Serial.begin();
  Serial.setRxBufferSize(1500);
  Serial.setTimeout(10); // reduces the time waiting for receiving bytes

  wifiMulti.addAP("SSID", "PWD");
  Serial.begin(115200);
  Serial.setDebugOutput(true);

  Serial.println("Connecting Wifi...");
  Serial.println();
  while (wifiMulti.run() != WL_CONNECTED) {
    Serial.print(".");
    delay(100);
  }
  Serial.println();

  if (wifiMulti.run() == WL_CONNECTED) {
    Serial.println("");
    Serial.println("WiFi connected");
    Serial.println("IP address: ");
    Serial.println(WiFi.localIP());
  }

  HTTPClient http;
  // testing connection...
  Serial.print("[HTTP] begin...\n");
  http.begin("http://google.com/index.html");  //HTTP

  Serial.print("[HTTP] GET...\n");
  // start connection and send HTTP header
  int httpCode = http.GET();

  // httpCode will be negative on error
  if (httpCode > 0) {
    // HTTP header has been send and Server response header has been handled
    Serial.printf("[HTTP] GET... code: %d\n", httpCode);

    // file found at server
    if (httpCode == HTTP_CODE_OK) {
      String payload = http.getString();
      Serial.println(payload);
    }
  } else {
    Serial.printf("[HTTP] GET... failed, error: %s\n", http.errorToString(httpCode).c_str());
  }

  http.end();

  Serial.println("Starting... run the Python Script.");
  while (!Serial) delay(100);
  neopixelWrite(RGB_BUILTIN, 0, RGB_BRIGHTNESS, 0);  // Green
}

#define CONFIG_PREFIX_LENGTH 1500
uint32_t count = 0;
unsigned long previousMillisA = 0;
const long intervalA = 1000;

void loop() {
  byte pre[CONFIG_PREFIX_LENGTH];
  if (wifiMulti.run() != WL_CONNECTED) {
    Serial.println(".|");
    return;
  }
  size_t prefixLength = Serial.readBytes((byte *) pre, 1500);
  if (prefixLength > 0) {
    Serial.println(prefixLength);
  }
  while(Serial.available() > 0) {
    char t = Serial.read();
  }

  unsigned long currentMillisA = millis();
  if (currentMillisA - previousMillisA >= intervalA) {
    previousMillisA = currentMillisA;
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
    Serial.println("MSG_sent_to_the_python_program_every_second msg can be pretty big sometimes. Lorem ipsum dolor");
  }
}

Python program:

import serial
import time

arduino = serial.Serial(port='COM5', baudrate=115200, timeout=5)

def send_message():
    while True:        
        arduino.write(b'Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet Lorem ipsum dolor sit amet\n') 
        response = arduino.readline().decode('utf-8').strip() 
        if response:
            print(f"Received: {response}")
        time.sleep(0.008) 

if __name__ == "__main__":
    send_message()

If you reduce the "Lorem ipsum dolor" string length, it takes more time to crash; the bigger the string is, the faster it crashes.

I exaggerated the values to make it crash faster, but in a real-world application the program will crash too, sooner or later.
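
To compare payload sizes in a repeatable way, the write/read loop above can be wrapped in a function that reports how long the link survived. This is a minimal sketch under stated assumptions: `time_to_failure` is a hypothetical helper, `port` is any object with pyserial-style `write()` and `readline()` methods, and a failure is assumed to surface as an `OSError` (pyserial's `SerialException` derives from it). A silent stall that raises nothing would not be detected by this sketch.

```python
import time


def time_to_failure(port, payload_len, max_seconds, pause=0.008,
                    clock=time.monotonic, sleep_fn=time.sleep):
    """Write `payload_len` bytes and read one line, repeatedly.

    Returns the elapsed seconds at which the first write/read raised
    OSError, or None if the link survived for `max_seconds`.
    """
    # Build a newline-terminated payload of exactly `payload_len` bytes.
    unit = b"Lorem ipsum dolor sit amet "
    body = (unit * (payload_len // len(unit) + 1))[:payload_len - 1]
    payload = body + b"\n"

    start = clock()
    while clock() - start < max_seconds:
        try:
            port.write(payload)
            port.readline()
        except OSError:
            return clock() - start
        sleep_fn(pause)
    return None
```

With a real device this could be called as `time_to_failure(serial.Serial(port='COM5', baudrate=115200, timeout=5), 512, 3600)` and repeated for several payload lengths.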

sblantipodi commented 2 months ago

hi @SuGlider were you able to reproduce the problem with my snippets? if yes, can you remove the Resolution: Unable to reproduce label? :)

vickash commented 2 months ago

The only way to make it stable is to stop writing to serial, but that clearly isn't a solution :) The less I write, the longer it takes to crash, but if I write anything, a crash will happen sooner or later. Under normal conditions firmware can recover after a crash, but in this case that is impossible: once the crash happens, only a manual reboot of the device makes it work again.

I have a project using a call-and-response type protocol, implemented over Serial, and I'm having exactly the same issue when there's a lot of data being passed back and forth over USB CDC. Using a Lolin S3, just like @sblantipodi.

The problem does not happen on the standard ESP32 using the CH340 chip.

My S3 board has a CH340 on board, so I switched to that interface and have a test running. No issues so far after about 10 minutes. Will update if that changes.

I'm using version 3.0.4. The same issue occurs with CDC on my S2, and C3, which both worked fine at some point last year, when I put this project down for a while. Same on the new (to me) H2 and C6.

I will try rolling everything back to see if I can find a working state, and let you know which version of the core that was, if it's any help.

sblantipodi commented 2 months ago

@vickash I confirm that using the CH340 is currently the way to go for stability, but this is bad because most of the newer ESP boards don't use that chip by default. Regarding the USB implementation, I have the same problem on both Arduino core 3.x and 2.x.

vickash commented 2 months ago

I let my S3 run for about an hour on the CH340, without issue, then stopped it. Now I'm trying CDC again on core version 2.0.14, which I think was the last version I used before 3. No issues so far, running about 10 minutes.

Are you using 2.0.17 @sblantipodi? Maybe try your example on 2.0.14?

vickash commented 2 months ago

Now I'm trying CDC again on core version 2.0.14

This is still running after 17 hours. It's definitely something that changed after 2.0.14. Maybe something going from IDF 4 to 5, not necessarily the Arduino core itself?

sblantipodi commented 2 months ago

I let my S3 run for about an hour on the CH340, without issue, then stopped it. Now I'm trying CDC again on core version 2.0.14, which I think was the last version I used before 3. No issues so far, running about 10 minutes.

Are you using 2.0.17 @sblantipodi? Maybe try your example on 2.0.14?

I'll try it next week and report back. Thanks

sblantipodi commented 1 month ago

My ESP continues to hang randomly. It can't handle many reads with a few writes from time to time; the USB stack hangs.

vickash commented 1 month ago

My ESP continues to hang randomly. It can't handle many reads with a few writes from time to time; the USB stack hangs.

With 2.0.14?

sblantipodi commented 1 month ago

Yes. The USB stack crashes. On 2.0.14 it took an hour to crash, but it crashed.

sblantipodi commented 1 month ago

The very bad part is that when the USB stack crashes, the ESP does not reset and the WDT is not triggered. I can no longer access Serial, but other parts of the firmware, such as the web interface, continue to work.

vickash commented 1 month ago

The very bad part is that when the USB stack crashes, the ESP does not reset and the WDT is not triggered. I can no longer access Serial, but other parts of the firmware, such as the web interface, continue to work.

I've observed the same thing in my tests, when I can get it to fail.

Using my test script (I'm sending binary data from Ruby to the Arduino, 0-255 each time, then expecting that echoed back as ASCII digits), it does not fail on 2.0.14 through 2.0.16, but will consistently fail in about 5 minutes on 2.0.17. So my issue seems to be caused by something that changed in that version.

I also tried your Python script above, and left it running overnight, compiling the Arduino sketch with core version 3.0.4, ESP32-S3, USB-CDC (Hardware CDC & JTAG).

It "stutters" most of the time, where there's no output for maybe a second, then a lot of "Received:" lines print all at once. Sometimes the output is more smooth. Not sure if that's relevant, or just a Python issue, but it didn't stop working, even after about 12 hours running.
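
One way to quantify the "stutter" would be to timestamp each received line on the host and look for gaps much larger than the 8 ms send interval. This is a minimal sketch; `find_stalls` and the 0.5 s threshold are assumptions for illustration, not something measured in this issue.

```python
def find_stalls(timestamps, threshold=0.5):
    """Return (index, gap) pairs where the pause before line `index`
    exceeded `threshold` seconds.

    `timestamps` are arrival times of received lines, in seconds,
    in ascending order.
    """
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > threshold:
            stalls.append((i, gap))
    return stalls


# Example: steady 8 ms spacing, then a pause of about 1 s followed by a burst.
arrivals = [0.000, 0.008, 0.016, 1.020, 1.021, 1.022]
print(find_stalls(arrivals))
```

In the Python test program, the timestamps could be collected by appending `time.monotonic()` to a list right after each `readline()` call returns a non-empty response.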

I'm really confused at this point.

sblantipodi commented 1 month ago

@SuGlider said in another issue that this is a known issue since 2022. I don't know why we didn't come to the same conclusion earlier in this issue and why this issue is marked as "unable to reproduce"

https://github.com/espressif/arduino-esp32/issues/10323