espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.49k stars 7.26k forks source link

USB CDC can receive but cannot transmit data (IDFGH-12740) #13722

Closed StevenMacias closed 3 months ago

StevenMacias commented 5 months ago

Answers checklist.

IDF version.

v5.1.2

Espressif SoC revision.

ESP32-S3

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-S3-DevKitC-1 v1.0

Power Supply used.

USB

What is the expected behavior?

Reliable bi-directional USB CDC communication.

What is the actual behavior?

The board can receive data but cannot send data back to the host.

Steps to reproduce.

  1. get_idf
  2. cp -r $IDF_PATH/examples/peripherals/usb/device/tusb_serial_device Workspace/local/espressif/serial_issue
  3. cd Workspace/local/espressif/serial_issue
  4. idf.py set-target esp32s3
  5. Connect micro USB to the UART connector on the DevKit
  6. idf.py flash monitor
  7. Connect micro USB to the USB connector on the DevKit
  8. Execute spamming script python serial_spammer.py
    
    import serial
    import time

Configure the serial port

ser = serial.Serial( port='/dev/ttyACM0', # Change this to your UART port baudrate=115200, # Change this to match your device's baud rate timeout=0.01, # Timeout for readline() function rtscts = False, dsrdtr = False, )

Open the serial port

if(ser.isOpen() == False):

ser.open()

if ser.is_open: try:

Track the start time

    start_time = time.time()
    for i in range(10000):
        message = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n'
        ser.write(message.encode())

        # Read the reply
        replied = False
        while replied == False:
            reply = ser.readline().decode().strip()
            if reply != "":
                print("Message sent:", message)
                print("Reply received:", reply)
                replied = True

        time.sleep(0.25)

    # Calculate the elapsed time
    elapsed_time = time.time() - start_time
    print("Script has been running for {:.2f} minutes.".format(elapsed_time / 60))

except Exception as e:
    print("Error:", e)
    print("Closing Serial")
    ser.close()

finally:
    # Close the serial port
    ser.close()

else: print("Serial port is not open.")


9. Disconnect `USB` cable while spamming. The spammer program will exit.
10. Reconnect `USB` cable and execute spamming script again. 
11. Repeat step 9 and 10 until the spammer program does not show replies. At this point the issue is triggered. You can send data, the ESP32 will receive data, it will 'reply' but the data will never reach the host.
12. Double check the issue with `minicom -D /dev/ttyACM0`. Bytes are received and plotted in the `idf.py monitor` but not echoed to minicom. 
...

### Debug Logs.

_No response_

### More Information.

Our project only has one USB (not self-powered device) but this issue appears by just using the interface. However, the symptoms are exactly the same than when the USB is disconnected in the DevKit. The USB disconnection helps us to trigger the issue quickly but from our experience we can say it can happen at any moment. Right after reset or after an hour of usage. 

This issue seems related to https://github.com/espressif/esp-idf/issues/13518, however we believe that reproducing it with a vanilla example and a DevKit might be helpful to triangulate the root cause. Following some of the comments, we are using the latest version of espressif/tinyusb (0.15.0~8) but the issue persists.
peter-marcisovsky commented 5 months ago

Hi @StevenMacias thanks for reporting. I have managed to replicate the issue by following your steps to reproduce. Will take a look at it an let you know.

StevenMacias commented 4 months ago

Hi @peter-marcisovsky, do you have any update regarding this issue? It is impacting seriously our project and we cannot find a workaround... Thanks!

peter-marcisovsky commented 4 months ago

Hi @StevenMacias, sorry for a delay. After some extensive testing, following your steps to reproduce, I have found a repeating pattern, causing an error.

I observed, that the USB device sometimes sends a 0 length CDC IN transfer to the USB host, right after re-plugging the USB device. The device sends the transfer before the USB host is ready to receive one, which in general might cause transfer problems. (The host indicates readiness for data reception by usb_cdc_line_state_changed_callback.)

Once the 0 length CDC in transfer appears before the host is ready to receive any transfer, the error is triggered and the USB device behaves just like you are explaining.

To fix the issue, I canceled all the pending transfers after the USB device is mounted, by stalling and un-stalling the endpoints. I could not reproduce the error anymore after that.

A pseudo function to cancel the pending transfers, would be placed into the TinyUSB device stack usbd.c and would look like this:

void usbd_edpt_stall_unstall(uint8_t rhport, uint8_t ep_addr) {
    usbd_edpt_stall(rhport, ep_addr);
    usbd_edpt_clear_stall(rhport, ep_addr);
}

The function shall be invoked before any data transfer is made, or right after unplugging the USB device. In my case, I used mount callback from TinyUSB TU_ATTR_WEAK void tud_mount_cb(void); The callback is used, for example, in our addition to the MSC Device class here, this way the endpoints are cleared, right after enumerating and mounting of the USB device.

The TinyUSB log of stalling and un-stalling the endpoints shall look like this:

Mounted                       // USB device mounted

    Stall EP 81               // Stall Interrupt IN EP 0x81 
    Clear Stall EP 81
    Stall EP 02               // Stall Bulk OUT EP 0x02
    Clear Stall EP 02
    Stall EP 82               // Stall Bulk IN EP 0x82
    Clear Stall EP 82

What could also be a problem in your case especially, is the way how the tinyusb_cdc_rx_callback in tusb_serial_device example is written. There is sending CDC data and receiving CDC data happening in the callback, which might be causing issues in your case, if you are using the similar logic in your code. I will be updating the callbacks in our examples accordingly using xSendQueue and xReceiveQueue in following days.

I hope this workaround resolves the issue you are having.

StevenMacias commented 4 months ago

Thanks a lot for your reply! We will test it immediately and see if it solves the issue. Thanks again :smile:

peter-marcisovsky commented 3 months ago

Hi @StevenMacias any update on this issue? If you problem was solved, feel free to close the issue.

StevenMacias commented 3 months ago

@peter-marcisovsky so sorry for the delay, we have been a quite overloaded lately. This does not seems to solve our issue since it also fails during runtime, not only when plugging and unplugging the cable.

This is how our TX callback looks like:

esp_err_t usb_cdc_serial_tx(const char* buffer, size_t length)
{
    ESP_LOGD(TAG, "Send:\t%s", buffer);
    size_t bytes_written = tinyusb_cdcacm_write_queue(TINYUSB_CDC_ACM_0, (uint8_t*)buffer, length);
    if (bytes_written != length)
        ESP_LOGW(TAG, "USB serial TX FIFO lacked space, dropping outgoing message; bytes_written: %d, length: %d", bytes_written, length);
    return tinyusb_cdcacm_write_flush(TINYUSB_CDC_ACM_0, pdMS_TO_TICKS(200));
}

When the issue is triggered it looks like this, please pay attention to the "USB serial TX FIFO lacked space" log and correlate it with the above function.

2024-07-09 10:02:37 I (10309906) DOWNSTREAM: Processing: msg_num: 2196252184, command: previous
2024-07-09 10:02:38 I (10310264) UPSTREAM: src_act: playing  - src_pos: 5
2024-07-09 10:02:38 E (10310378) rmt-uart: not a stop bit byte_num=0
2024-07-09 10:02:38 W (10310466) tusb_cdc_acm: Flush failed
2024-07-09 10:02:53 I (10325815) MAIN_APP: Current audio buffer utilization: 87.33%
2024-07-09 10:02:53 I (10325816) MAIN_APP: Audio buffer difference: [0.00% in 30 seconds]
2024-07-09 10:02:53 I (10325862) DOWNSTREAM: Processing: msg_num: 672158606, command: previous [Original Message from Host]
2024-07-09 10:02:54 W (10326165) tusb_cdc_acm: Flush failed
2024-07-09 10:02:55 I (10327363) DOWNSTREAM: Processing: msg_num: 672158606, command: previous [Host Retry]
2024-07-09 10:02:55 I (10327621) UPSTREAM: src_act: playing  - src_pos: 4
2024-07-09 10:02:55 I (10327622) MAIN_APP: [Masterlink event] Track change
2024-07-09 10:02:55 W (10327666) tusb_cdc_acm: Flush failed
2024-07-09 10:02:55 W (10328067) tusb_cdc_acm: Flush failed
2024-07-09 10:02:57 I (10329167) DOWNSTREAM: Processing: msg_num: 672158606, command: previous [Host Retry]
2024-07-09 10:02:57 W (10329469) tusb_cdc_acm: Flush failed
2024-07-09 10:02:59 I (10331169) DOWNSTREAM: Processing: msg_num: 672158606, command: previous [Host Retry]
2024-07-09 10:02:59 W (10331270) USB-CDC: USB serial TX FIFO lacked space, dropping outgoing message; bytes_written: 80, length: 82
2024-07-09 10:02:59 W (10331471) tusb_cdc_acm: Flush failed
2024-07-09 10:03:01 I (10333171) DOWNSTREAM: Processing: msg_num: 672158606, command: previous [Host Retry]
2024-07-09 10:03:01 W (10333272) USB-CDC: USB serial TX FIFO lacked space, dropping outgoing message; bytes_written: 0, length: 82
2024-07-09 10:03:01 W (10333473) tusb_cdc_acm: Flush failed
2024-07-09 10:03:03 I (10335173) DOWNSTREAM: Processing: msg_num: 902212124, type: system, command: reset
2024-07-09 10:03:03 W (10335173) USB: Stopping TinyUSB task
2024-07-09 10:03:03 W (10335274) USB-CDC: USB serial TX FIFO lacked space, dropping outgoing message; bytes_written: 0, length: 77
2024-07-09 10:03:03 W (10335475) tusb_cdc_acm: Flush failed

Basically we are sending a JSON message over CDC from Host to ESP32 (Downstream), a message is reported upstream, you can also see tha we are using the RMT peripheral at the same time (I don't know if that can be related). When message 2196252184 is sent, the Flush failed issue is triggered. However the reply for 2196252184 seems to be correctly sent to the host since the host does not retry to send the message. After that, the host sends a new message 672158606 that never receives a reply and that is why it is re-sent many times.

Can you see something wrong in our callback? Not sure if you shared the new version of the examples already. Also, we did some modifications to the esp_tinyusb component to support USB audio, do you know if it is in the roadmap to officially support that feature anytime soon?

Thanks again and sorry for the late feedback.

peter-marcisovsky commented 3 months ago

Hi @StevenMacias sorry to hear that the issue has not been resolved. Could first try to modify your callback according to the as-promised tusb_serial_device example update here

Is there any reason, why are you using timeout other than 0 in your tinyusb_cdcacm_write_flush function? Does the tinyusb_cdcacm_write_flush return any error?

Regarding the audio support. We just released our UAC driver (USB Host side) here. The USB Device side is not a priority right now for us.

StevenMacias commented 2 months ago

Hi @peter-marcisovsky, I think our TX and RX callbacks are already detached similarly to your example. We use xMessageBufferSend instead for sending information downstream. Here you have them, I have updated the TX callback to print the tinyusb_cdcacm_write_flush error:

esp_err_t usb_cdc_serial_tx(const char* buffer, size_t length)
{
    ESP_LOGD(TAG, "Send:\t%s", buffer);
    size_t bytes_written = tinyusb_cdcacm_write_queue(TINYUSB_CDC_ACM_0, (uint8_t*)buffer, length);
    if (bytes_written != length) {
        ESP_LOGW(TAG, "USB CDC failed to write queue. Written %d out of %d bytes.", bytes_written, length);
        ESP_LOG_BUFFER_HEXDUMP(TAG, buffer, length, ESP_LOG_WARN);
    }
    esp_err_t err = tinyusb_cdcacm_write_flush(TINYUSB_CDC_ACM_0, pdMS_TO_TICKS(200));
    if (err != ESP_OK) {
        ESP_LOGE(TAG, "Write flush failed. Reason: %s", esp_err_to_name(err));
    }
    return err;
}

Since we are expecting JSON strings that end with \n we do some extra logic to build the whole message.

void usb_cdc_rx_callback(int interface_index, cdcacm_event_t*)
{
    size_t rx_size = 0;
    esp_err_t ret = tinyusb_cdcacm_read(interface_index, usb_cdc_rx_buf,
        CONFIG_TINYUSB_CDC_RX_BUFSIZE, &rx_size);

    if (ret == ESP_OK) {
        if (msg_idx <= (USB_CDC_MSG_MAX_SIZE - rx_size)) {
            memcpy(serial_msg + msg_idx, usb_cdc_rx_buf, rx_size);
            msg_idx += rx_size;
        } else {
            ESP_LOGE(TAG, "Message too big (%zu+%zu bytes) for buffer size (%d bytes)",
                msg_idx,
                rx_size,
                USB_CDC_MSG_MAX_SIZE);
            msg_idx = 0;
            usb_cdc_clear_serial_buffers();
        }

        if (wanted_char_flag) {
            wanted_char_flag = false;
            // ESP_LOG_BUFFER_HEXDUMP(TAG, serial_msg, msg_idx, ESP_LOG_INFO);
            if (msg_buffer_handle != NULL) {
                size_t written_bytes = xMessageBufferSend(msg_buffer_handle, serial_msg, msg_idx, 0);
                if (written_bytes < msg_idx) {
                    ESP_LOGW(TAG, "RX Message buffer is full.");
                }
            } else {
                ESP_LOGW(TAG, "Received data but message buffer not registered.");
            }
            usb_cdc_clear_serial_buffers();
        }
    } else {
        ESP_LOGE(TAG, "Read error");
    }
}

No special reason to use a timeout in tinyusb_cdcacm_write_flush other that playing with that value to see if it made a difference, I will change it to 0. The error returned by the flush function when the issue is triggered is:

2024-07-15 10:34:40 W (251977906) tusb_cdc_acm: Flush failed
2024-07-15 10:34:40 E (251977906) USB-CDC: Write flush failed. Reason: ESP_ERR_TIMEOUT

Thanks again for your time and support! :smiley:

peter-marcisovsky commented 2 months ago

1) There should also be a xMessageBufferReceive() somewhere in your usb_cdc_serial_tx() if I am not mistaken. Just like in our example, there is xQueueSend() in rx_callback and xQueueReceive() in the tx while loop

2) How and from where are you calling your usb_cdc_serial_tx() ? In our example, we are calling a transmitting part of the code, once an Queue item is successfully received from the Queue

3) Do you have multiple writers/readers of teh messagebuffer? Doc here

4) xMessageBuffer is in IDF internally ported to xStreamBuffer here. Please mind the differences between those two and make sure you are using it correctly.

5) Regarding the tinyusb_cdcacm_write_flush() returning errors, or warnings: please mind this warning

StevenMacias commented 2 months ago

Hi @peter-marcisovsky,

include "freertos/FreeRTOS.h"

include "freertos/message_buffer.h"

include "esp_err.h"

define USB_CDC_TERMINATION_CHAR 0x0A

define USB_CDC_MSG_MAX_SIZE 2048u

esp_err_t usb_cdc_init(void); esp_err_t usb_cdc_serial_tx(const char* buffer, size_t length); void usb_cdc_serial_register_message_buffer(MessageBufferHandle_t msg_buffer);

The goal of `usb_cdc_init` is to configure and call `tusb_cdc_acm_init`. Also it sets up the wanted character (`tud_cdc_n_set_wanted_char`). 
Then I use `usb_cdc_serial_register_message_buffer` to store a reference to the message buffer handle and therefore being able to decouple the `usb_cdc_rx_callback` from the message processing logic. 

At the other end of the message buffer it will be a task (`Message Router`) that will process the JSON message and trigger different actions: 

```C
static char message_router_rx_msg[USB_CDC_MSG_MAX_SIZE] = { 0 };
static char message_router_tx_msg[USB_CDC_MSG_MAX_SIZE] = { 0 };
static MessageBufferHandle_t message_router_tx_msg_buf_handle;

void message_router_application_task(void* pvParams)
{
    // This is the same handler used in usb_cdc_serial_register_message_buffer
    MessageBufferHandle_t message_router_rx_msg_buf_handle = (MessageBufferHandle_t)pvParams;
    // This tx handler is used internally by this task
    message_router_tx_msg_buf_handle = xMessageBufferCreate(USB_CDC_MSG_MAX_SIZE);
    size_t rx_message_size = 0;
    size_t tx_message_size = 0;

    for (;;) {
        rx_message_size = xMessageBufferReceive(
            message_router_rx_msg_buf_handle,
            message_router_rx_msg,
            sizeof(message_router_rx_msg),
            pdMS_TO_TICKS(MESSAGE_ROUTER_TX_WAIT_MS));

        if (rx_message_size > 0) {
            message_router_process_incoming_message(message_router_rx_msg, rx_message_size, NULL);
            // Limit downstream message throughput
            vTaskDelay(pdMS_TO_TICKS(MESSAGE_ROUTER_TX_SENT_DELAY_MS));
            ESP_LOGD(TAG, "Sending message downstream");
            ESP_LOG_BUFFER_HEXDUMP(TAG, message_router_rx_msg, rx_message_size, ESP_LOG_DEBUG);
        }

        tx_message_size = xMessageBufferReceive(
            message_router_tx_msg_buf_handle,
            message_router_tx_msg,
            sizeof(message_router_tx_msg),
            pdMS_TO_TICKS(MESSAGE_ROUTER_RX_WAIT_MS));

        if (tx_message_size > 0) {
            usb_cdc_serial_tx(message_router_tx_msg, tx_message_size);
            // Limit upstream message throughput
            vTaskDelay(pdMS_TO_TICKS(MESSAGE_ROUTER_RX_SENT_DELAY_MS));
            ESP_LOGD(TAG, "Sending message upstream");
            ESP_LOG_BUFFER_HEXDUMP(TAG, message_router_tx_msg, tx_message_size, ESP_LOG_DEBUG);
        }
    }
}

The task loop basically checks two message buffers, the one shared with the USB CDC component (message_router_rx_msg_buf_handle) and the internal one used to send replies back (message_router_tx_msg_buf_handle). The message_router_process_incoming_message will analyze the JSON message coming from USB CDC and do some action. For example, if a syntax error is detected in the incoming message, a JSON string will be build with information about the error and written into the message_router_tx_msg_buf_handle. Finally, since data has been written into message_router_tx_msg_buf_handle, the function usb_cdc_serial_tx will be called.

I do not have a message buffer from Message Router to USB CDC because I do not have a task that can block waiting for a message buffer to be written. Do you think that is an issue? As I mentioned at the beginning we did not find the need of the USB CDC component to be a task.

static MessageBufferHandle_t message_router_msg_buf_handle;

static void init_tasks()
{
    message_router_msg_buf_handle = xMessageBufferCreate(USB_CDC_MSG_MAX_SIZE);

    xTaskCreate(message_router_application_task,
        "message_router_application_task",
        (1024 * 6),
        (void*)message_router_msg_buf_handle,
        tskIDLE_PRIORITY + 1,
        NULL);
}

void main_application_task(void* pvParams)
{
    [...]
    init_tasks();
    init_led();
    [..]
    usb_cdc_serial_register_message_buffer(message_router_msg_buf_handle);
    [...]
    err = usb_init(); // calls tinyusb_driver_install and usb_cdc_init
    if (err != ESP_OK) {
        ESP_LOGE(TAG, "Error initializing USB: %s", esp_err_to_name(err));
    }
    for (;;) {
        [...]
    }
}

As always, thank you so much for your time and help! :smile: