espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

DMA for USB device (IDFGH-9832) #11161

Open tomaszduda23 opened 1 year ago

tomaszduda23 commented 1 year ago

Is your feature request related to a problem?

I'm trying to transfer a lot of data from an ESP to a PC over USB, but I'm far from 12 Mbit/s. I wonder whether the transfer speed would improve if DMA mode were used instead of Slave mode. Are there any plans to implement it?

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Additional context.

No response

zjanosy commented 1 year ago

Because of the protocol overhead it is not possible to get 12 Mbps.

Quoting from "USB Complete", 4th ed., p. 24:

> Of USB’s four transfer types, the fastest on an otherwise idle bus are bulk transfers, with theoretical maximums of around 1.2 MB/s at full speed, 53 MB/s at high speed, and 400 MB/s at SuperSpeed. Isochronous transfers can request the most bandwidth (1.023 MB/s at full speed, 24.576 MB/s at high speed, and 393 MB/s at SuperSpeed).

Moreover, when there are several devices attached to a hub, they share the bandwidth.

Certainly, using DMA could help to get the most out of the possible throughput. What speeds do you get?
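
For reference, the figure of roughly 1.2 MB/s for full-speed bulk transfers can be reproduced with simple arithmetic. The sketch below assumes the USB 2.0 limit of 19 bulk packets of 64 bytes per 1 ms full-speed frame on an otherwise idle bus; the constant and function names are made up for illustration.

```c
#include <stdint.h>

/* Full-speed USB: one frame every 1 ms, bulk max packet size of 64 bytes.
 * Assumption: at most 19 such bulk packets fit into one frame on an
 * otherwise idle bus (USB 2.0 full-speed bulk limit). */
enum {
    FS_FRAMES_PER_SECOND = 1000,
    FS_BULK_MAX_PACKET = 64,
    FS_BULK_MAX_PACKETS_PER_FRAME = 19
};

/* Payload bytes per second for a given number of bulk packets per frame. */
uint32_t fs_bulk_bytes_per_second(uint32_t packets_per_frame)
{
    return packets_per_frame * FS_BULK_MAX_PACKET * FS_FRAMES_PER_SECOND;
}
```

fs_bulk_bytes_per_second(FS_BULK_MAX_PACKETS_PER_FRAME) gives 1,216,000 bytes/s, i.e. about 1.2 MB/s of payload against a raw signalling rate of 12 Mbit/s = 1.5 MB/s.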

tomaszduda23 commented 1 year ago

From USB device to PC in bulk mode (64-byte packets), the maximum I get is around 0.55 MB/s. According to this post, the theoretical maximum would be about twice that: https://esp32.com/viewtopic.php?t=28840

jrahlf commented 1 year ago

I measured around 560 KiB/s when sending data at full throttle as a CDC device. I wanted to look into the DMA possibility but discovered that the register documentation is missing from the technical reference manual (1.2) for the ESP32-S3. The same seems to be true for the ESP32-S2.

[Image: usb_registers_missing]

tomaszduda23 commented 1 year ago

https://github.com/espressif/esp-idf/blob/master/components/soc/esp32s3/include/soc/usb_reg.h seems to have some register documentation.

jrahlf commented 1 year ago

Good news: I managed to send pretty much 1 MB/s as a CDC device (see picture below). You do not even need DMA; the bandwidth is underutilized because of latency and inefficient use of the hardware USB FIFO.

I see some issues with the TinyUSB stack (some are specific to the CDC device class, but not all):

  1. cdc_device.c::tud_cdc_n_write_flush always sends a 64-byte buffer to the lower layer. If you increase epin_buf from CFG_TUD_CDC_EP_BUFSIZE to 8*CFG_TUD_CDC_EP_BUFSIZE, you can achieve 750 KiB/s throughput. Increasing the buffer size further does not increase the transfer speed.
  2. dcd_esp32sx.c configures the hardware FIFO size in dcd_edpt_open, but transmit_packet always pushes 64 bytes (xfer->max_size) into the FIFO instead of the previously configured hardware FIFO size.
  3. dcd_edpt_open assumes that all hardware endpoints are used. It would be better to define the maximum number of required endpoints at compile time. The available FIFO space could then be split between fewer endpoints (e.g. CDC needs 2 IN endpoints, but currently the FIFO space is shared between 5 endpoints). I think this is how it is done for STM32 chips.
  4. dcd_esp32sx.c has #define EP_FIFO_SIZE 1024, but the technical manual specifies 4096. The comment says ( 1280 or 4096 bytes ), so which is it?

Points 1 and 3 are optimizations. Points 2 and 4 are bugs, in my opinion.

For quick testing, I increased EP_FIFO_SIZE to 4096 and assumed a FIFO size of 256 inside dcd_esp32sx::transmit_packet instead of xfer->max_size.
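
To illustrate the idea behind point 2, here is a minimal sketch of how a driver could decide how many bytes to push per FIFO fill. It is not the TinyUSB implementation: tx_chunk_size and its parameters are hypothetical, and it assumes the FIFO should only be loaded with whole max-size packets, except for the tail of a transfer.

```c
#include <stdint.h>

/* Hypothetical helper, not TinyUSB code: number of bytes to push into the
 * TX FIFO in one go, given the bytes left in the transfer, the FIFO space
 * configured for the endpoint, and the endpoint's max packet size. */
uint16_t tx_chunk_size(uint32_t remaining, uint16_t fifo_bytes, uint16_t max_packet)
{
    if (max_packet == 0 || fifo_bytes < max_packet) {
        return 0; /* treated as a misconfiguration: no full packet fits */
    }
    /* Largest whole number of max-size packets that fits in the FIFO. */
    uint16_t whole_packets = (uint16_t)((fifo_bytes / max_packet) * max_packet);
    if (remaining >= whole_packets) {
        return whole_packets;
    }
    /* Tail of the transfer: fits in the FIFO and may end in a short packet. */
    return (uint16_t)remaining;
}
```

With a 256-byte FIFO and 64-byte packets, tx_chunk_size(1000, 256, 64) returns 256, so four packets are queued per FIFO fill instead of one.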

image

tomaszduda23 commented 1 year ago

You found some pretty interesting things. These seem to have some similarities:
https://www.silabs.com/documents/public/reference-manuals/ezr32hg-rm.pdf (section 15.4.4.2.3.1, Packet Write in Slave Mode)
https://github.com/torvalds/linux/blob/6a8f57ae2eb07ab39a6f0ccad60c760743051026/drivers/usb/dwc2/gadget.c#L602

jrahlf commented 1 year ago

Apparently they use the USB IP block from Synopsys, hence TinyUSB supports it as a generic part: https://github.com/hathach/tinyusb/tree/master/src/portable/synopsys/dwc2

tomaszduda23 commented 1 year ago

#define EP_FIFO_SIZE 1024 seems to be correct. The Technical Reference Manual also says: "The portion of SPRAM that can be used for FIFO allocation has a depth of 256 and a width of 35 bits (32 data bits plus 3 control bits)." That is 256 words × 4 data bytes = 1024 bytes. If you set a bigger size, data seems to be overwritten, as can be seen when checking with:

    for(int i = 0; i < 256; ++i){
      esp_rom_printf("%08lx", USB0.dbg_fifo[i]);
    }
    esp_rom_printf("\n");

I guess each FIFO could still have a different size, to use the memory more efficiently.
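
As a rough sketch of that arithmetic: 256 FIFO words of 32 data bits give 1024 bytes, and how much each IN endpoint gets depends on how many endpoints share the space. The RX FIFO reservation used in the example below (64 words) is an arbitrary assumption for illustration, not a value taken from dcd_esp32sx.c, and the function names are hypothetical.

```c
#include <stdint.h>

enum {
    FIFO_DEPTH_WORDS = 256, /* depth quoted from the TRM above */
    BYTES_PER_WORD = 4      /* 32 data bits per FIFO entry */
};

/* Total data bytes available for FIFO allocation. */
uint32_t fifo_total_bytes(void)
{
    return FIFO_DEPTH_WORDS * BYTES_PER_WORD;
}

/* Hypothetical even split: words per IN endpoint FIFO after reserving
 * rx_words for the shared RX FIFO. Returns 0 for invalid inputs. */
uint32_t tx_fifo_words_per_ep(uint32_t rx_words, uint32_t num_in_eps)
{
    if (num_in_eps == 0 || rx_words >= FIFO_DEPTH_WORDS) {
        return 0;
    }
    return (FIFO_DEPTH_WORDS - rx_words) / num_in_eps;
}
```

With an assumed 64-word RX FIFO, five IN endpoints would get 38 words (152 bytes) each, while two IN endpoints would get 96 words (384 bytes) each.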

nvx commented 8 months ago

> 1. cdc_device.c::tud_cdc_n_write_flush always sends a 64-byte buffer to the lower layer. If you increase epin_buf from CFG_TUD_CDC_EP_BUFSIZE to 8*CFG_TUD_CDC_EP_BUFSIZE, you can achieve 750 KiB/s throughput. Increasing the buffer size further does not increase the transfer speed.

I tried this (and did the same to epout_buf) but didn't see any change from the ~51 KiB/s I was already getting without any tweaks.

Just checking: is https://github.com/espressif/tinyusb/blob/master/src/class/cdc/cdc_device.c#L70 the line to change, and should that alone be enough to see some improvement?

I must admit I found it rather surprising that an ESP32-S3 is so slow at CDC serial compared to an ancient AT91SAM7S chip, which sits at around 600 KiB/s at less than a quarter of the clock speed.

Superberti commented 8 months ago

Same observation here, using ESP-IDF 5.2 on an ESP32-S3 (240 MHz): I'm stuck at exactly 50 KiB/s of CDC throughput, regardless of what I do. The modification above hasn't changed anything. Changing the CDC FIFO sizes (up to 16384 bytes) doesn't help either.

jrahlf commented 8 months ago

If you are at only 50 KiB/s, then there is another major problem. My findings were made on ESP-IDF 5.1.1.

Superberti commented 8 months ago

To be more precise, I'm testing the USB-CDC read speed. The ESP32-S3 is connected to a Win10 computer running my C# test program. The C# program sends data as fast as it can, and the ESP32-S3 reads the data as fast as it can. Read/write block sizes had no influence. But what I've seen is that the tinyusb_cdcacm_read function never reads more than 64 bytes in one read, regardless of the provided buffer size or the write block size in the C# program. Maybe TinyUSB (on IDF 5.2) is limited to ONE read operation per ms? With a little bit of overhead, this would lead to ~50 KiB/s.
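
The back-of-the-envelope estimate can be written out as follows. This is only arithmetic under the hypothesis above (one 64-byte packet handled per fixed interval); it does not show what TinyUSB actually does, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Throughput in bytes/s if exactly one packet of packet_bytes is handled
 * per interval of interval_us microseconds. Returns 0 for a zero interval. */
uint32_t bytes_per_second(uint32_t packet_bytes, uint32_t interval_us)
{
    if (interval_us == 0) {
        return 0;
    }
    return (uint32_t)(((uint64_t)packet_bytes * 1000000u) / interval_us);
}
```

bytes_per_second(64, 1000) gives 64,000 bytes/s (62.5 KiB/s), and an effective interval of 1.25 ms per packet, bytes_per_second(64, 1250), gives 51,200 bytes/s, which is exactly 50 KiB/s.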

nvx commented 7 months ago

> To be more precise, I'm testing the USB-CDC read speed. The ESP32-S3 is connected to a Win10 computer running my C# test program. The C# program sends data as fast as it can, and the ESP32-S3 reads the data as fast as it can. Read/write block sizes had no influence. But what I've seen is that the tinyusb_cdcacm_read function never reads more than 64 bytes in one read, regardless of the provided buffer size or the write block size in the C# program. Maybe TinyUSB (on IDF 5.2) is limited to ONE read operation per ms? With a little bit of overhead, this would lead to ~50 KiB/s.

For reference I was testing the other way with the ESP32-S3 sending data to the host.

I'm using PlatformIO with platform-espressif32 version 6.5.0, which uses ESP-IDF 5.1.2. It seems odd to have such a major change in behaviour between 5.1.1 and 5.1.2.