tomaszduda23 opened this issue 1 year ago
Because of protocol overhead, it is not possible to reach the full 12 Mbps.
Quoting from "USB Complete", 4th ed., p. 24:
Of USB’s four transfer types, the fastest on an otherwise idle bus are bulk transfers, with theoretical maximums of around 1.2 MB/s at full speed, 53 MB/s at high speed, and 400 MB/s at SuperSpeed. Isochronous transfers can request the most bandwidth (1.023 MB/s at full speed, 24.576 MB/s at high speed, and 393 MB/s at SuperSpeed).
Moreover, when there are several devices attached to a hub, they share the bandwidth.
Using DMA could certainly help get the most out of the available throughput. What speeds do you get?
From the USB device to the PC in bulk mode (64-byte packets), the maximum is around 0.55 MB/s. According to this post, the theoretical maximum would be twice that: https://esp32.com/viewtopic.php?t=28840
I measured around 560 KiB/s when sending data at full throttle as a CDC device. I wanted to look into the DMA option but discovered that the register documentation is missing from the ESP32-S3 Technical Reference Manual (v1.2). The same seems to be true for the ESP32-S2.
https://github.com/espressif/esp-idf/blob/master/components/soc/esp32s3/include/soc/usb_reg.h seems to have some register documentation.
Good news: I managed to send almost 1 MB/s as a CDC device (see picture below). You don't even need DMA; the bandwidth is underutilized because of latency and inefficient use of the hardware USB FIFO.
I see some issues with the tinyusb stack (some specific to the CDC device class, but not all):
1. cdc_device.c::tud_cdc_n_write_flush always sends a 64-byte buffer to the lower layer. If you increase epin_buf from CFG_TUD_CDC_EP_BUFSIZE to 8*CFG_TUD_CDC_EP_BUFSIZE, you can achieve 750 KiB/s throughput. Increasing the buffer size further does not increase the transfer speed.
2. dcd_esp32sx.c configures the hardware FIFO size to a certain value in dcd_edpt_open, but transmit_packet always pushes 64 bytes (xfer->max_size) into the FIFO instead of the previously configured hardware FIFO size.
3. dcd_edpt_open assumes that all hardware endpoints are used. One should rather define the maximum number of required endpoints at compile time. Then the available FIFO space can be split among fewer endpoints (e.g. CDC needs 2 IN endpoints, but currently the FIFO is shared among 5 endpoints). I think this is how it is done for STM32 chips.
4. dcd_esp32sx.c defines #define EP_FIFO_SIZE 1024, but the technical manual specifies 4096! The comment says (1280 or 4096 bytes), so which is it?

Points 1 and 3 are optimizations. Points 2 and 4 are bugs, in my opinion.
For quick testing, I increased EP_FIFO_SIZE to 4096 and assumed a FIFO size of 256 inside dcd_esp32sx::transmit_packet instead of xfer->max_size.
You found some pretty interesting things. These seem to be similar:
https://www.silabs.com/documents/public/reference-manuals/ezr32hg-rm.pdf (section 15.4.4.2.3.1, Packet Write in Slave Mode)
https://github.com/torvalds/linux/blob/6a8f57ae2eb07ab39a6f0ccad60c760743051026/drivers/usb/dwc2/gadget.c#L602
Apparently they use the USB IP block from Synopsys, which is why tinyusb supports it as a generic port: https://github.com/hathach/tinyusb/tree/master/src/portable/synopsys/dwc2
Regarding #define EP_FIFO_SIZE 1024: this seems to be correct. The Technical Reference Manual also says "The portion of SPRAM that can be used for FIFO allocation has a depth of 256 and a width of 35 bits (32 data bits plus 3 control bits)". If you set the size any larger, data seems to get overwritten, which you can check with:
```c
// Dump all 256 words of the FIFO SPRAM as hex (USB0 is the SoC's USB register struct)
for (int i = 0; i < 256; ++i) {
    esp_rom_printf("%08lx", USB0.dbg_fifo[i]);
}
esp_rom_printf("\n");
```
I guess each FIFO could still have a different size, to use the memory more efficiently.
Quoting the earlier comment: "cdc_device.c::tud_cdc_n_write_flush always sends a buffer of 64 bytes to the lower layer. If you increase epin_buf from CFG_TUD_CDC_EP_BUFSIZE to 8*CFG_TUD_CDC_EP_BUFSIZE, you can achieve 750 KiB/s throughput. Further increase in buffer size does not increase transfer speed."
I tried this (and did the same to epout_buf) but saw no change from the ~51 KiB/s I was already getting without any tweaks.
Just to check: is https://github.com/espressif/tinyusb/blob/master/src/class/cdc/cdc_device.c#L70 the line to change, and should that alone be enough to see some improvement?
I must admit I find it rather surprising that an ESP32-S3 is so slow at CDC serial compared to an ancient AT91SAM7S chip, which sits at around 600 KiB/s at less than a quarter of the clock speed.
Same observation here, using ESP-IDF 5.2 on an ESP32-S3 (240 MHz): I'm stuck at exactly 50 KiB/s of CDC throughput, regardless of what I do. The modification above hasn't changed anything. Changing the CDC FIFO sizes (up to 16384 bytes) doesn't help either.
If you are getting only 50 KiB/s, then there is another major problem. My findings were made on ESP-IDF 5.1.1.
To be more precise, I'm testing the USB-CDC read speed. The board is connected to a Win10 computer running my C# test program. The C# program sends data as fast as it can, and the ESP32-S3 reads it as fast as it can. Read/write block sizes had no influence. But what I've seen is that tinyusb_cdcacm_read never reads more than 64 bytes per call, regardless of the buffer size provided or the write block size in the C# program. Maybe tinyusb (on IDF 5.2) is limited to ONE read operation per ms? With a little overhead, that would lead to ~50 KiB/s.
For reference, I was testing the other direction, with the ESP32-S3 sending data to the host.
I'm using PlatformIO with platform-espressif32 6.5.0, which uses ESP-IDF 5.1.2. It seems odd to have such a major change in behaviour between 5.1.1 and 5.1.2.
Is your feature request related to a problem?
I'm trying to transfer a lot of data from the ESP to a PC over USB, but I'm far from 12 Mbit/s. I wonder whether the transfer speed would improve if DMA mode were used instead of Slave mode. Are there any plans to implement it?