dzubybb opened this issue 1 year ago
IIRC @MabezDev did some tests and the results were much, much better than yours. TBH I haven't benchmarked it myself.
It may be worth trying this with a different client (e.g. another PC) to see what the Wi-Fi and the server can do, just to rule that out.
I ruled out a weak Wi-Fi connection by moving the ESP close to the router. I'm using https://github.com/dheijl/swyh-rs as the audio source, so I think that side should be fine. I'll try to write a simple Rust client to test the speed on my laptop, but maybe somebody will spot a configuration or other error in my code.
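For the laptop-side check, a minimal sketch using only the Rust standard library might look like this. It sends a plain HTTP/1.0 GET (so the server closes the connection and EOF marks the end of the response), times reading until EOF and prints the same `Bytes / millis / B/s` figures as the logs further down. The `host:port` and path arguments and the 4 KiB read buffer are illustrative assumptions, and the byte count includes the HTTP response headers.

```rust
use std::env;
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::{Duration, Instant};

/// Reads `reader` until EOF through `buf`, returning (bytes read, elapsed time).
fn measure_throughput<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let mut total: u64 = 0;
    loop {
        match reader.read(buf) {
            Ok(0) => break, // EOF: the server closed the connection
            Ok(n) => total += n as u64,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok((total, start.elapsed()))
}

fn main() -> io::Result<()> {
    // Hypothetical CLI: speedtest <host:port> <path>
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <host:port> <path>", args[0]);
        return Ok(());
    }
    let (addr, path) = (&args[1], &args[2]);
    let mut stream = TcpStream::connect(addr)?;
    // HTTP/1.0: the server closes the connection after the body, which gives us EOF.
    write!(stream, "GET {path} HTTP/1.0\r\nHost: {addr}\r\n\r\n")?;
    let mut buf = vec![0u8; 4096];
    let (bytes, elapsed) = measure_throughput(&mut stream, &mut buf)?;
    let millis = elapsed.as_millis().max(1); // avoid dividing by zero on tiny transfers
    println!("Bytes {bytes}, millis {millis}, B/s {}", bytes as u128 * 1000 / millis);
    Ok(())
}
```

Running it against a server on the same LAN (e.g. `cargo run -- 192.168.1.10:8000 /test.bin`, hypothetical address and path) gives a baseline that separates Wi-Fi/server limits from firmware limits.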
I have also had the same problem over the last two days. I wrote multiple benchmarks for download speeds; the code used is available here. Here is the result I got (on my PC I get speeds of up to 25 MB/s over Wi-Fi, so the Wi-Fi is plenty fast):
It seems that there is a bottleneck somewhere, as the download speed is very similar across download sizes. The esp32-s3, which I tested on, should be capable of 2.5 MB/s (20 Mbit/s).
I tried downloading a file from http://speedtest.ftp.otenet.gr/files/test10Mb.db (which clocks in at > 1 MiB/s in Firefox on my PC) and I get only about 10 KiB/s on the esp32c3.
Is it possible to get the code for the benchmarks that @MabezDev did? Maybe that will point me to some errors in my code.
I'm on vacation right now but the weather is even worse than it was the other days - so I took my gaming laptop and an ESP32-C3 and had a look.
I took the DHCP example, just changed the fetched URL and added printing of the duration and the bytes downloaded: I got around 14000 bytes/second. Pretty bad.
Changing the socket buffer and similar simple things got me to around 37000 bytes/second. Still not good.
I tried a few things in esp-wifi itself and switched to my laptop as the HTTP server.
Bytes 1863616, millis 3012, B/s 618730
Making HTTP request
Bytes 1863616, millis 2402, B/s 775860
Making HTTP request
Bytes 1863616, millis 4666, B/s 399403
Making HTTP request
Bytes 1863616, millis 3909, B/s 476750
Making HTTP request
Bytes 1863616, millis 2976, B/s 626215
Making HTTP request
Bytes 1863616, millis 3072, B/s 606645
Making HTTP request
Bytes 1863616, millis 4665, B/s 399488
Making HTTP request
Bytes 1863616, millis 3537, B/s 526891
Making HTTP request
Bytes 1863616, millis 4241, B/s 439428
Making HTTP request
Bytes 1863616, millis 3956, B/s 471085
Making HTTP request
Bytes 1863616, millis 3149, B/s 591812
Making HTTP request
Bytes 1863616, millis 4226, B/s 440988
Making HTTP request
Bytes 1863616, millis 3338, B/s 558303
Making HTTP request
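A quick way to summarise the runs above: the sketch below recomputes each line's B/s as `bytes * 1000 / millis` with integer division (which reproduces the logged values) and reports the min, mean and max. The millisecond values are copied from the log; the mean is a plain average of the per-run rates, not total bytes over total time.

```rust
/// Bytes transferred per run, as printed in the log above.
const BYTES: u64 = 1_863_616;

/// Duration of each run in milliseconds, copied from the log above.
const MILLIS: [u64; 13] = [
    3012, 2402, 4666, 3909, 2976, 3072, 4665, 3537, 4241, 3956, 3149, 4226, 3338,
];

/// Throughput in bytes per second, truncated like the log output.
fn bytes_per_sec(bytes: u64, millis: u64) -> u64 {
    bytes * 1000 / millis
}

/// Returns (min, mean, max) of the per-run throughput.
fn summarize(millis: &[u64]) -> (u64, u64, u64) {
    let rates: Vec<u64> = millis.iter().map(|&ms| bytes_per_sec(BYTES, ms)).collect();
    let min = *rates.iter().min().unwrap();
    let max = *rates.iter().max().unwrap();
    let mean = rates.iter().sum::<u64>() / rates.len() as u64;
    (min, mean, max)
}

fn main() {
    let (min, mean, max) = summarize(&MILLIS);
    // prints: min 399403 B/s, mean 533199 B/s, max 775860 B/s
    println!("min {min} B/s, mean {mean} B/s, max {max} B/s");
}
```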
While the speed varies a lot (probably due to my crappy access point and the device not sitting right next to it), this is much better.
Unfortunately, the changes are not yet in a shape where I can just open a PR; I will do that after my vacation, since I also need to check a few things (e.g. increased memory usage and how it affects other chips).
However, I think these numbers are quite promising.
With esp-rs/esp-wifi-sys#233 it will be possible to configure various internals, which got me much better performance. Finding the best values might be a matter of trial and error, however.
@bjoernQ do you happen to remember what settings were effective during your july trial?
I think it was like this:
rx_queue_size = 20
tx_queue_size = 5
static_rx_buf_num = 32
dynamic_rx_buf_num = 16
ampdu_rx_enable = 1
ampdu_tx_enable = 1
rx_ba_win = 32
max_burst_size = 8
FWIW, the best I could achieve is still an order of magnitude slower than yours.
Average speed: 34.3 KiB/s
I'm not blocked on my display and I'm not bottlenecked by TLS (yet); the speed is the same with and without HTTPS. Did you happen to have some other private modifications that didn't make it into esp-wifi? :)
IIRC it was really just what is also in the tuning document. But I also had to use large receive and socket buffers to get there; I can try to reproduce it when I'm back home (next week).
Changing the buffer size may be a good idea. There are some details in my firmware that make it difficult to test right now, but I'll try to play with it a bit.
Update: increasing my socket buffer from 4k to 32k didn't achieve much, only a ~10-15% improvement on average.
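One way to reason about why a bigger socket buffer helps only a little: a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window / RTT. The sketch below evaluates that cap; the 10 ms RTT is an assumed LAN figure, not something measured in this thread.

```rust
/// Upper bound on single-connection TCP throughput (bytes/s) when the
/// receive window, rather than the link, is the limit: window / RTT.
fn window_limited_bps(window_bytes: u64, rtt_ms: u64) -> u64 {
    window_bytes * 1000 / rtt_ms
}

fn main() {
    // ASSUMPTION: round-trip time on a local Wi-Fi network; measure your own with ping.
    let rtt_ms = 10;
    for window in [4 * 1024, 32 * 1024] {
        println!(
            "window {window:>6} B -> at most {} B/s",
            window_limited_bps(window, rtt_ms)
        );
    }
}
```

Even a 4 KiB window at that RTT allows ~400 kB/s, an order of magnitude above the ~34 kB/s reported above, so the small gain from 4k to 32k is consistent with the bottleneck being somewhere other than the socket buffer.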
Using this code with esp-rs/esp-wifi-sys#233 merged:
I was able to tune my esp32c3 to get around 400 KB/s. I used a multivariate experiment (https://www.youtube.com/watch?v=5oULEuOoRd0) to find out which settings have the most influence on download speed.
I used these values (two levels per setting) for the experiments:
level | rx_queue_size | tx_queue_size | static_rx_buf_num | dynamic_rx_buf_num | static_tx_buf_num | dynamic_tx_buf_num | ampdu_rx_enable | ampdu_tx_enable | rx_ba_win | max_burst_size | country_code |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 5 | 3 | 10 | 32 | 0 | 32 | 0 | 0 | 6 | 1 | CN |
2 | 20 | 5 | 32 | 16 | 16 | 16 | 1 | 1 | 32 | 8 | PL |
And those were the results:
test | speed B/s (avg of 3) | rx_queue_size | tx_queue_size | static_rx_buf_num | dynamic_rx_buf_num | static_tx_buf_num | dynamic_tx_buf_num | ampdu_rx_enable | ampdu_tx_enable | rx_ba_win | max_burst_size | country_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 124764 | 5 | 3 | 10 | 32 | 0 | 32 | 0 | 0 | 6 | 1 | CN |
2 | 60115 | 5 | 3 | 10 | 32 | 0 | 16 | 1 | 1 | 32 | 8 | PL |
3 | 63889 | 5 | 3 | 32 | 16 | 16 | 32 | 0 | 0 | 32 | 8 | PL |
4 | 114179 | 5 | 5 | 10 | 16 | 16 | 32 | 1 | 1 | 6 | 1 | PL |
5 | 65398 | 5 | 5 | 32 | 32 | 16 | 16 | 0 | 1 | 6 | 8 | CN |
6 | 133764 | 5 | 5 | 32 | 16 | 0 | 16 | 1 | 0 | 32 | 1 | CN |
7 | 407828 | 20 | 3 | 32 | 16 | 0 | 32 | 1 | 1 | 6 | 8 | CN |
8 | 113233 | 20 | 3 | 32 | 32 | 16 | 16 | 1 | 0 | 6 | 1 | PL |
9 | 114566 | 20 | 3 | 10 | 16 | 16 | 16 | 0 | 1 | 32 | 1 | CN |
10 | 116413 | 20 | 5 | 32 | 32 | 0 | 32 | 0 | 1 | 32 | 1 | PL |
11 | 435955 | 20 | 5 | 10 | 16 | 0 | 16 | 0 | 0 | 6 | 8 | PL |
12 | 328365 | 20 | 5 | 10 | 32 | 16 | 32 | 1 | 0 | 32 | 8 | CN |
SUM 1 | | 562109 | 884395 | 1177944 | 808288 | 1278839 | 1155438 | 920985 | 1199970 | 1261357 | 716919 | 1174685 |
SUM 2 | | 1516360 | 1194074 | 900525 | 1270181 | 799630 | 923031 | 1157484 | 878499 | 817112 | 1361550 | 903784 |
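The SUM rows of the results table can be reproduced mechanically: for each setting, SUM 1 adds up the speeds of the six tests where that setting was at level 1 and SUM 2 the six where it was at level 2. The layout appears to be a 12-run, two-level orthogonal array (it matches the Taguchi L12), so each level of every setting occurs in exactly six tests and (SUM 2 - SUM 1) / 6 is the average effect of switching that setting. The sketch below encodes the table as 0/1 levels (0 = the level-1 value, 1 = the level-2 value) and recomputes both rows.

```rust
/// Setting names, in the column order of the results table.
const FACTORS: [&str; 11] = [
    "rx_queue_size", "tx_queue_size", "static_rx_buf_num", "dynamic_rx_buf_num",
    "static_tx_buf_num", "dynamic_tx_buf_num", "ampdu_rx_enable", "ampdu_tx_enable",
    "rx_ba_win", "max_burst_size", "country_code",
];

/// Average speed (B/s) of tests 1..=12, copied from the table.
const SPEEDS: [u64; 12] = [
    124764, 60115, 63889, 114179, 65398, 133764,
    407828, 113233, 114566, 116413, 435955, 328365,
];

/// Level of each setting per test: 0 = level 1, 1 = level 2.
const LEVELS: [[u8; 11]; 12] = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0],
];

/// For each setting, sum the speeds of the tests run at level 1 and at level 2.
fn level_sums(speeds: &[u64; 12], levels: &[[u8; 11]; 12]) -> [[u64; 2]; 11] {
    let mut sums = [[0u64; 2]; 11];
    for (test, row) in levels.iter().enumerate() {
        for (factor, &level) in row.iter().enumerate() {
            sums[factor][level as usize] += speeds[test];
        }
    }
    sums
}

fn main() {
    let sums = level_sums(&SPEEDS, &LEVELS);
    for (name, [sum1, sum2]) in FACTORS.iter().zip(sums.iter()) {
        // Each level occurs in 6 of the 12 tests, so (SUM 2 - SUM 1) / 6 is the
        // mean change in speed when switching the setting from level 1 to level 2.
        let effect = (*sum2 as f64 - *sum1 as f64) / 6.0;
        println!("{name:<20} SUM 1 {sum1:>8}  SUM 2 {sum2:>8}  effect {effect:+9.0} B/s");
    }
}
```

By this measure rx_queue_size (about +159,000 B/s) and max_burst_size (about +107,000 B/s) stand out; given the noisy RF environment, treat the smaller effects as rough.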
Unfortunately I live in a very RF-noisy place, so I couldn't rule out external variables, but I think this method can lead to even better results.
I have been testing the esp32s3 with the ambition of pushing ~2.5 MB/s over TCP from the board; the most I was able to get from esp-wifi was 300 kB/s, achieved mostly thanks to a larger smoltcp buffer. To put it in perspective, though, the C iperf example from esp-idf achieves 5-5.5 MB/s.
The board config reads:
#
# ESP32S3-specific
#
CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM=16
CONFIG_ESP_WIFI_DYNAMIC_RX_BUFFER_NUM=64
CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64
CONFIG_ESP_WIFI_AMPDU_TX_ENABLED=y
CONFIG_ESP_WIFI_TX_BA_WIN=32
CONFIG_ESP_WIFI_AMPDU_RX_ENABLED=y
CONFIG_ESP_WIFI_RX_BA_WIN=32
CONFIG_LWIP_TCP_SND_BUF_DEFAULT=65535
CONFIG_LWIP_TCP_WND_DEFAULT=65535
CONFIG_LWIP_TCP_RECVMBOX_SIZE=64
CONFIG_LWIP_UDP_RECVMBOX_SIZE=64
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ=240
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_WRAP=y
So it seems that the secret ingredients are increasing the flash read speed and bumping up the ICACHE... Is there a way to change them in a no-std Rust stack?
Those things are currently not configurable in esp-hal. While they will make a difference, I think more is needed, since 300k vs 5M is a huge gap. Can you post your cfg.toml?
Sure:
[esp-wifi]
rx_queue_size = 3
tx_queue_size = 3
static_rx_buf_num = 16
dynamic_rx_buf_num = 16
static_tx_buf_num = 16
dynamic_tx_buf_num = 16
ampdu_rx_enable = 1
ampdu_tx_enable = 1
country_code = "PL"
rx_ba_win = 32
tx_ba_win = 32
max_burst_size = 8
The EspHeap is 100 KiB, the RX buffer is 1 KiB (since it's only for ACKs) and the TX buffer is 64 KiB, to mirror the ESP-IDF config. It would benefit from larger queues, since it emits a lot of "no TX token" warnings, but it seems to run out of memory (OOM) when they are increased (I'm very new to the ESP32 and its memory layout, so I'm likely doing something stupid, though).
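A rough worst-case estimate may help explain the OOM. The sketch assumes each Wi-Fi driver buffer costs about 1.6 KB (the figure ESP-IDF's menuconfig help gives for its static RX/TX buffers) and that the driver buffers, plus the 1 KiB RX / 64 KiB TX socket buffers, all come out of the same heap; both are assumptions, and `worst_case_bytes` is a hypothetical helper.

```rust
/// ASSUMPTION: approximate size of one Wi-Fi driver buffer in bytes
/// (ESP-IDF's menuconfig help quotes ~1.6 KB per static RX/TX buffer).
const ASSUMED_WIFI_BUF_BYTES: usize = 1600;

/// Hypothetical helper: heap needed if every configured driver buffer is
/// allocated at the same time, plus socket buffers living on the same heap.
fn worst_case_bytes(
    static_rx: usize,
    dynamic_rx: usize,
    static_tx: usize,
    dynamic_tx: usize,
    socket_rx: usize,
    socket_tx: usize,
) -> usize {
    (static_rx + dynamic_rx + static_tx + dynamic_tx) * ASSUMED_WIFI_BUF_BYTES
        + socket_rx
        + socket_tx
}

fn main() {
    let heap = 100 * 1024; // EspHeap size mentioned above
    // Buffer counts from the cfg.toml above; 1 KiB RX and 64 KiB TX socket buffers.
    let need = worst_case_bytes(16, 16, 16, 16, 1024, 64 * 1024);
    println!("worst case ~{need} B vs heap {heap} B");
}
```

Under these assumptions the 64 driver buffers alone could reach 102,400 B, i.e. the entire 100 KiB heap, which may be why raising the queue/buffer counts ends in OOM. Dynamic buffers are allocated on demand, so real usage is normally lower.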
I'm going to test how disabling the flash/cache settings in sdkconfig hurts the IDF demo; this should give us a hint of how useful they are.
I was not able to run the iperf example with modified parameters (its fancy pseudo-shell broke more easily than the Wi-Fi did), so instead I made a simple project on the esp-idf stack that pushes 1 KiB of static data in a blocking way. I used the following sdkconfig.defaults file:
##--- This part is from Rust template ---
# Rust often needs a bit of an extra main task stack size compared to C (the default is 3K)
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8000
# Use this to set FreeRTOS kernel tick frequency to 1000 Hz (100 Hz by default).
# This allows 1 ms granularity for thread sleeps (10 ms by default).
#CONFIG_FREERTOS_HZ=1000
# Workaround for https://github.com/espressif/esp-idf/issues/7631
#CONFIG_MBEDTLS_CERTIFICATE_BUNDLE=n
#CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_DEFAULT_FULL=n
##--- This is the IPERF demo config ---
## -- The Modem/LWIP part --
CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM=16
CONFIG_ESP_WIFI_DYNAMIC_RX_BUFFER_NUM=64
CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64
CONFIG_ESP_WIFI_AMPDU_TX_ENABLED=y
CONFIG_ESP_WIFI_TX_BA_WIN=32
CONFIG_ESP_WIFI_AMPDU_RX_ENABLED=y
CONFIG_ESP_WIFI_RX_BA_WIN=32
CONFIG_LWIP_TCP_SND_BUF_DEFAULT=65535
CONFIG_LWIP_TCP_WND_DEFAULT=65535
CONFIG_LWIP_TCP_RECVMBOX_SIZE=64
CONFIG_LWIP_UDP_RECVMBOX_SIZE=64
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64
## -- The Cpufreq part --
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ=240
## -- The QIO (flash speed) part --
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
## -- The ICACHE part --
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_WRAP=y
It has three blocks that I then turned on/off; here are my results (def means defaults/block off, iperf means settings from the demo):
Modem/LWIP | CPU Freq | QIO+Flash | ICACHE | Speed |
---|---|---|---|---|
def | 160MHz | def | def | 643KiB/s |
iperf | 240MHz | iperf | iperf | 3.62MiB/s |
iperf | 240MHz | def | iperf | 3.62MiB/s |
iperf | 240MHz | def | def | 879KiB/s |
def | 160MHz | def | iperf | 1.69MiB/s |
def | 240MHz | iperf | iperf | 2.06MiB/s |
It is pretty evident that the instruction cache is the main source of the speed-up; playing with buffer and queue sizes helps, but I doubt it can get one past the 1 MiB/s barrier.
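To make the mixed KiB/s and MiB/s figures easier to compare, the sketch below converts them to KiB/s (assuming 1 MiB = 1024 KiB) and computes the ratio for pairs of rows in the table above that differ in exactly one block. Switching only the ICACHE block gives roughly 4.2x (iperf modem/LWIP, 240 MHz) and roughly 2.7x (everything else default), while switching only QIO+flash changes nothing.

```rust
/// Converts MiB/s to KiB/s (1 MiB = 1024 KiB).
fn mib_to_kib(mib: f64) -> f64 {
    mib * 1024.0
}

/// Ratio of two speeds given in the same unit.
fn speedup(on: f64, off: f64) -> f64 {
    on / off
}

fn main() {
    // (what changed, speed with the block on, speed with it off), in KiB/s.
    let pairs = [
        ("ICACHE (iperf modem/LWIP, 240 MHz, def flash)", mib_to_kib(3.62), 879.0),
        ("ICACHE (def modem/LWIP, 160 MHz, def flash)", mib_to_kib(1.69), 643.0),
        ("QIO+flash (iperf modem/LWIP, 240 MHz, iperf ICACHE)", mib_to_kib(3.62), mib_to_kib(3.62)),
    ];
    for (what, on, off) in pairs {
        println!("{what}: x{:.2}", speedup(on, off));
    }
}
```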
You could try patching various values here: https://github.com/esp-rs/esp-hal/blob/4d87e75d71b55d546b37aff6a4494cefb743c4a9/esp32s3-hal/src/lib.rs#L76-L100 in esp-hal. We have a longer-standing issue about cache configuration here: https://github.com/esp-rs/esp-hal/issues/955.
Edit: for the other chips, I believe the cache settings can be configured in the bootloader and they will persist to the main app, so we may be able to get some speed up on other chips too.
Another factor is that xtensa-lx-rt currently can't do lazy loading of float registers (maybe we should just put it behind a feature?), which greatly affects context-switch performance.
@mbq you may wish to test with esp-rs/esp-wifi-sys#430; on RISC-V at least, I was seeing 2 MB/s upload speeds. It seems there are still some bottlenecks on Xtensa, though.
@MabezDev For a quick test I tried bumping esp-wifi to main, and throughput more than halved, from 313 KiB/s to 139 KiB/s (this is on an S3 board with sync code that just tries to push as many bytes as possible); I haven't tested the cache patches yet.
Anyhow, I decided to use IDF for the project I'm doing now, but I'm still keeping my fingers crossed for a solution here.
I have a problem with slow download speed on the esp32c3. I'm trying to build a simple audio sink, which connects to an audio source on my local network and plays audio through an I2S DAC. For now I'm testing download speed, because I want to stream uncompressed CD-quality audio, which requires about 172 KiB/s of data throughput. The ESP datasheet lists 20 Mbit/s download speed, so it should be possible.
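For reference, the 172 KiB/s figure follows from CD-quality PCM parameters (44.1 kHz, 2 channels, 16 bits per sample). The sketch below does the arithmetic and compares it with the 20 Mbit/s datasheet number; it ignores HTTP/TCP/Wi-Fi protocol overhead, so the real headroom is smaller.

```rust
/// Bytes per second of uncompressed PCM audio.
fn pcm_bytes_per_sec(sample_rate_hz: u32, channels: u32, bits_per_sample: u32) -> u32 {
    sample_rate_hz * channels * bits_per_sample / 8
}

fn main() {
    let cd = pcm_bytes_per_sec(44_100, 2, 16); // CD quality: 44.1 kHz, stereo, 16-bit
    let link = 20_000_000 / 8; // 20 Mbit/s datasheet figure, in bytes/s
    println!(
        "CD audio: {cd} B/s ({:.1} KiB/s), headroom vs 20 Mbit/s: {:.1}x",
        cd as f64 / 1024.0,
        link as f64 / cd as f64
    );
}
```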
This is my test program, based on the examples:
And the speeds I get:
So I'm more than 10 times short :( Is there anything I can do to speed this up? I have tried increasing the buffer sizes, but it doesn't change anything. Ultimately I would like to use DMA to push incoming audio data to I2S; is this possible on the ESP with this library?