esp-rs / esp-hal

no_std Hardware Abstraction Layers for ESP32 microcontrollers
https://docs.esp-rs.org/esp-hal/
Apache License 2.0

Download speed #1605

Open dzubybb opened 1 year ago

dzubybb commented 1 year ago

I have a problem with slow download speed on the esp32c3. I'm trying to build a simple audio sink that connects to an audio source on my local network and plays audio through an I2S DAC. For now I'm testing download speed, because I want to stream uncompressed audio at CD quality, which requires about 172 KiB/s of throughput. The ESP datasheet lists a 20 Mbit/s download speed, so it should be possible.

This is my test program, based on the examples:

#![no_std]
#![no_main]
#![feature(type_alias_impl_trait)]

//extern crate alloc;

use esp_backtrace as _;
use embassy_executor::_export::StaticCell;
use embassy_net::tcp::TcpSocket;
use embassy_net::{Config, Ipv4Address, Stack, StackResources};

use embassy_executor::Executor;
use embassy_time::{Duration, Timer, Instant};
use embedded_svc::wifi::{ClientConfiguration, Configuration, Wifi};
use esp_println::logger::init_logger;
use esp_println::println;
use esp_wifi::wifi::{WifiController, WifiDevice, WifiEvent, WifiMode, WifiState};
use esp_wifi::{initialize, EspWifiInitFor};
use hal::{
    clock::{ClockControl, CpuClock},
    Rng, embassy,
    peripherals::Peripherals,
    prelude::*,
    timer::TimerGroup, Rtc,
    systimer::SystemTimer,
    i2s::{I2s0New},
};

const SSID: &str = env!("SSID");
const PASSWORD: &str = env!("PASSWORD");

macro_rules! singleton {
    ($val:expr) => {{
        type T = impl Sized;
        static STATIC_CELL: StaticCell<T> = StaticCell::new();
        let (x,) = STATIC_CELL.init(($val,));
        x
    }};
}

static EXECUTOR: StaticCell<Executor> = StaticCell::new();

#[entry]
fn main() -> ! {
    init_logger(log::LevelFilter::Info);

    let peripherals = Peripherals::take();

    let system = peripherals.SYSTEM.split();
    let mut peripheral_clock_control = system.peripheral_clock_control;
    let clocks = ClockControl::configure(system.clock_control, CpuClock::Clock160MHz).freeze();

    let mut rtc = Rtc::new(peripherals.RTC_CNTL);
    rtc.swd.disable();
    rtc.rwdt.disable();

    let timer = SystemTimer::new(peripherals.SYSTIMER).alarm0;

    let init = initialize(
        EspWifiInitFor::Wifi,
        timer,
        Rng::new(peripherals.RNG),
        system.radio_clock_control,
        &clocks,
    ).unwrap();

    let (wifi, _) = peripherals.RADIO.split();

    let (wifi_interface, controller) = esp_wifi::wifi::new_with_mode(&init, wifi, WifiMode::Sta);

    let timer_group0 = TimerGroup::new(peripherals.TIMG0, &clocks, &mut peripheral_clock_control);
    embassy::init(&clocks, timer_group0.timer0);

    let config = Config::dhcpv4(Default::default());

    let seed = 1234; // very random, very secure seed

    // Init network stack
    let stack = &*singleton!(Stack::new(
        wifi_interface,
        config,
        singleton!(StackResources::<3>::new()),
        seed
    ));

    let executor = EXECUTOR.init(Executor::new());
    executor.run(|spawner| {
        spawner.spawn(connection(controller)).ok();
        spawner.spawn(net_task(&stack)).ok();
        spawner.spawn(get_page(&stack)).ok();
    })
}

#[embassy_executor::task]
async fn connection(mut controller: WifiController<'static>) {
    println!("start connection task");
    println!("Device capabilities: {:?}", controller.get_capabilities());
    loop {
        match esp_wifi::wifi::get_wifi_state() {
            WifiState::StaConnected => {
                // wait until we're no longer connected
                controller.wait_for_event(WifiEvent::StaDisconnected).await;
                Timer::after(Duration::from_millis(5000)).await
            }
            _ => {}
        }
        if !matches!(controller.is_started(), Ok(true)) {
            let client_config = Configuration::Client(ClientConfiguration {
                ssid: SSID.into(),
                password: PASSWORD.into(),
                ..Default::default()
            });
            controller.set_configuration(&client_config).unwrap();
            println!("Starting wifi");
            controller.start().await.unwrap();
            println!("Wifi started!");
        }
        println!("About to connect...");

        match controller.connect().await {
            Ok(_) => println!("Wifi connected!"),
            Err(e) => {
                println!("Failed to connect to wifi: {e:?}");
                Timer::after(Duration::from_millis(5000)).await
            }
        }
    }
}

#[embassy_executor::task]
async fn net_task(stack: &'static Stack<WifiDevice<'static>>) {
    println!("start net_task task");
    stack.run().await
}

#[embassy_executor::task]
async fn get_page(stack: &'static Stack<WifiDevice<'static>>) {
    let mut rx_buffer = [0; 40960];
    let mut tx_buffer = [0; 40960];

    println!("start get_page task");

    loop {
        if stack.is_link_up() {
            break;
        }
        Timer::after(Duration::from_millis(500)).await;
    }

    println!("Waiting to get IP address...");
    loop {
        if let Some(config) = stack.config_v4() {
            println!("Got IP: {}", config.address);
            break;
        }
        Timer::after(Duration::from_millis(500)).await;
    }

    loop {
        Timer::after(Duration::from_millis(1_000)).await;

        let mut socket = TcpSocket::new(&stack, &mut rx_buffer, &mut tx_buffer);

        socket.set_timeout(Some(embassy_time::Duration::from_secs(10)));

        let remote_endpoint = (Ipv4Address::new(10, 1, 1, 85), 5901);
        println!("connecting...");
        let r = socket.connect(remote_endpoint).await;
        if let Err(e) = r {
            println!("connect error: {:?}", e);
            continue;
        }
        println!("connected!");
        let mut buf = [0; 20480];
        loop {
            use embedded_io::asynch::Write;
            let r = socket
                .write_all(b"GET /stream/swyh.wav HTTP/1.1\r\nHost: 10.1.1.85:5901\r\n\r\n")
                .await;
            if let Err(e) = r {
                println!("write error: {:?}", e);
                break;
            }

            let mut measure_start = Instant::now();
            let mut bytes_downloaded_total = 0;
            loop {
                let bytes_downloaded = match socket.read(&mut buf).await {
                    Ok(0) => {
                        println!("read EOF");
                        break;
                    }
                    Ok(n) => n,
                    Err(e) => {
                        println!("read error: {:?}", e);
                        break;
                    }
                };
                bytes_downloaded_total += bytes_downloaded;

                let now = Instant::now();
                let elapsed_ms = now.duration_since(measure_start).as_millis();
                if elapsed_ms >= 1000 {
                    // bytes per millisecond equals kilobytes (1000 B) per second
                    let kb_per_sec = bytes_downloaded_total as f32 / elapsed_ms as f32;
                    println!("got {}B in {}ms {:05.2} KB/s", bytes_downloaded_total, elapsed_ms, kb_per_sec);

                    measure_start = now;
                    bytes_downloaded_total = 0;
                }
            }
        }
        Timer::after(Duration::from_millis(3000)).await;
    }
}

And these are the speeds I get:

got 15818B in 1028ms 15.39 KB/s
got 15818B in 1044ms 15.15 KB/s
got 12942B in 1075ms 12.04 KB/s
got 12942B in 1142ms 11.33 KB/s
got 12942B in 1023ms 12.65 KB/s
got 11504B in 1154ms 09.97 KB/s
got 14380B in 1078ms 13.34 KB/s
got 14380B in 1013ms 14.20 KB/s
got 11504B in 1097ms 10.49 KB/s

So I'm more than 10 times short :( Is there anything I can do to speed this up? I have tried increasing the buffer sizes, but it doesn't change anything. Ultimately I would like to use DMA to push incoming audio data to I2S. Is this possible on the ESP with this library?
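
For reference, the 172 KiB/s requirement and the 20 Mbit/s datasheet figure quoted in this post can be checked with a few lines of arithmetic. This is a host-side sketch, not part of the firmware; the function names are made up for illustration, and "CD quality" is assumed to mean 44.1 kHz, 16-bit, stereo PCM.

```rust
/// Bytes per second of uncompressed PCM audio.
fn pcm_bytes_per_sec(sample_rate_hz: u32, channels: u32, bits_per_sample: u32) -> u32 {
    sample_rate_hz * channels * (bits_per_sample / 8)
}

/// Convert a link rate in Mbit/s (10^6 bits per second) to bytes per second.
fn mbit_to_bytes_per_sec(mbit: u32) -> u32 {
    mbit * 1_000_000 / 8
}

fn main() {
    // Assumed CD format: 44.1 kHz, 2 channels, 16 bits per sample.
    let needed = pcm_bytes_per_sec(44_100, 2, 16);
    let link = mbit_to_bytes_per_sec(20);
    println!("CD audio needs {} B/s ({:.1} KiB/s)", needed, needed as f32 / 1024.0);
    println!("20 Mbit/s is {} B/s, {:.1}x the requirement", link, link as f32 / needed as f32);
}
```

Under these assumptions the stream needs 176,400 B/s, so the measured 10 to 15 KB/s is indeed more than ten times too slow, while the datasheet figure would leave plenty of headroom.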

bjoernQ commented 1 year ago

IIRC @MabezDev did some tests and the results were much, much better than yours. t.b.h. I haven't benchmarked it myself.

It may be worth trying this with a different client (e.g. another PC) to see what the WiFi and the server can do, just to rule that out.

dzubybb commented 1 year ago

I ruled out a weak WiFi connection by moving the ESP close to the router. I'm using https://github.com/dheijl/swyh-rs as the audio source, so I think I should be fine. I'll try to write a simple Rust client to test the speed on my laptop, but maybe somebody will notice a configuration problem or other errors in my code.
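
A laptop-side client of the kind mentioned here could look roughly like the following. It is a minimal sketch using only the Rust standard library; the helper names `drain` and `bytes_per_sec` and the command-line interface are assumptions for illustration, not code from this thread.

```rust
use std::env;
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::{Duration, Instant};

/// Read from `reader` until EOF or until at least `max_bytes` were read.
/// Returns (bytes read, elapsed time).
fn drain<R: Read>(reader: &mut R, max_bytes: u64) -> io::Result<(u64, Duration)> {
    let mut buf = [0u8; 16 * 1024];
    let mut total = 0u64;
    let start = Instant::now();
    while total < max_bytes {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF: the server closed the connection
        }
        total += n as u64;
    }
    Ok((total, start.elapsed()))
}

/// Throughput in bytes per second; returns 0 for a sub-millisecond interval.
fn bytes_per_sec(bytes: u64, elapsed: Duration) -> u64 {
    let ms = elapsed.as_millis() as u64;
    if ms == 0 { 0 } else { bytes * 1000 / ms }
}

fn main() -> io::Result<()> {
    // Usage: client <host:port> <path>
    let args: Vec<String> = env::args().collect();
    if args.len() < 3 {
        eprintln!("usage: client <host:port> <path>");
        return Ok(());
    }
    let mut stream = TcpStream::connect(&args[1])?;
    // HTTP/1.0 so the server closes the connection when the body ends.
    write!(stream, "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n", args[2], args[1])?;
    // Stop after 10 MiB so an endless audio stream still yields a result.
    let (bytes, elapsed) = drain(&mut stream, 10 * 1024 * 1024)?;
    println!("got {} B in {} ms, {} B/s", bytes, elapsed.as_millis(), bytes_per_sec(bytes, elapsed));
    Ok(())
}
```

The byte limit matters for a source like a live WAV stream, which never reaches EOF. The count includes the HTTP response headers, which is negligible at these sizes.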

Sofiman commented 1 year ago

I also had the same problem over the last two days. I wrote multiple benchmarks for download speeds. The code used is available here. Here are the results I got (on my PC I get speeds up to 25 MB/s on Wi-Fi, so the Wi-Fi is plenty fast):

(image: benchmark results)

It seems that there is a bottleneck somewhere, as the download speed is very similar across download sizes. The esp32-s3, which I tested on, should be capable of 2.5 MB/s (20 Mbit/s).

dzubybb commented 1 year ago

I tried downloading a file from http://speedtest.ftp.otenet.gr/files/test10Mb.db (which on my PC in Firefox clocks at > 1 MiB/s) and I get only about 10 KiB/s on the esp32c3.

dzubybb commented 1 year ago

> IIRC @MabezDev did some tests and the results where much, much better than your results. t.b.h. I haven't benchmarked it myself

Is it possible to get the code for those benchmarks that @MabezDev did? Maybe that will point me to some errors in my code.

bjoernQ commented 1 year ago

I'm on vacation right now but the weather is even worse than it was the other days - so I took my gaming laptop and an ESP32-C3 and had a look.

I took the DHCP example and just changed the fetched URL, then added printing the duration and bytes downloaded - I got around 14000 bytes/second. Pretty bad.

Changing the socket buffer and similar simple things got me to around 37000 bytes/second. Still not good.

I tried a few things in esp-wifi itself and switched to my laptop as the HTTP server:

Bytes 1863616, millis 3012, B/s 618730
Making HTTP request

Bytes 1863616, millis 2402, B/s 775860
Making HTTP request

Bytes 1863616, millis 4666, B/s 399403
Making HTTP request

Bytes 1863616, millis 3909, B/s 476750
Making HTTP request

Bytes 1863616, millis 2976, B/s 626215
Making HTTP request

Bytes 1863616, millis 3072, B/s 606645
Making HTTP request

Bytes 1863616, millis 4665, B/s 399488
Making HTTP request

Bytes 1863616, millis 3537, B/s 526891
Making HTTP request

Bytes 1863616, millis 4241, B/s 439428
Making HTTP request

Bytes 1863616, millis 3956, B/s 471085
Making HTTP request

Bytes 1863616, millis 3149, B/s 591812
Making HTTP request

Bytes 1863616, millis 4226, B/s 440988
Making HTTP request

Bytes 1863616, millis 3338, B/s 558303
Making HTTP request

While the speed varies a lot (probably due to my crap access-point and the device not sitting exactly next to the AP) this is much better.

Unfortunately, the changes are not in a shape that I can just make it a PR yet - will do that after my vacation since I also need to check a few things (e.g. increased memory usage, how it affects other chips).

However, I think these numbers are quite promising.
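
To put a number on how much the runs above vary, the logged durations can be summarized as follows. This is a host-side sketch: the durations and the 1863616-byte payload are copied from the log, the `B/s` formula (`bytes * 1000 / millis`) matches the logged values, and the function names are made up for illustration.

```rust
/// Download durations in milliseconds, copied from the log above.
const MILLIS: [u64; 13] = [
    3012, 2402, 4666, 3909, 2976, 3072, 4665, 3537, 4241, 3956, 3149, 4226, 3338,
];
/// Payload size of every run, as printed in the log.
const BYTES_PER_RUN: u64 = 1_863_616;

/// Integer throughput in bytes per second.
fn bytes_per_sec(bytes: u64, millis: u64) -> u64 {
    bytes * 1000 / millis
}

/// Returns (slowest run, fastest run, overall throughput) in bytes per second.
fn summary(millis: &[u64]) -> (u64, u64, u64) {
    let speeds = millis.iter().map(|&ms| bytes_per_sec(BYTES_PER_RUN, ms));
    let min = speeds.clone().min().unwrap_or(0);
    let max = speeds.max().unwrap_or(0);
    let total_ms: u64 = millis.iter().sum();
    let overall = if total_ms == 0 {
        0
    } else {
        // Total bytes over total time, not the mean of the per-run speeds.
        bytes_per_sec(BYTES_PER_RUN * millis.len() as u64, total_ms)
    };
    (min, max, overall)
}

fn main() {
    let (min, max, overall) = summary(&MILLIS);
    println!("slowest {} B/s, fastest {} B/s, overall {} B/s", min, max, overall);
}
```

The overall figure divides total bytes by total time, which weights slow runs more heavily than a plain average of the per-run speeds would.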

bjoernQ commented 1 year ago

With esp-rs/esp-wifi-sys#233 it will be possible to configure various internals, which got me much better performance. Finding the best values might be a matter of trial and error, however.

bugadani commented 1 year ago

@bjoernQ do you happen to remember what settings were effective during your July trial?

bjoernQ commented 1 year ago

> @bjoernQ do you happen to remember what settings were effective during your july trial?

I think it was like this:

rx_queue_size = 20
tx_queue_size = 5
static_rx_buf_num = 32
dynamic_rx_buf_num = 16
ampdu_rx_enable = 1
ampdu_tx_enable = 1
rx_ba_win = 32
max_burst_size = 8

bugadani commented 1 year ago

FWIW the best I could achieve is still an order of magnitude slower than yours.

Average speed: 34.3kB/s KiB/s

I'm not blocked on my display, I'm not bottlenecked by TLS (yet), this speed is the same with/without HTTPS. Did you happen to have some other modifications privately that didn't get into esp-wifi by any chance? :)

bjoernQ commented 1 year ago

IIRC it was really just what is also in the tuning document. But I also had to use large receive and socket buffers to get there - I can try to reproduce it when I'm back home (next week)

bugadani commented 1 year ago

Changing the buffer size may be a good idea. There are some details in my firmware that make it difficult to test right now but I'll try and play with it some.

Update: I wasn't able to achieve much by upping my socket buffer from 4k to 32k. Some improvement, ~10-15% on average.
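
One general reason socket buffers matter at all: a TCP sender cannot have more unacknowledged data in flight than the receiver's advertised window, so throughput is bounded by roughly window / round-trip time. The sketch below assumes the advertised window is no larger than the socket's receive buffer, and the 5 ms round-trip time is an illustrative guess, not a measurement from this thread.

```rust
/// Upper bound on TCP throughput in bytes per second for a given receive
/// window and round-trip time. A zero RTT is treated as 1 ms to avoid
/// dividing by zero.
fn max_throughput_bytes_per_sec(window_bytes: u64, rtt_ms: u64) -> u64 {
    window_bytes * 1000 / rtt_ms.max(1)
}

fn main() {
    let rtt_ms = 5; // assumed round-trip time on a local WiFi network
    for window in [4 * 1024u64, 32 * 1024, 64 * 1024] {
        println!(
            "window {:>6} B -> at most {:>9} B/s",
            window,
            max_throughput_bytes_per_sec(window, rtt_ms)
        );
    }
}
```

With these assumed numbers even a 4 KiB window allows about 800 kB/s, far above the ~34 kB/s reported here. That would be consistent with the buffer increase helping only a little: the limit in this case is probably somewhere other than the TCP window.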

dzubybb commented 1 year ago

Using this code with the merged esp-rs/esp-wifi-sys#233:

```
#![no_std]
#![no_main]

#[path = "../../examples-util/util.rs"]
mod examples_util;
use examples_util::hal;

use embedded_io::blocking::*;
use embedded_svc::ipv4::Interface;
use embedded_svc::wifi::{AccessPointInfo, ClientConfiguration, Configuration, Wifi};

use esp_backtrace as _;
use esp_println::{print, println};
use esp_wifi::wifi::utils::create_network_interface;
use esp_wifi::wifi::{WifiError, WifiMode};
use esp_wifi::wifi_interface::WifiStack;
use esp_wifi::{current_millis, initialize, EspWifiInitFor};
use hal::clock::ClockControl;
use hal::{peripherals::Peripherals, prelude::*};
use hal::{systimer::SystemTimer, Rng};
use smoltcp::iface::SocketStorage;
use smoltcp::wire::IpAddress;
use smoltcp::wire::Ipv4Address;

const SSID: &str = env!("SSID");
const PASSWORD: &str = env!("PASSWORD");

#[entry]
fn main() -> ! {
    #[cfg(feature = "log")]
    esp_println::logger::init_logger(log::LevelFilter::Info);

    let peripherals = Peripherals::take();

    let system = peripherals.SYSTEM.split();
    let clocks = ClockControl::max(system.clock_control).freeze();

    let timer = SystemTimer::new(peripherals.SYSTIMER).alarm0;
    let init = initialize(
        EspWifiInitFor::Wifi,
        timer,
        Rng::new(peripherals.RNG),
        system.radio_clock_control,
        &clocks,
    )
    .unwrap();

    let (wifi, ..) = peripherals.RADIO.split();
    let mut socket_set_entries: [SocketStorage; 3] = Default::default();
    let (iface, device, mut controller, sockets) =
        create_network_interface(&init, wifi, WifiMode::Sta, &mut socket_set_entries).unwrap();
    let wifi_stack = WifiStack::new(iface, device, sockets, current_millis);

    let client_config = Configuration::Client(ClientConfiguration {
        ssid: SSID.into(),
        password: PASSWORD.into(),
        ..Default::default()
    });
    let res = controller.set_configuration(&client_config);
    println!("wifi_set_configuration returned {:?}", res);

    controller.start().unwrap();
    println!("is wifi started: {:?}", controller.is_started());

    println!("Start Wifi Scan");
    let res: Result<(heapless::Vec, usize), WifiError> = controller.scan_n();
    if let Ok((res, _count)) = res {
        for ap in res {
            println!("{:?}", ap);
        }
    }

    println!("{:?}", controller.get_capabilities());
    println!("wifi_connect {:?}", controller.connect());

    // wait to get connected
    println!("Wait to get connected");
    loop {
        let res = controller.is_connected();
        match res {
            Ok(connected) => {
                if connected {
                    break;
                }
            }
            Err(err) => {
                println!("{:?}", err);
                loop {}
            }
        }
    }
    println!("{:?}", controller.is_connected());

    // wait for getting an ip address
    println!("Wait to get an ip address");
    loop {
        wifi_stack.work();

        if wifi_stack.is_iface_up() {
            println!("got ip {:?}", wifi_stack.get_ip_info());
            break;
        }
    }

    println!("Start busy loop on main");

    let mut rx_buffer = [0u8; 1536 * 64];
    let mut tx_buffer = [0u8; 1536 * 1];
    let mut socket = wifi_stack.get_socket(&mut rx_buffer, &mut tx_buffer);

    let mut current_try = 1;
    let max_tries = 3;
    let mut speed_sum = 0;

    loop {
        println!("Making HTTP request (count {})", current_try);
        socket.work();

        socket
            .open(IpAddress::Ipv4(Ipv4Address::new(10, 1, 1, 10)), 80)
            .unwrap();

        socket
            .write(b"GET /testfile HTTP/1.0\r\nHost: 10.1.1.10\r\n\r\n")
            .unwrap();
        socket.flush().unwrap();

        let wait_end = current_millis() + 20 * 1000;
        let mut bytes = 0;
        let t1 = current_millis();
        let mut buffer = [0u8; 1024 * 32];
        loop {
            if let Ok(len) = socket.read(&mut buffer) {
                //println!("{len}");
                bytes += len;
            } else {
                break;
            }

            if current_millis() > wait_end {
                println!("Timeout");
                break;
            }
        }
        let t2 = current_millis();
        println!();

        socket.disconnect();

        let speed = bytes as u64 * 1000 / (t2 as u64 - t1 as u64);
        println!("Bytes {}, millis {}, B/s {}", bytes, t2 - t1, speed);
        speed_sum += speed;

        let wait_end = current_millis() + 5 * 1000;
        while current_millis() < wait_end {
            socket.work();
        }

        current_try += 1;
        if current_try > max_tries {
            let avg_speed = speed_sum / max_tries;
            println!("Avg speed in {} tries {} B/s", max_tries, avg_speed);
            loop {}
        }
    }
}
```

I was able to tune my esp32c3 to get around 400 KB/s. I used a multivariate experiment (https://www.youtube.com/watch?v=5oULEuOoRd0) to find out which settings have the most influence on download speed.

I used these two values of each setting for the experiments:

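| test | rx_queue_size | tx_queue_size | static_rx_buf_num | dynamic_rx_buf_num | static_tx_buf_num | dynamic_tx_buf_num | ampdu_rx_enable | ampdu_tx_enable | rx_ba_win | max_burst_size | country_code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 3 | 10 | 32 | 0 | 32 | 0 | 0 | 6 | 1 | CN |
| 2 | 20 | 5 | 32 | 16 | 16 | 16 | 1 | 1 | 32 | 8 | PL |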

And those were the results:

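| test | speed B/s (avg of 3) | rx_queue_size | tx_queue_size | static_rx_buf_num | dynamic_rx_buf_num | static_tx_buf_num | dynamic_tx_buf_num | ampdu_rx_enable | ampdu_tx_enable | rx_ba_win | max_burst_size | country_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| factor # | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 1 | 124764 | 5 | 3 | 10 | 32 | 0 | 32 | 0 | 0 | 6 | 1 | CN |
| 2 | 60115 | 5 | 3 | 10 | 32 | 0 | 16 | 1 | 1 | 32 | 8 | PL |
| 3 | 63889 | 5 | 3 | 32 | 16 | 16 | 32 | 0 | 0 | 32 | 8 | PL |
| 4 | 114179 | 5 | 5 | 10 | 16 | 16 | 32 | 1 | 1 | 6 | 1 | PL |
| 5 | 65398 | 5 | 5 | 32 | 32 | 16 | 16 | 0 | 1 | 6 | 8 | CN |
| 6 | 133764 | 5 | 5 | 32 | 16 | 0 | 16 | 1 | 0 | 32 | 1 | CN |
| 7 | 407828 | 20 | 3 | 32 | 16 | 0 | 32 | 1 | 1 | 6 | 8 | CN |
| 8 | 113233 | 20 | 3 | 32 | 32 | 16 | 16 | 1 | 0 | 6 | 1 | PL |
| 9 | 114566 | 20 | 3 | 10 | 16 | 16 | 16 | 0 | 1 | 32 | 1 | CN |
| 10 | 116413 | 20 | 5 | 32 | 32 | 0 | 32 | 0 | 1 | 32 | 1 | PL |
| 11 | 435955 | 20 | 5 | 10 | 16 | 0 | 16 | 0 | 0 | 6 | 8 | PL |
| 12 | 328365 | 20 | 5 | 10 | 32 | 16 | 32 | 1 | 0 | 32 | 8 | CN |
| SUM 1 | | 562109 | 884395 | 1177944 | 808288 | 1278839 | 1155438 | 920985 | 1199970 | 1261357 | 716919 | 1174685 |
| SUM 2 | | 1516360 | 1194074 | 900525 | 1270181 | 799630 | 923031 | 1157484 | 878499 | 817112 | 1361550 | 903784 |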

Unfortunately I live in a very RF-noisy place, so I couldn't rule out external variables, but I think this method can lead to even better results.
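
The SUM 1 and SUM 2 rows in the results are the summed speeds of all runs in which a setting was at its first or second value. A host-side sketch of that calculation is shown below; the data is transcribed from the results above (1 = first value, 2 = second value of each setting), while the function name and the ranking step are illustrative additions, not part of the original experiment.

```rust
const FACTORS: [&str; 11] = [
    "rx_queue_size", "tx_queue_size", "static_rx_buf_num", "dynamic_rx_buf_num",
    "static_tx_buf_num", "dynamic_tx_buf_num", "ampdu_rx_enable", "ampdu_tx_enable",
    "rx_ba_win", "max_burst_size", "country_code",
];

/// Level (1 or 2) of each factor in each of the 12 runs.
const LEVELS: [[u8; 11]; 12] = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    [1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2],
    [1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1],
    [1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1],
    [2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1],
    [2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 2],
    [2, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1],
    [2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2],
    [2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 2],
    [2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1],
];

/// Average speed of each run in B/s.
const SPEEDS: [u64; 12] = [
    124764, 60115, 63889, 114179, 65398, 133764,
    407828, 113233, 114566, 116413, 435955, 328365,
];

/// For every factor, sum the speeds of the runs at level 1 and at level 2.
fn level_sums() -> [(u64, u64); 11] {
    let mut sums = [(0u64, 0u64); 11];
    for (run, levels) in LEVELS.iter().enumerate() {
        for (f, &level) in levels.iter().enumerate() {
            if level == 1 {
                sums[f].0 += SPEEDS[run];
            } else {
                sums[f].1 += SPEEDS[run];
            }
        }
    }
    sums
}

fn main() {
    let sums = level_sums();
    // Rank factors by how much the two level sums differ.
    let mut ranked: Vec<usize> = (0..FACTORS.len()).collect();
    ranked.sort_by_key(|&f| std::cmp::Reverse(sums[f].0.abs_diff(sums[f].1)));
    for f in ranked {
        println!("{:<20} SUM 1 {:>8}  SUM 2 {:>8}", FACTORS[f], sums[f].0, sums[f].1);
    }
}
```

Ranked this way, rx_queue_size and max_burst_size show the largest differences between their two sums, which matches the fastest runs (7 and 11) both using rx_queue_size = 20 and max_burst_size = 8. Given the RF noise mentioned above, the smaller differences should be read with caution.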

mbq commented 9 months ago

I have been testing the esp32s3 with the ambition of pushing ~2.5 MB/s over TCP from the board; the most I was able to get from esp-wifi was 300 kBytes/s, achieved mostly thanks to a larger smoltcp buffer. To put it in perspective, though, the C iperf example from esp-idf achieves 5-5.5 MBytes/s.

The board config reads:

#
# ESP32S3-specific
#
CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM=16
CONFIG_ESP_WIFI_DYNAMIC_RX_BUFFER_NUM=64
CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64
CONFIG_ESP_WIFI_AMPDU_TX_ENABLED=y
CONFIG_ESP_WIFI_TX_BA_WIN=32
CONFIG_ESP_WIFI_AMPDU_RX_ENABLED=y
CONFIG_ESP_WIFI_RX_BA_WIN=32

CONFIG_LWIP_TCP_SND_BUF_DEFAULT=65535
CONFIG_LWIP_TCP_WND_DEFAULT=65535
CONFIG_LWIP_TCP_RECVMBOX_SIZE=64
CONFIG_LWIP_UDP_RECVMBOX_SIZE=64
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64

CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ=240

CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y

CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_WRAP=y

So it seems that the secret ingredients are increasing the flash read speed and bumping up the ICACHE... Is there a way to change them in a no-std Rust stack?

bjoernQ commented 9 months ago

> So it seems that the secret ingredients are increasing the flash read speed and bumping up the ICACHE... Is there a way to change them in a no-std Rust stack?

Those things are currently not configurable in esp-hal. While it will make a difference, I think it needs more since 300k vs 5M is a huge difference. Can you post your cfg.toml?

mbq commented 9 months ago

Sure:

[esp-wifi]
rx_queue_size = 3
tx_queue_size = 3
static_rx_buf_num = 16
dynamic_rx_buf_num = 16
static_tx_buf_num=16
dynamic_tx_buf_num=16
ampdu_rx_enable = 1
ampdu_tx_enable = 1
country_code="PL"
rx_ba_win = 32
tx_ba_win = 32
max_burst_size = 8

EspHeap is 100 KiB, the RX buffer is 1 KiB (since it's only for ACKs), and the TX buffer is 64 KiB, to mirror the ESP config. It would benefit from larger queues, since it emits a lot of "no TX token" warnings, but it seems to OOM if they are increased (I'm very new to ESP32 and its memory layout, so I'm likely doing something stupid, though).

I'm going to test how disabling flash/cache in sdkconfig hurts the idf demo, this should give us a hint how useful they are.

mbq commented 9 months ago

I was not able to run the iperf example with modified parameters (this fancy pseudo-shell got broken easier than wifi), yet I made a simple project with esp-idf stack pushing 1KiB of static data in a blocking way. I have used the following sdkconfig.defaults file:

##--- This part is from Rust template ---
# Rust often needs a bit of an extra main task stack size compared to C (the default is 3K)
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8000

# Use this to set FreeRTOS kernel tick frequency to 1000 Hz (100 Hz by default).
# This allows to use 1 ms granuality for thread sleeps (10 ms by default).
#CONFIG_FREERTOS_HZ=1000

# Workaround for https://github.com/espressif/esp-idf/issues/7631
#CONFIG_MBEDTLS_CERTIFICATE_BUNDLE=n
#CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_DEFAULT_FULL=n

##--- This is the IPERF demo config ---

## -- The Modem/LWIP part --
CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM=16
CONFIG_ESP_WIFI_DYNAMIC_RX_BUFFER_NUM=64
CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM=64
CONFIG_ESP_WIFI_AMPDU_TX_ENABLED=y
CONFIG_ESP_WIFI_TX_BA_WIN=32
CONFIG_ESP_WIFI_AMPDU_RX_ENABLED=y
CONFIG_ESP_WIFI_RX_BA_WIN=32

CONFIG_LWIP_TCP_SND_BUF_DEFAULT=65535
CONFIG_LWIP_TCP_WND_DEFAULT=65535
CONFIG_LWIP_TCP_RECVMBOX_SIZE=64
CONFIG_LWIP_UDP_RECVMBOX_SIZE=64
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64

## -- The Cpufreq part --
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ=240

## -- The QIO (flash speed) part --
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y

## -- The ICACHE part --
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_WRAP=y

It has three blocks that I was then turning on/off; here are my results (def means defaults/block off, iperf means settings from the demo):

| Modem/LWIP | CPU Freq | QIO+Flash | ICACHE | Speed |
|---|---|---|---|---|
| def | 160MHz | def | def | 643KiB/s |
| iperf | 240MHz | iperf | iperf | 3.62MiB/s |
| iperf | 240MHz | def | iperf | 3.62MiB/s |
| iperf | 240MHz | def | def | 879KiB/s |
| def | 160MHz | def | iperf | 1.69MiB/s |
| def | 240MHz | iperf | iperf | 2.06MiB/s |

It is pretty evident that the instruction cache is the main source of the speed-up; playing with buffers and queue sizes helps, but I doubt it can get one over the 1 MiB/s barrier.
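
To make the comparison in these results easier to read, the speeds can be normalized to one unit and expressed relative to the all-defaults run. This is a host-side sketch; the values are copied from the results above, while the `Speed` type, the labels, and the function name are made up for illustration.

```rust
/// A measured speed in the unit it was reported in.
#[derive(Clone, Copy)]
enum Speed {
    KiB(f64),
    MiB(f64),
}

impl Speed {
    /// Normalize to KiB/s.
    fn kib_per_sec(self) -> f64 {
        match self {
            Speed::KiB(v) => v,
            Speed::MiB(v) => v * 1024.0,
        }
    }
}

/// (configuration, measured speed); the first entry is the baseline.
const RESULTS: [(&str, Speed); 6] = [
    ("all defaults, 160 MHz", Speed::KiB(643.0)),
    ("iperf modem/LWIP, 240 MHz, QIO, ICACHE", Speed::MiB(3.62)),
    ("iperf modem/LWIP, 240 MHz, ICACHE", Speed::MiB(3.62)),
    ("iperf modem/LWIP, 240 MHz", Speed::KiB(879.0)),
    ("ICACHE only, 160 MHz", Speed::MiB(1.69)),
    ("240 MHz, QIO, ICACHE", Speed::MiB(2.06)),
];

/// Speed-up of every configuration relative to the first (baseline) entry.
fn speedups() -> Vec<(&'static str, f64)> {
    let base = RESULTS[0].1.kib_per_sec();
    RESULTS
        .iter()
        .map(|&(name, speed)| (name, speed.kib_per_sec() / base))
        .collect()
}

fn main() {
    for (name, factor) in speedups() {
        println!("{:<42} {:.2}x", name, factor);
    }
}
```

Seen this way, the instruction cache settings alone give a larger gain over the defaults than the modem/LWIP settings and the higher CPU clock combined, while the QIO flash settings make no measurable difference once the cache settings are applied.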

MabezDev commented 9 months ago

You could try playing with patching various values from here: https://github.com/esp-rs/esp-hal/blob/4d87e75d71b55d546b37aff6a4494cefb743c4a9/esp32s3-hal/src/lib.rs#L76-L100 in esp-hal. We have a longer-standing issue around cache configuration here: https://github.com/esp-rs/esp-hal/issues/955.

Edit: for the other chips, I believe the cache settings can be configured in the bootloader and they will persist to the main app, so we may be able to get some speed up on other chips too.

Another factor is that xtensa-lx-rt currently can't do lazy loading of float registers (maybe we should just put it behind a feature?), so that will affect the context switch performance greatly.

MabezDev commented 9 months ago

@mbq you may wish to test with esp-rs/esp-wifi-sys#430, on RISCV at least I was seeing 2MB/s upload speeds. It seems there are still some bottlenecks on Xtensa though.

mbq commented 9 months ago

@MabezDev For a quick test I tried bumping esp-wifi to main, and the result was a 50% throughput drop, from 313KiB/s to 139KiB/s (this is on an S3 board with synchronous code that just tries to push as many bytes as possible); I haven't tested the cache patches yet.

Anyhow, I decided to use IDF for the project I'm doing now, but I'm still keeping my fingers crossed for a solution here.