espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Wifi performance of ESP32-S3 really bad, lot worse compared to ESP8266 (IDFGH-13236) #14171

Open eriksl opened 2 months ago

eriksl commented 2 months ago

Answers checklist.

IDF version.

v5.2.2

Espressif SoC revision.

ESP32-S3 0.1

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

Wemos/Lolin S3 mini

Power Supply used.

USB

What is the expected behavior?

I expect a payload transfer speed of at least 1/10th of the connection speed, so in the range of 5400 kbit/s (675 kbyte/s). On the ESP8266, with a similar setup (sending or receiving 4k blocks over either TCP or UDP, same wireless network), I can obtain around 800 kbyte/s (depending on range and interference, but there is more handling involved there). On the ESP32 I can obtain little more than 200 kbyte/s to the ESP32 and 600 kbyte/s from the ESP32.

What is the actual behavior?

Very low network performance, much lower than ESP8266 in the same setup. Connection to access point is exactly the same, as is the rest of the infrastructure. The only real difference is that ESP8266 uses native LWIP callback API while the ESP32 image uses the LWIP POSIX API.

Note: you cannot leave out the "ACK" stuff, otherwise the non-ESP32 side will just queue up everything in memory and then report the test as ready, without having sent a single byte yet. The "ACK"-ing introduces a bit of lag, I am aware of that.
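The lag introduced by the per-block "ACK" can be estimated with a simple stop-and-wait model. The sketch below is only an illustration and not a measurement: the function name is hypothetical, the 65 Mbit/s link rate is the figure mentioned later in this thread, the round-trip time is an assumed input, and protocol overhead is ignored.

```c
#include <stdint.h>

/*
 * Upper bound on payload throughput (bytes per second) when every block
 * has to be acknowledged before the next block is sent (stop-and-wait).
 * Hypothetical helper; ignores TCP/IP and 802.11 overhead.
 *
 * block_bytes:    payload per block, e.g. 4096
 * link_bit_per_s: raw link rate in bit/s, e.g. 65000000
 * round_trip_us:  assumed time between end of block and arrival of the ACK
 */
static uint64_t stop_and_wait_bytes_per_s(uint32_t block_bytes,
                                          uint64_t link_bit_per_s,
                                          uint32_t round_trip_us)
{
    uint64_t serialisation_us;
    uint64_t cycle_us;

    if (link_bit_per_s == 0)
        return 0; /* degenerate input, avoid division by zero */

    /* time the block itself occupies the link, in microseconds */
    serialisation_us = (uint64_t)block_bytes * 8u * 1000000u / link_bit_per_s;

    /* one full cycle: send the block, then wait for the ACK */
    cycle_us = serialisation_us + round_trip_us;

    if (cycle_us == 0)
        return 0;

    return (uint64_t)block_bytes * 1000000u / cycle_us;
}
```

With 4k blocks the block itself occupies the link for only about half a millisecond at this rate, so under these assumptions a round trip of a few milliseconds already dominates each cycle.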

I do not see any errors on the wireless controller for this association. I don't think it's an RF issue. Looks more like an issue within LWIP or the IDF.

Steps to reproduce.

Use a very simple program like the one below. Use simple POSIX socket calls like socket/bind/listen/accept/send/recv/close. Use a client that sends 4k blocks upon reception of the word "ACK", or a client that receives 4k blocks whenever it sends "ACK". Use default values for the IDF configuration; I've tried many settings and they don't matter much. Use of SPIRAM doesn't matter much either.

For the full source code of the ESP32-S3 image, see here: https://github.com/eriksl/esp32. The performance testing code is currently disabled, adjust init.c to enable it. For the client I used, see here: https://github.com/eriksl/e32if
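For readers who do not want to build the linked client, the sending side of the protocol described above can be sketched roughly as follows. This is not the e32if client; all names are hypothetical, and the 4-byte ACK size is an assumption matching "ACK" plus its terminating NUL. The function works on any connected stream socket.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

enum { block_size = 4096, ack_size = 4 }; /* ack_size: "ACK" plus NUL (assumption) */

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0)
            return -1;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, data, len);
        if (n <= 0)
            return -1;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Send `blocks` blocks of block_size bytes over a connected stream socket,
 * waiting for one ACK after each block. Returns the number of payload
 * bytes sent, or -1 on failure.
 */
static long long send_blocks_paced(int fd, unsigned int blocks)
{
    static char block[block_size];
    char ack[ack_size];
    long long sent = 0;
    unsigned int i;

    memset(block, 0xa5, sizeof(block)); /* arbitrary payload pattern */

    for (i = 0; i < blocks; i++) {
        if (write_full(fd, block, sizeof(block)) != 0)
            return -1;
        if (read_full(fd, ack, sizeof(ack)) != 0)
            return -1;
        sent += (long long)sizeof(block);
    }

    return sent;
}
```

Dividing the returned byte count by the elapsed wall-clock time gives the payload throughput. Note that on a TCP socket the receiving side may see a 4k block arrive in several recv() calls, so a receiver that acknowledges every recv() call can produce more ACKs than blocks; this sketch reads only one ACK per block.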

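#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdbool.h>
#include <unistd.h>
#include <sys/socket.h>

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_heap_caps.h"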

#include "perftest.h"
#include "string.h"
#include "cli-command.h"
#include "log.h"
#include "util.h"

static bool inited = false;

enum
{
    //malloc_type = MALLOC_CAP_INTERNAL
    malloc_type = MALLOC_CAP_SPIRAM
};

static void run_tcp_receive(void *)
{
    enum { size = 4096 };
    char *receive_buffer;
    int accept_fd;
    struct sockaddr_in6 si6_addr;
    socklen_t si6_addr_length;
    int length;
    int tcp_socket_fd;
    static const char ack[] = "ACK"; // array, so sizeof(ack) is the string size (4 bytes incl. NUL), not a pointer size
    enum { attempts = 8 };
    unsigned int attempt;

    assert(inited);

    receive_buffer = heap_caps_malloc(size, malloc_type);

    memset(&si6_addr, 0, sizeof(si6_addr));
    si6_addr.sin6_family = AF_INET6;
    si6_addr.sin6_port = htons(9); // discard

    assert((accept_fd = socket(AF_INET6, SOCK_STREAM, 0)) >= 0);
    assert(bind(accept_fd, (const struct sockaddr *)&si6_addr, sizeof(si6_addr)) == 0);
    assert(listen(accept_fd, 0) == 0);

    for(;;)
    {
        si6_addr_length = sizeof(si6_addr);

        if((tcp_socket_fd = accept(accept_fd, (struct sockaddr *)&si6_addr, &si6_addr_length)) < 0)
        {
            log_format_errno("perftest: accept fails: %d", tcp_socket_fd);
            continue;
        }

        assert(sizeof(si6_addr) >= si6_addr_length);

        for(;;)
        {
            length = recv(tcp_socket_fd, receive_buffer, size, 0);

            if(length <= 0)
            {
                log_format("perftest tcp recv: %d", length);
                break;
            }

            for(attempt = attempts; attempt > 0; attempt--)
            {
                length = send(tcp_socket_fd, ack, sizeof(ack), 0);

                if(length == sizeof(ack))
                    break;

                log_format("perftest tcp send ack: %d, try %d", length, attempt);
                vTaskDelay(100 / portTICK_PERIOD_MS);
            }

            if(attempt == 0)
                log("perftest tcp send ack: no more tries");
        }

        close(tcp_socket_fd);
    }
}

static void run_tcp_send(void *)
{
    enum { size = 4096 };
    char *send_buffer;
    int accept_fd;
    struct sockaddr_in6 si6_addr;
    socklen_t si6_addr_length;
    int length;
    int tcp_socket_fd;
    static const char ack[] = "ACK"; // array, so sizeof(ack) is the string size (4 bytes incl. NUL), not a pointer size
    enum { attempts = 8 };
    unsigned int attempt;

    assert(inited);

    send_buffer = heap_caps_malloc(size, malloc_type);

    memset(&si6_addr, 0, sizeof(si6_addr));
    si6_addr.sin6_family = AF_INET6;
    si6_addr.sin6_port = htons(19); // chargen

    assert((accept_fd = socket(AF_INET6, SOCK_STREAM, 0)) >= 0);
    assert(bind(accept_fd, (const struct sockaddr *)&si6_addr, sizeof(si6_addr)) == 0);
    assert(listen(accept_fd, 0) == 0);

    for(;;)
    {
        si6_addr_length = sizeof(si6_addr);

        if((tcp_socket_fd = accept(accept_fd, (struct sockaddr *)&si6_addr, &si6_addr_length)) < 0)
        {
            log_format_errno("perftest: accept fails: %d", tcp_socket_fd);
            continue;
        }

        assert(sizeof(si6_addr) >= si6_addr_length);

        for(;;)
        {
            length = recv(tcp_socket_fd, send_buffer, sizeof(ack), 0);

            if(length <= 0)
            {
                log_format("perftest tcp recv 2: %d", length);
                break;
            }

            for(attempt = attempts; attempt > 0; attempt--)
            {
                length = send(tcp_socket_fd, send_buffer, size, 0);

                if(length == size)
                    break;

                if((length < 0) && ((errno == ENOTCONN) || (errno == ECONNRESET)))
                    goto abort;

                log_format_errno("perftest tcp send 2: %d, try %d", length, attempt);
                vTaskDelay(100 / portTICK_PERIOD_MS);
            }

            if(attempt == 0)
                log("perftest tcp send 2: no more tries");
        }

abort:
        close(tcp_socket_fd);
    }
}

static void run_udp_receive(void *)
{
    enum { size = 4096 };
    char *receive_buffer;
    struct sockaddr_in6 si6_addr;
    socklen_t si6_addr_length;
    int length;
    int udp_socket_fd;
    static const char ack[] = "ACK"; // array, so sizeof(ack) is the string size (4 bytes incl. NUL), not a pointer size
    enum { attempts = 8 };
    unsigned int attempt;

    assert(inited);

    receive_buffer = heap_caps_malloc(size, malloc_type);

    memset(&si6_addr, 0, sizeof(si6_addr));
    si6_addr.sin6_family = AF_INET6;
    si6_addr.sin6_port = htons(9); // discard

    assert((udp_socket_fd = socket(AF_INET6, SOCK_DGRAM, 0)) >= 0);
    assert(bind(udp_socket_fd, (const struct sockaddr *)&si6_addr, sizeof(si6_addr)) == 0);

    for(;;)
    {
        si6_addr_length = sizeof(si6_addr);

        length = recvfrom(udp_socket_fd, receive_buffer, size, 0, (struct sockaddr *)&si6_addr, &si6_addr_length);

        assert(sizeof(si6_addr) >= si6_addr_length);

        if(length <= 0)
        {
            log_format("perftest udp recv: %d", length);
            continue;
        }

        for(attempt = attempts; attempt > 0; attempt--)
        {
            length = sendto(udp_socket_fd, ack, sizeof(ack), 0, (const struct sockaddr *)&si6_addr, si6_addr_length);

            if(length == sizeof(ack))
                break;

            log_format("perftest udp send ack: %d, try %d", length, attempt);
            vTaskDelay(100 / portTICK_PERIOD_MS);
        }

        if(attempt == 0)
            log("perftest udp send ack: no more tries");
    }

    close(udp_socket_fd);
}

static void run_udp_send(void *)
{
    enum { size = 4096 };
    char *send_buffer;
    struct sockaddr_in6 si6_addr;
    socklen_t si6_addr_length;
    int length;
    int udp_socket_fd;
    static const char ack[] = "ACK"; // array, so sizeof(ack) is the string size (4 bytes incl. NUL), not a pointer size
    enum { attempts = 8 };
    unsigned int attempt;

    assert(inited);

    send_buffer = heap_caps_malloc(size, malloc_type);

    memset(&si6_addr, 0, sizeof(si6_addr));
    si6_addr.sin6_family = AF_INET6;
    si6_addr.sin6_port = htons(19); // chargen

    assert((udp_socket_fd = socket(AF_INET6, SOCK_DGRAM, 0)) >= 0);
    assert(bind(udp_socket_fd, (const struct sockaddr *)&si6_addr, sizeof(si6_addr)) == 0);

    for(;;)
    {
        si6_addr_length = sizeof(si6_addr);

        length = recvfrom(udp_socket_fd, send_buffer, sizeof(ack), 0, (struct sockaddr *)&si6_addr, &si6_addr_length);

        assert(sizeof(si6_addr) >= si6_addr_length);

        if(length <= 0)
        {
            log_format("perftest udp recv 2: %d", length);
            continue;
        }

        for(attempt = attempts; attempt > 0; attempt--)
        {
            length = sendto(udp_socket_fd, send_buffer, size, 0, (const struct sockaddr *)&si6_addr, si6_addr_length);

            if(length == size)
                break;

            log_format("perftest udp send 2: %d, try %d", length, attempt);
            vTaskDelay(100 / portTICK_PERIOD_MS);
        }

        if(attempt == 0)
            log("perftest udp send 2: no more tries");
    }

    close(udp_socket_fd);
}

void perftest_init(void)
{
    assert(!inited);

    inited = true;

    if(xTaskCreatePinnedToCore(run_tcp_receive, "perf-tcp-recv", 2 * 1024, (void *)0, 1, (TaskHandle_t *)0, 1) != pdPASS)
        util_abort("perftest: xTaskCreatePinnedToCore tcp receive");

    if(xTaskCreatePinnedToCore(run_tcp_send, "perf-tcp-send", 2 * 1024, (void *)0, 1, (TaskHandle_t *)0, 1) != pdPASS)
        util_abort("perftest: xTaskCreatePinnedToCore tcp send");

    if(xTaskCreatePinnedToCore(run_udp_receive, "perf-udp-recv", 2 * 1024, (void *)0, 1, (TaskHandle_t *)0, 1) != pdPASS)
        util_abort("perftest: xTaskCreatePinnedToCore udp receive");

    if(xTaskCreatePinnedToCore(run_udp_send, "perf-udp-send", 2 * 1024, (void *)0, 1, (TaskHandle_t *)0, 1) != pdPASS)
        util_abort("perftest: xTaskCreatePinnedToCore udp send");
}

Debug Logs.

No response

More Information.

No response

kapilkedawat commented 1 month ago

Hi @eriksl, we have an iperf example in IDF, could you please try that and share throughput numbers?

Also please share sniffer capture of that instance if possible.

MaxwellAlan commented 1 month ago

Hi @eriksl

You can also refer to IDF docs https://docs.espressif.com/projects/esp-idf/en/v5.2.2/esp32s3/api-guides/wifi.html#how-to-improve-wi-fi-performance to improve Wi-Fi throughput.

eriksl commented 1 month ago

> Hi @eriksl
>
> You can also refer to IDF docs https://docs.espressif.com/projects/esp-idf/en/v5.2.2/esp32s3/api-guides/wifi.html#how-to-improve-wi-fi-performance to improve Wi-Fi throughput.

I am aware of this document. I tried all of these, with very little change in performance.

The key point remains:

I am using a dead-simple loop of accept()->read()->repeat or accept()->write()->repeat, I really think I should get a much better performance. On both TCP and UDP. So TCP-specific options aren't really significant here. Or I'd like to have a statement that using the LWIP POSIX interface a good performance is not possible. Which would explain why it's so much faster on the ESP8266 where I am using the LWIP native interface.

eriksl commented 1 month ago

> Hi @eriksl, we have an iperf example in IDF, could you please try that and share throughput numbers?
>
> Also please share sniffer capture of that instance if possible.

Can't test. When running I get "Writing to serial is timing out. Please make sure that your application supports an interactive console and that you have picked the correct console for serial communication." every time. I can't type anything.

Probably the same issue with the USB console on ESP32-S3 that I reported earlier.

eriksl commented 1 month ago

FWIW the iperf example uses the same API (POSIX) as I do (connect, sendto, etc.)

MaxwellAlan commented 1 month ago

Hi @eriksl

> Can't test. When running I get "Writing to serial is timing out. Please make sure that your application supports an interactive console and that you have picked the correct console for serial communication." every time. I can't type anything.
>
> Probably the same issue with the USB console on ESP32-S3 that I reported earlier.

If I understand correctly, the iperf application uses UART as the default input and output for the console. If you want to use USB, you need to enable the USB Serial Options (https://github.com/espressif/esp-idf/blob/af25eb447e3330c21e3b38e91db16332056882b2/components/esp_system/Kconfig#L237) in menuconfig.

eriksl commented 1 month ago

I copied the config file I am using for my own firmware (which shows the low performance). This configuration, of course, already has the console on USB-JTAG enabled. After copying the config, I ran idf.py menuconfig so that a composite config could be generated.

MaxwellAlan commented 1 month ago

@eriksl Could you upload the full logs of the throughput test?

MaxwellAlan commented 1 month ago

Hi @eriksl

I made a preliminary comparison with https://github.com/eriksl/esp32/blob/master/s3/develop/sdkconfig. Compared to the default configuration of iperf, there seem to be many differences, such as CPU frequency, Wi-Fi configuration, lwIP configuration, and other items that have a significant impact on performance. For now, we hope that you can try to run our IDF iperf example and check whether there are any obvious hardware abnormalities.

eriksl commented 1 month ago

ok will do that later. Currently not at home...

eriksl commented 1 month ago

@MaxwellAlan I have tested with an extensive set of combinations of sdkconfig items, including CPU speed, cache sizes, LWIP options, SPIRAM options and WLAN options. They make a bit of difference, enough to confirm that they changed, but the throughput remains very bad nonetheless. Traffic originating from the ESP32 isn't that bad actually, it's almost as fast as the ESP8266. But traffic to be received by the ESP32 is really bad. Changing the IDF options results in minimal performance change, ranging from 150 kbyte/sec to 250 kbyte/sec (1.2 Mbps - 2.0 Mbps), while I can reach 500 kbyte/s on my ESP8266 (4.0 Mbps).
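The unit conversions quoted above follow from 1 byte = 8 bits. As a small sketch (helper names are hypothetical; the 65 Mbps link rate used for the share is the figure reported further down in this thread):

```c
#include <stdint.h>

/* Convert a payload rate in kbyte/s to kbit/s (1 byte = 8 bits). */
static uint32_t kbyte_to_kbit(uint32_t kbyte_per_s)
{
    return kbyte_per_s * 8u;
}

/*
 * Share of the raw link rate taken by the payload, in whole percent
 * (rounded down). Returns 0 for a zero link rate.
 */
static uint32_t link_share_percent(uint32_t kbyte_per_s, uint32_t link_kbit_per_s)
{
    if (link_kbit_per_s == 0)
        return 0;

    return kbyte_to_kbit(kbyte_per_s) * 100u / link_kbit_per_s;
}
```

Calling link_share_percent() with the measured rates and a link rate of 65000 kbit/s shows how small a fraction of the negotiated link rate the payload occupies in either case.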

AxelLin commented 4 weeks ago

> Traffic originating from the ESP32 isn't that bad actually, it's almost as fast as the ESP8266. But traffic to be received by the ESP32 is really bad. Changing the IDF options results in minimal performance change, ranging from 150 kbyte/sec to 250 kbyte/sec (1.2 Mbps - 2.0 Mbps), while I can reach 500 kbyte/s on my ESP8266 (4.0 Mbps).

@MaxwellAlan any comments?

eriksl commented 4 weeks ago

If anyone could get the iperf image working on the ESP32-S3 using the USB JTAG/serial console, I'd be grateful too...

hansw123 commented 6 days ago

@eriksl sorry for the late reply. I tested the ESP32-S3 with USB JTAG using the IDF v5.2.2 iperf example in a shield box; TCP throughput is OK. [image]

hansw123 commented 6 days ago

@eriksl you can also use the iperf example for the test; just enabling ESP_CONSOLE_USB_SERIAL_JTAG in menuconfig should be enough.

eriksl commented 2 days ago

Of course I did. And it doesn't work.

hansw123 commented 2 days ago

@eriksl can you provide the error log from the iperf example when ESP_CONSOLE_USB_SERIAL_JTAG is enabled and it still does not work?

eriksl commented 2 days ago

See here https://github.com/espressif/esp-idf/issues/14171#issuecomment-2228352494. There is no build error, it just doesn't work. It looks like the iperf image fetches its input directly from one of the UARTs and doesn't recognise/use the USB JTAG UART.

hansw123 commented 16 hours ago

> See here #14171 (comment). There is no build error, it just doesn't work. It looks like the iperf image fetches its input directly from one of the UARTs and doesn't recognise/use the USB JTAG UART.

This occurs when your device doesn't print anything, and is usually seen with programs that don't have a follow-up action. In the iperf example, this issue does not normally happen.

eriksl commented 6 hours ago

I might try again with the newest stable IDF version. I know there have been a few fixes in this area. But without having had a look at the code, I really suspect the iperf code assumes having a real UART connected and not UART emulation over USB-JTAG.

hansw123 commented 6 hours ago

@eriksl I noticed that you are not using our official development board. Because there is something special about the USB and the chip, I'm not sure that the unofficial development board handles it well. Maybe you can experiment with the official S3 development board, or you can contact Wemos.

eriksl commented 5 hours ago

There is nothing wrong with the Wemos. I am using it all of the time with my own code, USB UART works like a charm. But apparently the iperf code either doesn't handle it or doesn't handle it well.

igrr commented 4 hours ago

> It looks like the iperf image fetches its input directly from one of the UARTs and doesn't recognise/use the USB JTAG UART.

Shouldn't be the case, as long as you enable CONFIG_ESP_CONSOLE_USB_SERIAL_JTAG option in menuconfig:

https://github.com/espressif/esp-idf/blob/3b8741b172dc951e18509698dee938304bcf1523/examples/wifi/iperf/main/iperf_example_main.c#L41-L44


> There is nothing wrong with the Wemos. I am using it all of the time with my own code, USB UART works like a charm.

The point is that Wi-Fi performance is heavily related to factors such as PCB design, power supply quality, interference between the RF path and other high-speed signals, and so on. Purely digital functions such as USB 1.1 interface and the CPU / peripherals operation are influenced by these factors to a much smaller degree. So it is not unreasonable to try to use a different devboard to rule out such PCB-related issues.


Besides, it might be worth comparing the results with iperf when UART is used for console with the results when USB is used for console. The description of CONFIG_ESP_PHY_ENABLE_USB option says:

> On some ESP targets, the USB PHY can interfere with WiFi thus lowering WiFi performance. As a result, on those affected ESP targets, the ESP PHY library's initialization will automatically disable the USB PHY to get best WiFi performance.

Since USB_SERIAL_JTAG requires USB PHY to be enabled, it sounds like this might lower WiFi performance. I am not sure if the iperf log posted by @hansw123 in https://github.com/espressif/esp-idf/issues/14171#issuecomment-2330957321 is already with USB_SERIAL_JTAG console enabled, or with console over UART.

eriksl commented 4 hours ago

If there were issues with the analogue path, I would find evidence in the statistics from my (enterprise/managed) access points, and I can't find any. The connection runs at the highest speed 802.11n can achieve (65 Mbps) and it remains that way. It looks to me like we can rule out any signal issue.

In the meantime I do have another board (LilyGO T7 S3) and I will try it there. The comparison isn't completely fair though, as this one has the SPIRAM connected by 8-wire SPI vs. 4-wire. But I already discovered that SPIRAM speed doesn't really matter that much for this issue.

One of your fellow developers disclosed recently that the impact of using the USB PHY on the Wifi performance is really small, something along the lines of 1%. Besides that, with this kind of interference I'd expect, again, evidence from the access points.

It really looks as if something inside the Wifi handling, in the digital/software domain is handling something very slowly, for some reason, before the frames are handed to LWIP, so not something I can have any influence on.