espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.77k stars 7.31k forks source link

rtl8201 no mem for receive buffer (IDFGH-7769) #9308

Closed chrisovergaauw closed 1 year ago

chrisovergaauw commented 2 years ago

Environment

Intellij IDE IDF 4.1.1 (relevant code is exactly the same code as in latest) micropython latest ESP-32-WROOM-32 AsyncTCP Server example lan phy: rtl8201

Problem Description

Exact same issue as #9083 but then for emac_esp32_rx_task in esp_eth_mac_esp.c after "no mem for receive buffer" it seems like no memory will never be freed. Any socket interaction in MP will hang after this, but the code is clearly an ESP-IDF issue.

//Detailed problem description goes here.

Expected Behavior

malloc for rx received bytes instead (similar to #9083) of ETH_MAX_PACKET_SIZE (this will reduce how often the error occurs) and memory should be freed in order to recover from this scenario.

kostaond commented 2 years ago

It is recurring issue as you mentioned. We will have a look.

kostaond commented 2 years ago

@chrisovergaauw could you please confirm you mean https://github.com/me-no-dev/AsyncTCP by "AsyncTCP Server example"?

thiagohd commented 2 years ago

I have the same problem, but with my own firmware.

I have implemented a UDP logging functionality, where every logging from the ESP-32 will be broadcasted to the network.

And the same firmware, but disabling the UDP logging system, doesn't have this issue.

It looks like EMAC phy is trying to pass all the UDP messages received, and it fills up the memory faster than it can consume it. Even though the ESP-32 doesn't expect any UDP packets, it will surely come up in the emac interrupt handler.

I also have other devices in the network which send broadcast packets.

static int init_udp_socket(){
    int b_sock, ret;
    if ((b_sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0){
        printf("ERR socket(): %d\n", b_sock);
        return 0;
    }

    int broadcastPermission = 1;
    if ((ret = (setsockopt(b_sock, SOL_SOCKET, SO_BROADCAST, (void *) &broadcastPermission, sizeof(broadcastPermission)))) < 0){
        printf("ERR setsockopt() = %d\n", ret);
        return 0;
    }

    int opts;
    opts = fcntl(b_sock, F_GETFL);
    if(opts < 0){
        printf("ERR fcntl() = %d\n", opts);
        return 0;
    }

    opts = (opts | O_NONBLOCK);
    if((ret = (fcntl(b_sock,F_SETFL,opts))) < 0){
        printf("ERR fcntl() = %d\n", ret);
        return 0;
    }
    return b_sock;
}
static int check_udp_socket(int b_sock){
    int opts;

    if(b_sock == 0) return 0;

    opts = fcntl(b_sock, F_GETFL);
    if(opts < 0){
        printf("fcntl get failed\n");
        return 0;
    }
    return 1;
}
static int send_udp_packet(int b_sock, char *str, int len){
    int ret;
    struct sockaddr_in tx_addr;

    tx_addr.sin_family = AF_INET;                 
    tx_addr.sin_addr.s_addr = inet_addr("255.255.255.255");
    tx_addr.sin_port = htons(UDP_LOG_PORT);
    if((ret = sendto(b_sock, str, len, 0, (struct sockaddr *)&tx_addr, sizeof(tx_addr))) < 0)
        printf("ERR sento(): %d\n", ret);

    return ret;
}
int send_udp(const char *fmt, ...){
    if(!check_udp_socket(__xudp_socket))
        __xudp_socket = init_udp_socket();

    char udp_log_buff[UDP_LOG_LINE_SIZE] = {0};
    va_list va;
    va_start(va, fmt);
    int xlen = vsnprintf(udp_log_buff, UDP_LOG_LINE_SIZE, fmt, va);
    va_end(va);
    if(xlen < 0){
        printf("Erro ao formatar mensagem de log UDP\n");
    }else{
        xlen = (xlen >= UDP_LOG_LINE_SIZE) ? UDP_LOG_LINE_SIZE : xlen;
    }
    return send_udp_packet(__xudp_socket, udp_log_buff, xlen);
}
kostaond commented 2 years ago

Could you please check if this WIP update helped to resolve your issue?

chrisovergaauw commented 2 years ago

@kostaond The changes you've made are obviously (and rightfully) based off the master branch. Currently I'm working with the latest micropython which supports esp-idf v4.4.1 at most. I've tried cherry-picking your commit into v4.4.1 (using -x theirs to resolve issues).

With the resulting micropython firmware I am able to:

I have not been able to make an easily reproducible which utilises enough system resources for this issue to appear. So I can tell you how it currently behaves in my setup as an FYI, but it won't be of much use.

For my project, the initial issue occurs when I start using BLE, HTTPS and +- 15 sensors. with HTTP it all works just fine with your patch. HTTPS is an entirely different story:

EDIT: added esp.emac debug logs to below code snippet. 90 seconds does not seem like a coincidence to me, so I'll investigate that.

07:41:47 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'HTTP/1.1 101 Switching Protocols']
D (16846) esp.emac: receive len= 70
D (20476) esp.emac: receive len= 508
D (20476) esp.emac: receive len= 528
D (20476) esp.emac: receive len= 528
D (24296) esp.emac: receive len= 342
D (24296) esp.emac: receive len= 60
D (24616) esp.emac: receive len= 60
D (24936) esp.emac: receive len= 60
D (25256) esp.emac: receive len= 60
D (25576) esp.emac: receive len= 60
D (25896) esp.emac: receive len= 60
D (25936) esp.emac: receive len= 261
D (25936) esp.emac: receive len= 281
D (26096) esp.emac: receive len= 508
D (26096) esp.emac: receive len= 528
D (26096) esp.emac: receive len= 514
D (26096) esp.emac: receive len= 534
D (26436) esp.emac: receive len= 277
D (26436) esp.emac: receive len= 297
D (26436) esp.emac: receive len= 327
D (26436) esp.emac: receive len= 347
D (26686) esp.emac: receive len= 277
D (26686) esp.emac: receive len= 297
D (26686) esp.emac: receive len= 327
D (26686) esp.emac: receive len= 347
D (26936) esp.emac: receive len= 277
D (26936) esp.emac: receive len= 297
D (26936) esp.emac: receive len= 327
D (26936) esp.emac: receive len= 347
D (26936) esp.emac: receive len= 261
D (26946) esp.emac: receive len= 281
D (27106) esp.emac: receive len= 508
D (27106) esp.emac: receive len= 528
D (27106) esp.emac: receive len= 514
D (27106) esp.emac: receive len= 534
D (27206) esp.emac: receive len= 1303
D (27206) esp.emac: receive len= 1323
D (27206) esp.emac: receive len= 1303
D (27206) esp.emac: receive len= 1323
D (28206) esp.emac: receive len= 1303
D (28206) esp.emac: receive len= 1323
D (28206) esp.emac: receive len= 1303
D (28206) esp.emac: receive len= 1323
D (28936) esp.emac: receive len= 261
D (28936) esp.emac: receive len= 281
D (30106) esp.emac: receive len= 508
D (30106) esp.emac: receive len= 528
D (30106) esp.emac: receive len= 508
D (30106) esp.emac: receive len= 528
D (30206) esp.emac: receive len= 1303
D (30206) esp.emac: receive len= 1323
D (30206) esp.emac: receive len= 1303
D (30206) esp.emac: receive len= 1323
D (32366) esp.emac: receive len= 342
D (32936) esp.emac: receive len= 1303
D (32936) esp.emac: receive len= 1323
D (32936) esp.emac: receive len= 1462
D (32936) esp.emac: receive len= 1482
D (39126) esp.emac: receive len= 508
D (39126) esp.emac: receive len= 528
D (39126) esp.emac: receive len= 508
D (39126) esp.emac: receive len= 528
D (40646) esp.emac: receive len= 342
D (41736) esp.emac: receive len= 92
D (41736) esp.emac: receive len= 92
D (42406) esp.emac: receive len= 92
D (44426) esp.emac: receive len= 92
D (48796) esp.emac: receive len= 342
D (56936) esp.emac: receive len= 342
D (65076) esp.emac: receive len= 342
D (66186) esp.emac: receive len= 508
D (66186) esp.emac: receive len= 528
D (66186) esp.emac: receive len= 508
D (66186) esp.emac: receive len= 528
D (106396) esp.emac: receive len= 85
D (106396) esp.emac: receive len= 60
07:43:17 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'Server: nginx']
07:43:17 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'Date: Tue, 01 Nov 2022 07:41:48 GMT']
07:43:17 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'Connection: upgrade']
07:43:17 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'Upgrade: WebSocket']
07:43:17 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'Sec-WebSocket-Accept: NoM7CbF6cF1+Nz4kcxV+fyySnq8=']
07:43:17 DEBUG mqtt_async.AsyncWebsocket 6. Read header [b'Sec-WebSocket-Protocol: mqttv3.1']

I'll try to gather as much useful info for you as I can in the coming days.

kostaond commented 2 years ago

@chrisovergaauw thank you for very quick response and testing! Just to summarize, you now don't see no mem for receive buffer anymore? Once HTTPS communication hangs, are you able to receive other traffic? It would be great if you were able to describe steps (or provide code snippet) so I could reproduce the issue.

chrisovergaauw commented 2 years ago

I don't see it currently, but that might be due to the fact that my Micropython code hangs on a socket write action. That means that the ESP-IDF code underneath probably does not do much either. I cannot tell you whether receiving other traffic still works, but the esp.emac receive message do keep on coming in, so that's a good sign. Sorry, I have not yet been able to reproduce this without our company's micropython code base, which is far too complex to serve as a debug example. I am still working on creating a reproducible example.

kostaond commented 2 years ago

Try to ping the device once the HTTPS gets stuck.

chrisovergaauw commented 2 years ago

it still responds to pinging properly 👍

Alvin1Zhang commented 1 year ago

Thanks for reporting, fix is available at https://github.com/espressif/esp-idf/commit/785f154f5652b353c460ec47cb90aa404207918c, feel free to reopen.