espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

LWIP driver fails to recover after sendto() returns ENOMEM (IDFGH-13101) #14048

Closed usysinct closed 2 months ago

usysinct commented 2 months ago

Answers checklist.

IDF version.

2.9.1 Build id: 20230406-1540

Espressif SoC revision.

ESP32-PICO-V3-02

Operating System used.

Windows

How did you build your project?

Other (please specify in More Information)

If you are using Windows, please specify command line type.

None

Development Kit.

Espressif-IDE

Power Supply used.

USB

What is the expected behavior?

Hello. I am using Espressif-IDE Version: 2.9.1 Build id: 20230406-1540. SoC: ESP32-PICO-V3-02 The application is written entirely in C. My development platform is Windows.

The LWIP driver, and specifically sendto(), should become available to send UDP packets again after returning 'ENOMEM'. It should stop returning 'ENOMEM' once it has been yielded an adequate processor time slice to process the buffered pbufs.

What is the actual behavior?

The application is in a high stage of completion except for two issues, one surrounding the I2C driver causing the occasional kernel panic on soft reboot, and the other more pressing issue surrounding the LWIP stack, specifically pbuf_alloc() running out of memory.

All application tasks are created with xTaskCreatePinnedToCore() at priority 19 and execute synchronously and symmetrically.

The application consistently sends ~10,800 UDP packets of 256 bytes each at a rate of precisely 10 pkt/s, after which sendto() returns errno 12 'ENOMEM'. I am not aware of any way to recover from this state. Not sending packets for more than 20 s afterwards does not remedy the situation, and I am currently compelled to issue a soft reboot, which will not suffice for a successful product deployment.

This same issue was addressed in Nov 2019 in: Network "out of memory" error (IDFGH-2154) #4309 https://github.com/espressif/esp-idf/issues/4309

I have read through the arguments, but it appears that the problem still exists in at least the current ESP-IDF version of FreeRTOS.

  1. How do I mitigate this situation without having to reboot the system?
  2. How do I obtain the dynamic size of the pbuf linked list? I have only today begun to .../esp-idf-v5.2/components/lwip $ grep -r "pbuf_alloc" . , so tracking down the methodology will take me some time. Perhaps preventing this anomalous situation from occurring in the first place, from within application code, is an option.

Steps to reproduce.

  1. sendto() ~10,800 packets of size 256 bytes
  2. Wait for sendto() to return ENOMEM
  3. Yield for several seconds.
  4. Attempt sendto() again; it consistently returns ENOMEM ...
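The recovery the steps above expect (back off, then retry) can be sketched as follows. This is a minimal sketch for a transient ENOMEM only; it cannot help if memory is being leaked elsewhere. The callback types `try_send_fn` and `delay_ms_fn`, the function `send_with_backoff`, and the fake sender are all hypothetical names introduced for illustration. In a real application the send callback would wrap sendto() and the delay callback would wrap the RTOS delay call.

```c
#include <errno.h>

/* Hypothetical callback types: keep the sketch independent of any
 * particular socket API or RTOS. */
typedef int  (*try_send_fn)(void *ctx);   /* 0 on success, -1 with errno set */
typedef void (*delay_ms_fn)(unsigned ms); /* block the caller for ms milliseconds */

/* Retry a send while it fails with ENOMEM, doubling the delay each time.
 * Returns 0 on success, -1 on any other error or when attempts run out. */
int send_with_backoff(try_send_fn try_send, void *ctx, delay_ms_fn delay_ms,
                      unsigned first_delay_ms, int max_attempts)
{
    unsigned delay = first_delay_ms;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_send(ctx) == 0) {
            return 0;
        }
        if (errno != ENOMEM) {
            return -1;              /* different failure: caller decides */
        }
        if (attempt + 1 < max_attempts) {
            delay_ms(delay);        /* give the stack time to free buffers */
            delay *= 2;
        }
    }
    errno = ENOMEM;                 /* still out of memory after all attempts */
    return -1;
}

/* Usage example: a fake sender that fails with ENOMEM a set number of times. */
struct flaky_sender { int failures_left; int calls; };

int flaky_send(void *ctx)
{
    struct flaky_sender *s = ctx;
    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        errno = ENOMEM;
        return -1;
    }
    return 0;
}

unsigned total_delay_ms = 0;
void record_delay(unsigned ms) { total_delay_ms += ms; }
```

If the call still fails after the last attempt, the out-of-memory condition is probably not transient, and the next thing to check is free heap over time.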

Debug Logs.

No response

More Information.

No response

usysinct commented 2 months ago

On further testing, an LWIP uptime of 18 minutes appears to be the limiting condition. On increasing the UDP packet rate from 10 pkt/s to 12.5 pkt/s and 14.3 pkt/s, the number of packets successfully transmitted increased linearly, and the sendto() failure with ENOMEM occurred at the same 18-minute mark. This is beginning to look like a DHCP lease timeout situation.
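The arithmetic behind that observation can be checked with a short sketch. The helper name `packets_before_failure` is hypothetical; it only multiplies the packet rate by a fixed time to failure, assuming the failure time (18 minutes, i.e. 1080 s) does not depend on the rate.

```c
/* Packets sent before a failure that occurs at a fixed elapsed time.
 * Rounds to the nearest whole packet. */
long packets_before_failure(double rate_pkt_per_s, double seconds)
{
    return (long)(rate_pkt_per_s * seconds + 0.5);
}
```

At 10 pkt/s this gives the ~10,800 packets reported earlier, and the count grows in proportion to the rate, which matches a failure tied to elapsed time rather than to the number of packets sent.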

6-23-2024 After some testing and modifying ENOMEM error responses in file: C:\Espressif\frameworks\esp-idf-v5.2\components\lwip\lwip\src\api\sockets.c , I have concluded that the errno == ENOMEM seen after the sendto() failure at the consistent 18-minute mark is unlikely to be originating there. Surprise, the ENOMEM is not originating in LWIP, at least not directly!

Executing /esp-idf-v5.2/components$ grep -r "ENOMEM" . produces hits in the following directories: ./bt ./fatfs ./freertos ./newlib ./openthread ./pthread ./spiffs ./vfs ./wpa_supplicant ./lwip

Assuming that we can exclude all directories other than /freertos and /lwip, we are left with:

./freertos/FreeRTOS-Kernel-SMP/include/freertos/projdefs.h:#define pdFREERTOS_ERRNO_ENOMEM 12 /* Not enough memory */

Do I have a memory leak? If so then I'm consuming memory at a rate of up to 474 32-bit words/s, for the target ESP32-PICO-V3-02. That's hard to believe.
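One way to answer that question with measurements is to sample the free heap at two points in time and compute the average consumption. The sketch below is generic C; the function names are hypothetical. On ESP-IDF the samples could come from esp_get_free_heap_size(), which is an assumption about how the caller obtains them and is not used in the code itself.

```c
#include <stdint.h>

/* Average heap consumption in bytes per second between two free-heap samples.
 * Returns 0 if no time elapsed or the free heap did not shrink. */
double leak_rate_bytes_per_s(uint32_t free_start, uint32_t free_end,
                             double elapsed_s)
{
    if (elapsed_s <= 0.0 || free_end >= free_start) {
        return 0.0;
    }
    return (double)(free_start - free_end) / elapsed_s;
}

/* Seconds until the heap is exhausted at the given rate.
 * Returns a negative value if there is no measurable leak. */
double seconds_to_exhaustion(uint32_t free_now, double rate_bytes_per_s)
{
    if (rate_bytes_per_s <= 0.0) {
        return -1.0;
    }
    return (double)free_now / rate_bytes_per_s;
}
```

For reference, 474 32-bit words/s corresponds to 1896 bytes/s. A steadily falling free-heap figure while the application is otherwise idle points to a leak in application code rather than in the network stack.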

usysinct commented 2 months ago

I have resolved the issue. FreeRTOS is NOT the source of the leak. I incurred a memory leak with an inadvertent use of malloc where my realloc method should have been used, as:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Allocate or resize *buffer to hold `length` zeroed ints.
     * Returns 0 on success, -1 on error. */
    int allocate_int_buffer(int **buffer, int length) {
        int *p;
        if (buffer == NULL) {
            printf("allocate_int_buffer() buffer NULL error\n");
            return -1;
        }
        if (*buffer == NULL) { // first call: nothing allocated yet
            if ( (*buffer = (int *)malloc(length * sizeof(int))) ) {
                //printf("allocate_int_buffer() malloc new buffer size: %d\n", length * sizeof(int));
                ;
            } else {
                printf("allocate_int_buffer() buffer malloc error\n");
                return -1;
            }
        } else { // later calls: resize the existing block instead of allocating a new one
            if ( (*buffer = (int *)realloc(p = *buffer, length * sizeof(int))) ) {
                //printf("allocate_int_buffer() realloc pre-existing buffer size: %d\n", length * sizeof(int));
                ;
            } else {
                free(p); // realloc failed: release the old block
                printf("allocate_int_buffer() buffer realloc error\n");
                return -1;
            }
        }
        memset(*buffer, 0, length * sizeof(int)); // zero heap array
        return 0;
    }

usysinct commented 2 months ago

Be careful when you use malloc. Calling malloc on its own inside a method that is invoked repeatedly or re-entrantly on the same buffer, where realloc should be used, leaks the previous allocation on every call. As my problematic method showed, this can be a source of much consternation, ado and wasted time.
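A compact way to avoid that mistake is to route every allocation and resize of a buffer through one helper built on realloc. The sketch below is an alternative illustration, not the method from this thread; the name `resize_zeroed_ints` is hypothetical. It relies on the standard guarantee that realloc(NULL, n) behaves like malloc(n), so the first call and later calls share one code path. Unlike the method above, it leaves the old block in place when realloc fails.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Resize *buf to hold `count` zeroed ints, reusing the existing block.
 * *buf must be NULL or a pointer previously returned by malloc/realloc.
 * Returns 0 on success; on failure returns -1 and leaves *buf unchanged. */
int resize_zeroed_ints(int **buf, size_t count)
{
    if (buf == NULL || count == 0 || count > SIZE_MAX / sizeof(int)) {
        return -1;
    }
    /* realloc(NULL, n) acts like malloc(n), so no separate first-call branch. */
    int *p = realloc(*buf, count * sizeof(int));
    if (p == NULL) {
        return -1;   /* old block is still valid and still owned by the caller */
    }
    memset(p, 0, count * sizeof(int));
    *buf = p;
    return 0;
}
```

Typical use is to initialise the pointer to NULL once, call the helper each time the required size changes, and call free() exactly once when the buffer is no longer needed.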