espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Memory allocation failed (IDFGH-13022) #13969

Open pedrohugo-psc opened 3 weeks ago

pedrohugo-psc commented 3 weeks ago

Answers checklist.

IDF version.

v4.4.4

Espressif SoC revision.

ESP32-WROOM-32D

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

Devkit_v4

Power Supply used.

USB

What is the expected behavior?

I'd like to run the ESP32 using WiFi-Mesh via MDF and Modbus TCP while disconnected from the internet for an extended period without experiencing memory allocation issues or other problems. Afterward, I would like to connect this device to the internet.

What is the actual behavior?

When running the ESP32 using WiFi-Mesh via MDF and Modbus TCP while disconnected from the internet for an extended period, three types of memory allocation failure can occur, each causing a reboot, although the device can still connect to the internet. Additionally, if no memory allocation failure occurs, a wifi_nvs_set failure appears, preventing the ESP from connecting to the internet.

Steps to reproduce.

In this project, I am using a combination of WiFi Mesh via MDF and Modbus TCP. It is essential to integrate both solutions and run them for an extended period without being connected to the internet.

Debug Logs.

1) First type of memory allocation failure

[M30_MESH_NETWORK, 59]: Got mesh id from spiffs: 00:00:00:01:00:00

Memory allocation failed

Backtrace: 0x40081c42:0x3ffc9ac0 0x400899dd:0x3ffc9ae0 0x400d6ff3:0x3ffc9b00 0x40081e44:0x3ffc9b20 0x40081e6d:0x3ffc9b40 0x400823f5:0x3ffc9b60 0x400825ed:0x3ffc9bc0 0x4008619d:0x3ffc9be0 0x401c039d:0x3ffc9c00 0x40094999:0x3ffc9c30 0x40095770:0x3ffc9c60 0x40095bf9:0x3ffc9ca0 0x40095caf:0x3ffc9ce0 0x40093507:0x3ffc9d20
0x40081c42: panic_abort at C:/esp/esp-idf/components/esp_system/panic.c:408
0x400899dd: esp_system_abort at C:/esp/esp-idf/components/esp_system/esp_system.c:137
0x400d6ff3: heap_caps_alloc_failed at C:/esp/esp-idf/components/heap/heap_caps.c:74
0x40081e44: heap_caps_malloc at C:/esp/esp-idf/components/heap/heap_caps.c:177
0x40081e6d: heap_caps_malloc_default at C:/esp/esp-idf/components/heap/heap_caps.c:199
0x400823f5: trace_malloc at C:/esp/esp-idf/components/heap/include/heap_trace.inc:96
0x400825ed: __wrap_malloc at C:/esp/esp-idf/components/heap/include/heap_trace.inc:158
0x4008619d: wifi_malloc at C:/esp/esp-idf/components/esp_wifi/esp32/esp_adapter.c:80
0x401c039d: esf_buf_alloc_dynamic at ??:?
0x40094999: esf_buf_alloc at ??:?
0x40095770: wDev_IndicateFrame at ??:?
0x40095bf9: wDev_ProcessRxSucData at ??:?
0x40095caf: wdevProcessRxSucDataAll at ??:?
0x40093507: ppTask at ??:?

2) Second type of memory allocation failure

[mdf_mem, 194]: Free heap, current: 116196, minimum: 111476

Memory allocation failed

Backtrace: 0x40081c42:0x3ffc2ee0 0x400899dd:0x3ffc2f00 0x400d6ff7:0x3ffc2f20 0x40081e44:0x3ffc2f40 0x40081e6d:0x3ffc2f60 0x400823f5:0x3ffc2f80 0x400825ed:0x3ffc2fe0 0x4011da70:0x3ffc3000 0x4011daeb:0x3ffc3020 0x4011db55:0x3ffc3040 0x40122fc1:0x3ffc3060 0x401230d2:0x3ffc3080 0x40125199:0x3ffc30a0 0x401251bb:0x3ffc30c0 0x4012317a:0x3ffc30e0 0x4011c8d8:0x3ffc3100
0x40081c42: panic_abort at C:/esp/esp-idf/components/esp_system/panic.c:408
0x400899dd: esp_system_abort at C:/esp/esp-idf/components/esp_system/esp_system.c:137
0x400d6ff7: heap_caps_alloc_failed at C:/esp/esp-idf/components/heap/heap_caps.c:74
0x40081e44: heap_caps_malloc at C:/esp/esp-idf/components/heap/heap_caps.c:177
0x40081e6d: heap_caps_malloc_default at C:/esp/esp-idf/components/heap/heap_caps.c:199
0x400823f5: trace_malloc at C:/esp/esp-idf/components/heap/include/heap_trace.inc:96
0x400825ed: __wrap_malloc at C:/esp/esp-idf/components/heap/include/heap_trace.inc:158
0x4011da70: mem_malloc at C:/esp/esp-idf/components/lwip/lwip/src/core/mem.c:237
0x4011daeb: do_memp_malloc_pool at C:/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254
0x4011db55: memp_malloc at C:/esp/esp-idf/components/lwip/lwip/src/core/memp.c:350 (discriminator 2)
0x40122fc1: sys_timeout_abs at C:/esp/esp-idf/components/lwip/lwip/src/core/timeouts.c:200
0x401230d2: sys_timeout at C:/esp/esp-idf/components/lwip/lwip/src/core/timeouts.c:325 (discriminator 2)
0x40125199: dhcp_fine_tmr at C:/esp/esp-idf/components/lwip/lwip/src/core/ipv4/dhcp.c:599
0x401251bb: dhcp_fine_timeout_cb at C:/esp/esp-idf/components/lwip/lwip/src/core/ipv4/dhcp.c:310
0x4012317a: sys_check_timeouts at C:/esp/esp-idf/components/lwip/lwip/src/core/timeouts.c:411
0x4011c8d8: tcpip_timeouts_mbox_fetch at C:/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:104
 (inlined by) tcpip_thread at C:/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:148

3) Third type of memory allocation failure

MASTER_TEST: Characteristic #0 Total Consumption (Wh) value = 238.000000 (0x436e0000) read successful.

Memory allocation failed

Backtrace: 0x40081c42:0x3ffc5ea0 0x400899dd:0x3ffc5ec0 0x400d6fdb:0x3ffc5ee0 0x40081e44:0x3ffc5f00 0x400823e7:0x3ffc5f20 0x40082645:0x3ffc5f80 0x40115709:0x3ffc5fa0
0x40081c42: panic_abort at C:/esp/esp-idf/components/esp_system/panic.c:408
0x400899dd: esp_system_abort at C:/esp/esp-idf/components/esp_system/esp_system.c:137
0x400d6fdb: heap_caps_alloc_failed at C:/esp/esp-idf/components/heap/heap_caps.c:74
0x40081e44: heap_caps_malloc at C:/esp/esp-idf/components/heap/heap_caps.c:177
0x400823e7: trace_malloc at C:/esp/esp-idf/components/heap/include/heap_trace.inc:94
0x40082645: __wrap_heap_caps_malloc at C:/esp/esp-idf/components/heap/include/heap_trace.inc:183
0x40115709: emac_w5500_task at C:/esp/esp-idf/components/esp_eth/src/esp_eth_mac_w5500.c:335

4) When the device does not hit a memory allocation failure but cannot connect to the internet

 [conn_bb_sitter, 190]: <ESP_OK><DISCONNECTING FROM CURRENT PARENT AND FLUSHING UPSTREAM> watch_dog:5043701, ticks:5190002
[conn_bb_sitter, 592]: inside here i'm: NODE
W (5190732) wifi:wifi_nvs_set fail, index=30 ret=4359

W (5190791) wifi:wifi_nvs_set fail, index=5 ret=4359
wifi:Next TBTT incorrect! last beacon:776958295, offset:286024, next beacon:808292695, beacon interval:307200, dtim period:0, dtim count:0, listen interval:3, now:892011915
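
The `ret=4359` in the warnings above is printed in decimal, which hides the error family it belongs to. The following is a minimal sketch for decoding it, assuming that ESP-IDF's `nvs.h` defines `ESP_ERR_NVS_BASE` as `0x1100` and that NVS codes stay within `0x1100..0x11FF`; both the helper name `nvs_error_offset` and the range check are illustrative, not part of any ESP-IDF API.

```c
#include <stdint.h>

/* Assumption: ESP_ERR_NVS_BASE is 0x1100 in ESP-IDF's nvs.h.
 * Verify this against the header of the IDF version in use. */
#define NVS_ERR_BASE 0x1100

/* Hypothetical helper: return the offset of an error code inside the
 * assumed NVS range (0x1100..0x11FF), or -1 if the code lies outside it. */
int nvs_error_offset(int32_t ret)
{
    if (ret < NVS_ERR_BASE || ret > NVS_ERR_BASE + 0xFF) {
        return -1;
    }
    return (int)(ret - NVS_ERR_BASE);
}
```

Calling `nvs_error_offset(4359)` yields 7, since 4359 is `0x1107`. In the `nvs.h` versions this sketch is based on, offset `0x07` is `ESP_ERR_NVS_INVALID_HANDLE`, but that mapping should be confirmed on the target, for example by printing `esp_err_to_name(4359)`.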

More Information.

No response

zhangyanjiaoesp commented 3 weeks ago

@pedrohugo-psc Are you just starting your project? Since MDF is no longer maintained and v4.4 is nearing the end of its life cycle, it is recommended that you use mesh lite.

pedrohugo-psc commented 3 weeks ago

Hello, @zhangyanjiaoesp,

This project aims to request data from meters via Modbus TCP, RS485, PIMA, PULSE, or RS232 and send this data to a database. Therefore, it is necessary to connect the device to the internet via WiFi and use WiFi Mesh to avoid network overload. The system runs well with RS485, PIMA, PULSE, or RS232 for extended periods without encountering these bugs, and MDF functions properly. However, recently, I needed to integrate Modbus TCP for data requests, which has introduced bugs when this protocol is selected. If another protocol is used, the system runs well. I believe the issue is memory fragmentation because, upon analyzing the memory using esp_get_free_heap_size(), no memory leaks were detected.
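
A total free-heap figure alone cannot confirm or rule out fragmentation, because a large total can still be split into blocks too small for a given request. A rough sketch of one way to quantify it follows: compare the total free size with the largest contiguous free block. The function name `heap_fragmentation_percent` is hypothetical; on the target, the two inputs would typically come from `heap_caps_get_free_size(MALLOC_CAP_DEFAULT)` and `heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT)` in `esp_heap_caps.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: percentage of the free heap that is NOT part of the
 * largest contiguous free block. 0 means all free memory is one block;
 * values near 100 mean the free memory is split into many small pieces.
 *
 * On an ESP32 the arguments would typically be:
 *   free_bytes         = heap_caps_get_free_size(MALLOC_CAP_DEFAULT)
 *   largest_free_block = heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT)
 */
unsigned heap_fragmentation_percent(size_t free_bytes, size_t largest_free_block)
{
    if (free_bytes == 0 || largest_free_block >= free_bytes) {
        return 0;
    }
    /* 64-bit intermediate avoids overflow where size_t is 32 bits wide. */
    uint64_t largest_share = ((uint64_t)largest_free_block * 100u) / free_bytes;
    return (unsigned)(100u - largest_share);
}
```

Logging this value next to the free-heap figure over a long run would show whether the largest block shrinks while the total stays roughly constant, which is the pattern fragmentation would produce.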

The log below shows the first type of memory allocation failure:

ETH-TESTE.20240612104210.txt

And this log shows the wifi_nvs_set failure:

ETH-TEST3.20240612150002.txt

pedrohugo-psc commented 2 weeks ago

@zhangyanjiaoesp,

This project is not new and has been running well for a long time. However, I recently needed to implement the Modbus TCP protocol to request data, which has caused this issue to appear.

zhangyanjiaoesp commented 2 weeks ago

@pedrohugo-psc OK, I will check the logs you provided. By the way, can you provide a demo for testing?

zhangyanjiaoesp commented 2 weeks ago

@pedrohugo-psc Can you enable the Wi-Fi info-level logs? The current logs seem to contain only warnings. The wifi_nvs_set failure occurs while saving the STA's channel and the AP's channel, so it appears that setting the config failed. The first type of memory allocation failure indicates that the device was receiving packets. I can't get more information from these logs. Can you explain what Wi-Fi mesh and Modbus TCP do in your case?

pedrohugo-psc commented 2 weeks ago

Hi, @zhangyanjiaoesp,

This project is extensive and proprietary to the company, making it complicated to send a demo.

Regarding WiFi Mesh, I use this module to connect the root to the router and the nodes to their parent devices. The root is an ESP with a better signal, while the other ESPs become nodes. The purpose of this WiFi Mesh is to avoid overloading the network.

Regarding Modbus TCP, I use this module for the ESP to request data from the meter. The requests occur every 30 seconds. The connection between the ESPs and the meter is via the network, with the ESPs obtaining an IP address via DHCP before connecting. I use the W5500 module for Ethernet.

The ESPs request data, and every 15 minutes, the data is sent to the AWS database through the root.

This project ran well without Modbus TCP and the issue appeared after its integration.

I ran the code again and set the log output to verbose. I hope this log can help:

ETH-TESTE.20240618171437.txt

pedrohugo-psc commented 2 weeks ago

@zhangyanjiaoesp,

Sorry for sending the previous log. This new log contains more information and I hope it helps.

ETH-TESTE.20240620095501.txt

pedrohugo-psc commented 4 days ago

Hello, @zhangyanjiaoesp,

The last log has a lot of information but none about WiFi, so I am sending another log that includes the WiFi logs:

ETH-TEST.20240703091219.txt

Another thing I noticed is that when I use the LILYGO TTGO T-Internet-POE board with the ESP32-WROOM-32E, these bugs don't appear. I conducted this test for 6 days. Is it possible that the bugs are caused by using the ESP32-WROOM-32D?

Edit: Furthermore, could this problem be caused by the SPI pins (HSPI) used by the W5500 module?