espressif / esp-idf

ESP-Mesh corrupt heap detected in comprehensive mode (IDFGH-9376) #10750

Closed IH303 closed 1 year ago

IH303 commented 1 year ago

IDF version.

release/v5.0 @ 885e501

Operating System used.

Linux

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32S2 - Custom Board

Power Supply used.

External 5V

What is the expected behavior?

I have a mesh network of at least two devices. When I reset or restart a child node, it should boot again and reconnect to the root node. Everything should work normally even with heap corruption detection set to comprehensive mode, and no heap corruption should be detected.

What is the actual behavior?

When I restart a child node, it finds the root node but cannot connect to it, because the root node has already panicked after detecting a corrupted heap.

Steps to reproduce.

  1. Use the mesh/internal_communication example (most likely any mesh example will work)
  2. Set your router SSID and password in the example configuration in sdkconfig
  3. Set Heap corruption detection in sdkconfig to Comprehensive
  4. Flash onto at least two devices
  5. Start the devices
  6. Restart one of the children (e.g. via reset button)
  7. Root will detect a corrupted heap when the child tries to reconnect

Debug Logs.

I (71725) wifi:new:<10,2>, old:<10,2>, ap:<10,2>, sta:<10,2>, prof:10
I (71735) wifi:Send SA Query req with transaction id 2a79
I (71945) wifi:Send SA Query req with transaction id c4f
I (72145) wifi:Send SA Query req with transaction id 7b84
I (72355) wifi:Send SA Query req with transaction id de62
I (72555) wifi:Send SA Query req with transaction id e17c
I (72765) wifi:STA not responded to 5 SA Query attempts, Reset connection sending disassoc
I (72765) wifi:station: 7c:df:a1:01:0d:30 leave, AID = 1, bss_flags is 134243, bss:0x3ffeb070
I (72765) wifi:new:<10,2>, old:<10,2>, ap:<10,2>, sta:<10,2>, prof:10
I (72775) mesh_main: <MESH_EVENT_CHILD_DISCONNECTED>aid:1, 7c:df:a1:01:0d:30

CORRUPT HEAP: Invalid data at 0x3ffeb370. Expected 0xfefefefe got 0x00000000

assert failed: multi_heap_malloc multi_heap_poisoning.c:258 (ret)

Backtrace: 0x4002391e:0x3ffd6b20 0x4002a419:0x3ffd6b40 0x40030bd5:0x3ffd6b60 0x4002ecf6:0x3ffd6c80 0x40023b29:0x3ffd6ca0 0x40023b89:0x3ffd6cc0 0x40023bbe:0x3ffd6ce0 0x40030be5:0x3ffd6d00 0x40026865:0x3ffd6d20 0x401189cd:0x3ffd6d40 0x40033311:0x3ffd6d70 0x400340c4:0x3ffd6da0 0x400344fd:0x3ffd6de0 0x400345a5:0x3ffd6e20 0x40032328:0x3ffd6e60 0x4002cd85:0x3ffd6e90
0x4002391e: panic_abort at /home/ibrahim/esp/esp-idf/components/esp_system/panic.c:423

0x4002a419: esp_system_abort at /home/ibrahim/esp/esp-idf/components/esp_system/esp_system.c:153

0x40030bd5: __assert_func at /home/ibrahim/esp/esp-idf/components/newlib/assert.c:78

0x4002ecf6: multi_heap_malloc at /home/ibrahim/esp/esp-idf/components/heap/multi_heap_poisoning.c:258 (discriminator 1)

0x40023b29: heap_caps_malloc_base at /home/ibrahim/esp/esp-idf/components/heap/heap_caps.c:145

0x40023b89: heap_caps_malloc at /home/ibrahim/esp/esp-idf/components/heap/heap_caps.c:165

0x40023bbe: heap_caps_malloc_default at /home/ibrahim/esp/esp-idf/components/heap/heap_caps.c:190

0x40030be5: malloc at /home/ibrahim/esp/esp-idf/components/newlib/heap.c:24

0x40026865: wifi_malloc at /home/ibrahim/esp/esp-idf/components/esp_wifi/esp32s2/esp_adapter.c:65

0x401189cd: esf_buf_alloc_dynamic at ??:?

0x40033311: esf_buf_alloc at ??:?

0x400340c4: wDev_IndicateFrame at ??:?

0x400344fd: wDev_ProcessRxSucData at ??:?

0x400345a5: wdevProcessRxSucDataAll at ??:?

0x40032328: ppTask at ??:?

0x4002cd85: vPortTaskWrapper at /home/ibrahim/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:154

More Information.

I don't think this is real corruption. It looks like the poisoning fill pattern is not written properly: a memory region is filled with zeros instead, even in comprehensive mode.
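To make the failure message easier to read, here is a minimal, self-contained sketch of how a fill-pattern check of this kind behaves. It is a simplified model for illustration, not the code in multi_heap_poisoning.c. The helper names (`poison_free_region`, `verify_free_region`, `simulate_zeroed_free_block`) are hypothetical; only the expected value 0xfefefefe is taken from the log above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pattern the log above expects to find in free memory. */
#define FREE_FILL_PATTERN ((uint32_t)0xfefefefe)

/* Hypothetical helper: fill a region with the free-memory pattern,
 * as a poisoning allocator would do when the region is freed. */
static void poison_free_region(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = FREE_FILL_PATTERN;
    }
}

/* Hypothetical helper: check the region before it is handed out again.
 * Returns the index of the first word that does not hold the pattern,
 * or -1 if the whole region is intact. */
static long verify_free_region(const uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (region[i] != FREE_FILL_PATTERN) {
            printf("CORRUPT HEAP: Invalid data at %p. "
                   "Expected 0x%08" PRIx32 " got 0x%08" PRIx32 "\n",
                   (const void *)&region[i], FREE_FILL_PATTERN, region[i]);
            return (long)i;
        }
    }
    return -1;
}

/* Reproduces the symptom: one word of a "free" region holds zeros
 * by the time the next allocation verifies it. */
static long simulate_zeroed_free_block(void)
{
    uint32_t region[8];

    poison_free_region(region, 8);
    assert(verify_free_region(region, 8) == -1); /* intact right after free */

    memset(&region[3], 0, sizeof(region[3]));    /* zeros appear in word 3 */
    return verify_free_region(region, 8);        /* check now fails at index 3 */
}
```

In this model the check cannot tell why a word no longer holds the pattern. A stray write into freed memory and a region that was zeroed instead of being filled with the pattern produce the same message.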

zhangyanjiaoesp commented 1 year ago

@IH303 https://docs.espressif.com/projects/esp-idf/zh_CN/latest/esp32/api-reference/system/heap_debug.html?highlight=heap%20memory#comprehensive

Yes, it is not real corruption. We recommend enabling comprehensive mode only when debugging, not in production.

zhangyanjiaoesp commented 1 year ago

@IH303 The bug has been fixed in the latest release/v5.0, and the fix will be synchronized to GitHub ASAP.

zhangyanjiaoesp commented 1 year ago

@IH303 Commit 89bb920c86d65fe2694c0d9b533b170d4a1c7dd2 has been merged. You can use release/v5.0 from this commit onward.

Alvin1Zhang commented 1 year ago

Thanks for reporting. Feel free to reopen.