espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.79k stars 7.31k forks source link

Heap corruption diagnostics causing heap corruption? #1582

Closed PerMalmberg closed 6 years ago

PerMalmberg commented 6 years ago

Hi,

This became rather lengthy, please bear with me.

Assuming Heap Debugging is set to "Comprehensive", when allocating a block of memory (multi_heap_malloc()), an additional poison_head_t and poison_tail_t are allocated. Then, it goes on to poison the allocated buffer in poison_allocated_region(), setting the head and tail with HEAD_CANARY_PATTERN (0xABBA1234) and TAIL_CANARY_PATTERN (0xBAAD5678), respectively. When back in multi_heap_malloc, we call verify_fill_pattern(), to verify that the buffer in between head an tail contains the FREE_FILL_WORD (0xfefefefe), and replacing it with MALLOC_FILL_WORD (0xcececece)

First question: What fills the memory with 0xfe in the first place?

Ok, so we now know that whenever we allocate a buffer, we can expect it to be filled with 0xce, correct?

Moving on to freeing memory.

Again, assuming that Heap Debugging is set to "Comprehensive", when freeing a block of memory (multi_heap_free()), the region being freed is verified via verify_allocated_region() such that the head and tail values are intact, i.e. no buffer under/overruns have happened (at least none that wrote any data). Next, the entire buffer, including head and tail are filled with 0xfe, after which the entire buffer is handed back to the heap.

All correct so far?

Now, to my actual problem. I am consistently (as in 100%) getting the stack trace at the bottom of this post very shortly after start up.

First off, I've replaced cJSON with version 1.7.1 as 1.6.0 which ships with ESP-IDF has a memory bug which causes a too small buffer to be allocated.

While calling cJSON_free in smooth::core::json::Value::to_string[abi:cxx11]() in Value.cpp:285 (using default mapping to free()) I expect that to eventually call function multi_heap_free(), which it seems to do just before entering newlib-code (the line numbers seems off once it enters this part of the code):

0x4012bc71: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:59 (discriminator 8)

0x40088dbf: multi_heap_free at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

Deeper in the stack it also wants to allocate a block of memory for whatever reason(?), which eventually ends up in the corrupt heap message and total stop.

A newly allocated block should have a tail value of 0xBAAD5678, which the message also states, but look closer at what it got:

CORRUPT HEAP: Bad tail at 0x3ffe04d1. Expected 0xbaad5678 got 0xcececece

That is the MALLOC_FILL_WORD, the value used to fill the buffer between head and tail in multi_heap_malloc()/verify_fill_pattern().

I 'm not certain that I'm right, but I've literally spent days with cppcheck, valgrind and -fsanitize trying to come to the conclusion that the error lies in my code and not in the code for heap poisoning, but I always come back to the later one over and over. The read failing value in the tail is no random number, it is always MALLOC_FILL_WORD and there is no user code that ever writes that specific value.

I've also attempted to use a breakpoint to stop the application when it hits 0x3ffe04d1:

esp_set_watchpoint(0, (void *)0x3ffe04d1, 4, ESP_WATCHPOINT_STORE);

If I set it at the very start of the program, it seems the memory is first accessed by the Wifi:

0x4013d814: tcpip_adapter_set_hostname_api at /home/permal/esp/esp-idf/components/tcpip_adapter/./tcpip_adapter_lwip.c:1150

V (1566) Task: Task 'I2CTask' initialized
Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception)
Debug exception reason: Watchpoint 0 triggered 
Core 0 register dump:
PC      : 0x40088c92  PS      : 0x00060a36  A0      : 0x80088d74  A1      : 0x3ffcea30  
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

A2      : 0x3ffe04d0  A3      : 0x000005d8  A4      : 0xcececece  A5      : 0x00000001  
A6      : 0x00000001  A7      : 0xfefefefe  A8      : 0x3ffe0aa8  A9      : 0x00000001  
A10     : 0x00000001  A11     : 0x3ffe3ff9  A12     : 0x00003548  A13     : 0xfefefefe  
A14     : 0x00003b9c  A15     : 0x00000000  SAR     : 0x00000018  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x4000c46c  LEND    : 0x4000c477  LCOUNT  : 0x00000000  

Backtrace: 0x40088c92:0x3ffcea30 0x40088d71:0x3ffcea60 0x40082e74:0x3ffcea80 0x40082ea5:0x3ffceaa0 0x40083349:0x3ffceac0 0x4000beaf:0x3ffceae0 0x4008535d:0x3ffceb00 0x400f1481:0x3ffceb20 0x400f085d:0x3ffceb50 0x400ec422:0x3ffcebb0
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

0x40088d71: multi_heap_malloc at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

0x40082e74: heap_caps_malloc at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40082ea5: heap_caps_malloc_default at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40083349: _malloc_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:37

0x4008535d: wifi_malloc at /home/permal/esp/esp-idf/components/esp32/./wifi_internal.c:28

0x400f1481: esf_buf_alloc at ??:?

0x400f085d: wdevProcessRxSucDataAll at ??:?

0x400ec422: ppTask at ??:?

Setting the same breakpoint slightly prior to where cJSON_free is called (as in the problem description above), like so:

std::string Value::to_string()
{
    esp_set_watchpoint(0, (void *)0x3ffe04d1, 4, ESP_WATCHPOINT_STORE);
    auto* p = cJSON_Print(data);
    esp_clear_watchpoint(0);
    std::string s{p};
    cJSON_free(p);
    return s;
    }

it seems that cJSON_print() is the culprit:

Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception)
Debug exception reason: Watchpoint 0 triggered 
Core 0 register dump:
PC      : 0x40088c92  PS      : 0x00060636  A0      : 0x80088d74  A1      : 0x3ffbe800  
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

A2      : 0x3ffe04d0  A3      : 0x00000094  A4      : 0xcececece  A5      : 0x00000001  
A6      : 0x00000001  A7      : 0xfefefefe  A8      : 0x3ffe0564  A9      : 0x00000001  
A10     : 0x00000001  A11     : 0x3ffe3ff9  A12     : 0x00003a8c  A13     : 0xfefefefe  
A14     : 0x00003b9c  A15     : 0xff000000  SAR     : 0x0000001e  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x4000c46c  LEND    : 0x4000c477  LCOUNT  : 0x00000000  

Backtrace: 0x40088c92:0x3ffbe800 0x40088d71:0x3ffbe830 0x40082e74:0x3ffbe850 0x40082ea5:0x3ffbe870 0x40083349:0x3ffbe890 0x4000beaf:0x3ffbe8b0 0x40193370:0x3ffbe8d0 0x40193b1c:0x3ffbe920 0x40174777:0x3ffbe940 0x401206c7:0x3ffbe960 0x40120878:0x3ffbeab0 0x40110809:0x3ffbead0 0x4015e80a:0x3ffbeb10 0x4015ea17:0x3ffbec40 0x4011265c:0x3ffbeca0 0x400d1112:0x3ffbfc30
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

0x40088d71: multi_heap_malloc at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

0x40082e74: heap_caps_malloc at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40082ea5: heap_caps_malloc_default at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40083349: _malloc_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:37

0x40193370: print at /home/permal/esp/esp-idf/components/json/library/cJSON.c:82

0x40193b1c: cJSON_Print at /home/permal/esp/esp-idf/components/json/library/cJSON.c:82

0x40174777: smooth::core::json::Value::to_string[abi:cxx11]() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/json/Value.cpp:282

0x401206c7: G2Alarm::event(AnalogValue const&) at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90

0x40120878: non-virtual thunk to G2Alarm::event(AnalogValue const&) at ??:?

0x40110809: smooth::core::ipc::TaskEventQueue<AnalogValue>::forward_to_event_queue() at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90

0x4015e80a: smooth::core::Task::exec() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:131

0x4015ea17: smooth::core::Task::start() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:56

0x4011265c: app_main at /home/permal/code/G2-Alarmsystem/main/./main.cpp:21

0x400d1112: main_task at /home/permal/esp/esp-idf/components/esp32/./cpu_start.c:455

CPU halted.

And yes, if I don't call cJSON_Print(), I no longer have an issue. However, I can also just not start a task that literally [i]only sleeps in the current use case[/i] and the issue also "goes away". I'm not sure what this tells us.

If I'm looking at all this the wrong way, please tell me. No one will be happier than me if it is possible to limit the search for this issue to my own code.

Update 1: I'm currently running the exact same code with "Light impact" mode in which the part where it writes the *_FILL_WORD bytes to the buffer is disabled and I'm not getting any issues with destroyed tails.

Update 2: Nearly two days later it is still running. Also, cmrogan in this thread seems to be having the exact same issue.

This is the complete stacktrace from which the snippets above are taken

CORRUPT HEAP: Bad tail at 0x3ffe04d1. Expected 0xbaad5678 got 0xcececece
CORRUPT HEAP: multi_heap.c:370 detected at 0x3ffe04d8
abort() was called at PC 0x400889bb on core 0
0x400889bb: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:339
 (inlined by) multi_heap_malloc_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:370

Backtrace: 0x40089108:0x3ffbe3e0 0x400892ab:0x3ffbe400 0x400889bb:0x3ffbe420 0x40088d5d:0x3ffbe440 0x40082e74:0x3ffbe460 0x40082ea5:0x3ffbe480 0x40083349:0x3ffbe4a0 0x4000bbc2:0x3ffbe4c0 0x4005918d:0x3ffbe4e0 0x40058d62:0x3ffbe540 0x40130002:0x3ffbe560 0x4012bd01:0x3ffbe820 0x4012bc71:0x3ffbe870 0x40088dbf:0x3ffbe8a0 0x40082f62:0x3ffbe8c0 0x40083355:0x3ffbe8e0 0x4000bec7:0x3ffbe900 0x40193cae:0x3ffbe920 0x40174785:0x3ffbe940 0x401206c7:0x3ffbe960 0x40120878:0x3ffbeab0 0x40110809:0x3ffbead0 0x4015e7fe:0x3ffbeb10 0x4015ea0b:0x3ffbec40 0x4011265c:0x3ffbeca0 0x400d1112:0x3ffbfc30
0x40089108: invoke_abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:648

0x400892ab: abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:648

0x400889bb: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:339
 (inlined by) multi_heap_malloc_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:370

0x40088d5d: multi_heap_malloc at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

0x40082e74: heap_caps_malloc at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40082ea5: heap_caps_malloc_default at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40083349: _malloc_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:37

0x40130002: _vfiprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vfprintf.c:865 (discriminator 3)

0x4012bd01: fiprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/fiprintf.c:50

0x4012bc71: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:59 (discriminator 8)

0x40088dbf: multi_heap_free at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339

0x40082f62: heap_caps_free at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136

0x40083355: _free_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:42

0x40193cae: cJSON_free at /home/permal/esp/esp-idf/components/json/library/cJSON.c:82

0x40174785: smooth::core::json::Value::to_string[abi:cxx11]() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/json/Value.cpp:285

0x401206c7: G2Alarm::event(AnalogValue const&) at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90

0x40120878: non-virtual thunk to G2Alarm::event(AnalogValue const&) at ??:?

0x40110809: smooth::core::ipc::TaskEventQueue<AnalogValue>::forward_to_event_queue() at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90

0x4015e7fe: smooth::core::Task::exec() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:131

0x4015ea0b: smooth::core::Task::start() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:56

0x4011265c: app_main at /home/permal/code/G2-Alarmsystem/main/./main.cpp:20

0x400d1112: main_task at /home/permal/esp/esp-idf/components/esp32/./cpu_start.c:455
projectgus commented 6 years ago

Hi @PerMalmberg,

At this point, I'd prefer to discuss this in a single place rather than two places. I've replied to you on the forum for now: https://esp32.com/viewtopic.php?f=2&t=4583&p=19970#p19970

If we determine that this is a bug in the heap diagnostics, please reopen this issue and we'll continue from here. But I don't think this is established quite yet.

Angus