Closed KonssnoK closed 1 year ago
We confirm that TCPIP is the other component involved in these crashes.
Line 2: CORRUPT HEAP: Invalid data at 0x3de9e130. Expected 0xfefefefe got 0x008cfefe
Line 3: CORRUPT HEAP: Invalid data at 0x3de9e134. Expected 0xfefefefe got 0xfefe0000
Line 59: (969) heap_history[1003] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E609
Line 60: (968) heap_history[1004] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 213: (815) heap_history[133] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E609
Line 214: (814) heap_history[134] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 367: (661) heap_history[287] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E609
Line 368: (660) heap_history[288] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 541: (487) heap_history[461] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E5C8
Line 546: (482) heap_history[466] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 697: (331) heap_history[617] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E5C8
Line 702: (326) heap_history[622] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 763: (265) heap_history[683] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E5C8
Line 768: (260) heap_history[688] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 789: (239) heap_history[709] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E5C8
Line 790: (238) heap_history[710] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 807: (221) heap_history[727] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E5C8
Line 812: (216) heap_history[732] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 895: (133) heap_history[815] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E6F4
Line 896: (132) heap_history[816] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Line 987: (41) heap_history[907] action HEAP_EXIT_MALLOC, task tiT, address 0x3DE9E10C-0x3DE9E6F4
Line 996: (32) heap_history[916] action HEAP_ENTER_FREE, task MTXON, address 0x3DE9E10C-0x00000000
Considering the size of the mallocs, I suppose this is a pbuf allocation from lwIP.
...10C is the first address of the block; the invalid data is at ...130-134 and contains 0000008C.
I suspect a packet size or some flags.
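The two CORRUPT HEAP lines can be decoded byte by byte to see exactly what was written. This is a minimal sketch, assuming a little-endian target (as the ESP32-S3 is) and the 0xFE fill pattern implied by "Expected 0xfefefefe"; `find_dirty_bytes` and `dirty_range_t` are hypothetical names, not part of ESP-IDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Address range of bytes that no longer hold the 0xFE fill pattern. */
typedef struct {
    uint32_t first;  /* address of the first overwritten byte */
    uint32_t last;   /* address of the last overwritten byte */
    int found;       /* 0 if every byte still equals 0xFE */
} dirty_range_t;

/* words[i] is the 32-bit value the checker reported at base_addr + 4*i. */
static dirty_range_t find_dirty_bytes(uint32_t base_addr,
                                      const uint32_t *words, size_t n_words)
{
    dirty_range_t r = {0, 0, 0};
    assert(words != NULL);
    for (size_t i = 0; i < n_words; i++) {
        for (unsigned b = 0; b < 4; b++) {
            /* Little-endian: byte b of the word lives at address +b. */
            uint8_t byte = (uint8_t)(words[i] >> (8u * b));
            if (byte != 0xFE) {
                uint32_t addr = base_addr + (uint32_t)(4u * i) + b;
                if (!r.found) {
                    r.first = addr;
                    r.found = 1;
                }
                r.last = addr;
            }
        }
    }
    return r;
}
```

Fed with `{0x008cfefe, 0xfefe0000}` at base `0x3de9e130`, it reports bytes `0x3de9e132` to `0x3de9e135`: a 4-byte write of `8c 00 00 00` (0x0000008C) at offset 0x26 from the block start 0x3DE9E10C.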
That's an interesting approach to debugging heap corruption.
I usually:
* enable corruption detection
* add lots of calls to heap_caps_check_integrity()
* start commenting out code until I know which code is causing the corruption
Perhaps you can try this approach to narrow things down further.
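The bisection described above can be sketched as a checkpoint macro that reports the first call site where the heap is already damaged. On ESP-IDF the checker would be `heap_caps_check_integrity_all(bool print_errors)` from `esp_heap_caps.h`. Here it is passed in as a function pointer so the sketch also compiles off-target. `heap_checkpoint`, `HEAP_CHECKPOINT` and the two `fake_*` stand-ins are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same shape as ESP-IDF's heap_caps_check_integrity_all(bool print_errors). */
typedef bool (*integrity_fn_t)(bool print_errors);

/* Run the checker and report the call site if the heap is already bad. */
static bool heap_checkpoint(integrity_fn_t check, const char *file, int line)
{
    assert(check != NULL);
    bool ok = check(true);
    if (!ok) {
        fprintf(stderr, "heap corrupt at or before %s:%d\n", file, line);
    }
    return ok;
}

#define HEAP_CHECKPOINT(check) heap_checkpoint((check), __FILE__, __LINE__)

/* Host-side stand-ins for the real checker, for illustration only. */
static bool fake_heap_ok(bool print_errors)  { (void)print_errors; return true; }
static bool fake_heap_bad(bool print_errors) { (void)print_errors; return false; }
```

On target, one would place `HEAP_CHECKPOINT(heap_caps_check_integrity_all);` between suspect calls with corruption detection enabled. The corruption then happened somewhere between the last checkpoint that passes and the first one that fails.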
We'll refine our investigation on this once we have more resources available. The problem is that the mesh code is closed source, and all the steps we have taken so far lead to those modules.
On top of that, there are pending changes to the mesh library following our highest priority ticket, https://github.com/espressif/esp-idf/issues/9955
@KonssnoK Can you provide a demo to reproduce this issue?
i think we can try to modify our mesh sample to make it crash! will update when we have something
@zhangyanjiaoesp please find the code at
https://github.com/KonssnoK/esp-idf/tree/bug/mtxon-heap-corruption/examples/mesh/ip_internal_network
To get an easy crash, just flash it on 3 devices and let it run for a bit. Please note that you have to disable the watchdogs. I don't know how to express a "disable" in sdkconfig.defaults, so I put
# CONFIG_TASK_WDT_CHECK_IDLE_TASK_CPU0 is not set
# CONFIG_TASK_WDT_CHECK_IDLE_TASK_CPU1 is not set
The more devices, the easier it is to trigger the crash. Example logs: device_7.txt device_13.txt device_5.txt
I don't know why, but in these logs the task names are wrong; in the terminal they are displayed correctly :)
Please note that this issue is impacting our firmware update capabilities...
just to complete the reporting, apparently it's possible to trigger more heap corruption crashes by simply enabling comprehensive heap poisoning in the example
https://github.com/KonssnoK/esp-idf/tree/bug/mesh_wifi/examples/mesh/ip_internal_network
used for https://github.com/espressif/esp-idf/issues/9955
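For reference, comprehensive heap poisoning is a Kconfig choice in the heap component, so in `sdkconfig.defaults` it is selected by setting the wanted option rather than by unsetting the others. A minimal fragment, assuming the option name from ESP-IDF's heap Kconfig:

```
# sdkconfig.defaults: select comprehensive heap poisoning
CONFIG_HEAP_POISONING_COMPREHENSIVE=y
```

This mode fills freed memory with a known pattern, which is what produces the "Expected 0xfefefefe" messages in the logs.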
One way to make the child device crash is to keep the hotspot off apparently
I (07:15:59.609) mesh_main: <MESH_EVENT_PARENT_CONNECTED>layer:0-->2, parent:7c:df:a1:e0:8b:7d<layer2>, ID:77:77:77:77:77:76
I (07:15:59.621) mesh_netif: It was a wifi station removing stuff
I (11487) wifi:<ba-add>idx:0 (ifx:0, 7c:df:a1:e0:8b:7d), tid:5, ssn:0, winSize:64
I (11497) wifi:AP's beacon interval = 102400 us, DTIM period = 1
E (07:16:00.675) mesh_netif: Send with err code 16394 ESP_ERR_MESH_TIMEOUT
W (07:16:00.676) mesh_main: <MESH_EVENT_TODS_REACHABLE>state:1
CORRUPT HEAP: Invalid data at 0x3fcf3edc. Expected 0xfefefefe got 0x0000fefe
CORRUPT HEAP: Invalid data at 0x3fcf3ee0. Expected 0xfefefefe got 0xfefe0000
assert failed: multi_heap_malloc multi_heap_poisoning.c:241 (ret)
Backtrace: 0x40375ace:0x3fcafaa0 0x4037c0c9:0x3fcafac0 0x40382d61:0x3fcafae0 0x40381e94:0x3fcafc00 0x40375dc9:0x3fcafc20 0x40375e29:0x3fcafc40 0x40375e66:0x3fcafc60 0x40382d71:0x3fcafc80 0x40378bb1:0x3fcafca0 0x4206fcf3:0x3fcafcc0 0x4206fd50:0x3fcafce0 0x4204e054:0x3fcafd00 0x420675d9:0x3fcafd50 0x42055aae:0x3fcafe30 0x4206faaf:0x3fcafe60 0x4206fb79:0x3fcafe80 0x4205ee2d:0x3fcafeb0 0x4037f3f6:0x3fcb01e0
xtensa-esp32s3-elf-addr2line -pfiaC -e c:\src\esp-idf\examples\mesh\ip_internal_network\build\ip_internal_network.elf 0x40375ace:
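Each backtrace entry is a `PC:SP` pair, and only the PC half identifies a call site. The sketch below pulls out the PC values so they can be passed to addr2line on their own. `backtrace_pcs` is a hypothetical helper and expects the text that follows the `Backtrace:` prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Extract the PC half of each "PC:SP" pair; returns how many were found. */
static size_t backtrace_pcs(const char *line, uint32_t *pcs, size_t max_pcs)
{
    size_t n = 0;
    assert(line != NULL && pcs != NULL);
    while (*line != '\0' && n < max_pcs) {
        char *end;
        /* strtoul skips leading spaces and accepts the 0x prefix. */
        unsigned long pc = strtoul(line, &end, 16);
        if (end == line || *end != ':') {
            break;  /* not a PC:SP pair (e.g. "|<-CORRUPTED"), stop */
        }
        pcs[n++] = (uint32_t)pc;
        /* Skip the SP half of the pair. */
        line = end + 1;
        (void)strtoul(line, &end, 16);
        if (end == line) {
            break;
        }
        line = end;
    }
    return n;
}
```

If the SP values are passed to addr2line as well, they resolve to data symbols, as with the `_nimble_bss_end at ??:??` entries in the later trace, and carry no call-site information.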
@zhangyanjiaoesp can you give us an update? are you able to reproduce the issue? thanks :)
@KonssnoK Yes, I can reproduce this issue with your branch, and I'm debugging it. (By the way, the changes in the mesh lib have been merged into v4.4, 90d6e45d9ffc6c060d664d50063fe05c712d3042)
oh, ok! thanks, i'll check back in a couple of days as usual :)
We hit (I assume) this exact issue in v5.0.1 when continuously uploading a sizable amount of data (about 200 KiB/s) to the server over TCP for a period of hours to days, until we get a panic_abort and a reboot.
assert failed: tcp_free_acked_segments /IDF/components/lwip/lwip/src/core/tcp_in.c:1138 (tcp_receive: valid queue length)
Backtrace: 0x4037b406:0x3fcd3fc0 0x40382935:0x3fcd3fe0 0x40389095:0x3fcd4000 0x420dcc99:0x3fcd4120 0x420dcf18:0x3fcd4140 0x420de0b5:0x3fcd4170 0x420e2c1a:0x3fcd41e0 0x420e675e:0x3fcd4210 0x420d8765:0x3fcd4230
0x4037b406 - panic_abort
at /home/ronen/.espressif/esp-idf/master/components/esp_system/panic.c:452
0x3fcd3fc0 - _nimble_bss_end
at ??:??
0x40382935 - esp_system_abort
at /home/ronen/.espressif/esp-idf/master/components/esp_system/port/esp_system_chip.c:77
0x3fcd3fe0 - _nimble_bss_end
at ??:??
0x40389095 - __assert_func
at /home/ronen/.espressif/esp-idf/master/components/newlib/assert.c:81
0x3fcd4000 - _nimble_bss_end
at ??:??
0x420dcc99 - tcp_free_acked_segments
at /home/ronen/.espressif/esp-idf/master/components/lwip/lwip/src/core/tcp_in.c:1138
0x3fcd4120 - _nimble_bss_end
at ??:??
0x420dcf18 - tcp_receive
at /home/ronen/.espressif/esp-idf/master/components/lwip/lwip/src/core/tcp_in.c:1301
0x3fcd4140 - _nimble_bss_end
at ??:??
0x420de0b5 - tcp_process
at /home/ronen/.espressif/esp-idf/master/components/lwip/lwip/src/core/tcp_in.c:996
0x3fcd4170 - _nimble_bss_end
at ??:??
0x420e2c1a - ip4_input
at /home/ronen/.espressif/esp-idf/master/components/lwip/lwip/src/core/ipv4/ip4.c:749
0x3fcd41e0 - _nimble_bss_end
at ??:??
0x420e675e - ethernet_input
at /home/ronen/.espressif/esp-idf/master/components/lwip/lwip/src/netif/ethernet.c:186
0x3fcd4210 - _nimble_bss_end
at ??:??
0x420d8765 - tcpip_thread_handle_msg
at /home/ronen/.espressif/esp-idf/master/components/lwip/lwip/src/api/tcpip.c:174
0x3fcd4230 - _nimble_bss_end
at ??:??
That's with poisoning disabled. With it enabled, it triggers more often in multi_heap_malloc, as posted earlier, and in various other places such as the TLSF code.
We thought that it was a PSRAM corruption issue for a while, so we enabled ECC there but it didn't help.
Some configurations which might be related that we use are
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP=y
...
CONFIG_LWIP_TCP_SND_BUF_DEFAULT=64800
CONFIG_LWIP_TCP_WND_DEFAULT=5744
CONFIG_LWIP_TCP_RECVMBOX_SIZE=64
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64
and in another case:
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP=y
...
CONFIG_LWIP_TCP_SND_BUF_DEFAULT=118000
CONFIG_LWIP_TCP_WND_DEFAULT=5744
CONFIG_LWIP_TCP_RECVMBOX_SIZE=64
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=64
CONFIG_LWIP_WND_SCALE=y
Is there a lead on the horizon? I'm willing to patch our esp-idf with an experimental fix, including a different revision of esp-wifi if it's localized there.
@zRedShift are you using a mesh network or simple wifi? In our case it's likely an interaction between tcpip task and the MTXON task, which is part of the mesh library.
More generally speaking, it seems that this happens when someone writes into an already released TCP packet.
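How a write into a released buffer ends up as a "CORRUPT HEAP" message can be shown with a small host-side model. This is only a sketch of the idea, not ESP-IDF's implementation: the fill byte 0xFE is taken from the "Expected 0xfefefefe" log lines, and `poison_on_free` and `first_dirty_offset` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FREE_FILL 0xFE   /* fill byte implied by "Expected 0xfefefefe" */

/* What a poisoning allocator does when a block is freed. */
static void poison_on_free(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, FREE_FILL, len);
}

/* What it does before handing the block out again: returns the offset of
 * the first byte that is no longer FREE_FILL, or -1 if it is untouched. */
static long first_dirty_offset(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != FREE_FILL) {
            return (long)i;
        }
    }
    return -1;
}
```

A stale pointer that stores a 32-bit value into the block after `poison_on_free` is detected only at the next check, typically inside a later malloc by an unrelated task. That would explain why the task reported in the crash is not necessarily the one that did the write.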
@KonssnoK No, we're using it as a STA connected to a configurable AP, standard usage. Our assumption was also that it's a use-after-free (UAF) of a TCP/IP buffer; that would be the most likely culprit. We also use BLE Coex and APSTA during provisioning, but it still occurs without them.
Specifically, we're using esp_http_client with mbedtls to upload massive files generated in real-time.
@zRedShift we also use mbedtls, but our setup is using the wifi mesh environment. BLE is off for us after provisioning.
When i have some time i'll try to understand if i can do some tests to see if the issue is in an open part of the code.
The possible involved modules i see are:
@KonssnoK I'm still debugging this issue and will get back to you ASAP when I find the root cause.
@zhangyanjiaoesp did you manage to understand which component is triggering it? if it's an open sourced one we can focus on it too.
@KonssnoK Please use the new wifi lib to test; the corruption issue has been solved on my side. wifi_lib_0404.zip (wifi firmware version: 7fec53d). If your test passes, we will merge the fix into release/v4.4 ASAP.
thanks @zhangyanjiaoesp , we'll test asap
just a question, what is it built on top of? 15b1309726f600122b5d4539e73758ae1852168d ?
@KonssnoK The wifi lib is based on your branch https://github.com/KonssnoK/esp-idf/tree/bug/mtxon-heap-corruption/examples/mesh/ip_internal_network, and it contains all the fixes from https://github.com/espressif/esp-idf/issues/9955. If you want to update your release/v4.4 version, you can base it on https://github.com/espressif/esp-idf/commit/90d6e45d9ffc6c060d664d50063fe05c712d3042.
@zhangyanjiaoesp we saw there were additional fixes to the wifi library, so we are currently aiming at 4c7d97e2bdbd26b1ad6adc6de8051888e1feec10
First though, let's check if everything is ok with the current branch
@zhangyanjiaoesp we continue to test but currently you have the green light for merging, we are not able to reproduce the issue with the same examples. Thanks :)
Would this fix affect v5.x versions of esp-idf? And is there a possibility that it solves the non-mesh-related esp-wifi bug I mentioned here: https://github.com/espressif/esp-idf/issues/11006#issuecomment-1487529101?
These fixes are usually backported to every branch that is still maintained, so it will surely be pushed to 5.0 as well. The push is focused on 4.4 only because our product line runs on v4.4 :)
@zhangyanjiaoesp we were able to reproduce one crash again; we'll now try to isolate it and reproduce it consistently
W (10:10:57.634) stats: iteration_time_us=1000004 Name=last% IDLE=40 IDLE=63 task_mesh_rx=1 tiT=4 ipc1=24 wifi=5 updater=56 MTXON=1
Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x40389eb2 PS : 0x00060e33 A0 : 0x8038a678 A1 : 0x3fcb7370
0x40389eb2: remove_free_block at C:/repos/v3/esp-idf/components/heap/heap_tlsf.c:207
(inlined by) block_locate_free at C:/repos/v3/esp-idf/components/heap/heap_tlsf.c:442
(inlined by) tlsf_malloc at C:/repos/v3/esp-idf/components/heap/heap_tlsf.c:849
A2 : 0x3d800014 A3 : 0x0000010c A4 : 0x3d896e6c A5 : 0x3d8a8dd4
A6 : 0x3d8112c4 A7 : 0x3d8a8d44 A8 : 0x3d8a8e24 A9 : 0x00000013
A10 : 0x964c06f3 A11 : 0x00000004 A12 : 0xfffffffc A13 : 0x00007001
A14 : 0x00060e20 A15 : 0x00000001 SAR : 0x0000001d EXCCAUSE: 0x0000001d
EXCVADDR: 0x0000700d LBEG : 0x400570e8 LEND : 0x400570f3 LCOUNT : 0x00000000
Backtrace: 0x40389eaf:0x3fcb7370 0x4038a675:0x3fcb7390 0x40376709:0x3fcb73b0 0x4037685b:0x3fcb73d0 0x403763f9:0x3fcb7420 0x421156f7:0x3fcb7440 0x42115754:0x3fcb7460 0x42116215:0x3fcb7480 0x4038b149:0x3fcb7640 0x4038b235:0x3fcb7690 0x4003f4bd:0x3fcb76b0 |<-CORRUPTED
0x40389eaf: remove_free_block at C:/repos/v3/esp-idf/components/heap/heap_tlsf.c:206
(inlined by) block_locate_free at C:/repos/v3/esp-idf/components/heap/heap_tlsf.c:442
(inlined by) tlsf_malloc at C:/repos/v3/esp-idf/components/heap/heap_tlsf.c:849
0x4038a675: multi_heap_malloc_impl at C:/repos/v3/esp-idf/components/heap/multi_heap.c:187
0x40376709: heap_caps_malloc_base at C:/repos/v3/esp-idf/components/heap/heap_caps.c:175
0x4037685b: heap_caps_malloc_prefer at C:/repos/v3/esp-idf/components/heap/heap_caps.c:290
0x403763f9: wifi_malloc at C:/repos/v3/esp-idf/components/esp_wifi/esp32s3/esp_adapter.c:71
0x421156f7: mesh_malloc at ??:?
0x42115754: esp_mesh_create_context at ??:?
0x42116215: esp_mesh_wifi_recv_cb at ??:?
0x4038b149: sta_input at ??:?
0x4038b235: sta_rx_cb at ??:?
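One possible reading of the register dump: EXCVADDR (0x0000700d) is exactly 12 bytes above A13 (0x00007001), and 12 is where `prev_free` would sit in a TLSF block header on a 32-bit target. The model below assumes the upstream TLSF field order (`prev_phys_block`, `size`, `next_free`, `prev_free`); it is an illustrative struct, not copied from ESP-IDF, and `prev_free_store_addr` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a TLSF block header with 32-bit pointers (assumed layout). */
typedef struct {
    uint32_t prev_phys_block;
    uint32_t size;
    uint32_t next_free;
    uint32_t prev_free;
} tlsf_block_model_t;

static_assert(offsetof(tlsf_block_model_t, prev_free) == 12,
              "prev_free is the fourth 32-bit word");

/* Address touched by a store to the prev_free field of a block at base. */
static uint32_t prev_free_store_addr(uint32_t base)
{
    return base + (uint32_t)offsetof(tlsf_block_model_t, prev_free);
}
```

With `base = 0x00007001` this gives `0x0000700d`, matching EXCVADDR. That would be consistent with a free-list pointer having been overwritten with 0x00007001 before `remove_free_block` followed it, although the dump alone does not prove which field was damaged.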
Generally speaking, the stability of the system has improved a thousandfold. We still get some crashes, which seem very similar to what was happening before. @zhangyanjiaoesp maybe you can check for similar code (to the one you fixed) in the libraries?
To trigger them, instead of generating high traffic, we run parallel firmware updates.
Here is an example:
Line 1609: CORRUPT HEAP: Invalid data at 0x3dea2dbc. Expected 0xfefefefe got 0x3de00018
Line 2290: (344) heap_history[738] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2CBC-0x3DEA2D4F
Line 2334: (300) heap_history[782] action HEAP_EXIT_MALLOC, task MTXON, address 0x3DEA2D3C-0x3DEA2D6C
Line 2336: (298) heap_history[784] action HEAP_EXIT_MALLOC, task MTXON, address 0x3DEA2D80-0x3DEA2DAA
Line 2337: (297) heap_history[785] action HEAP_ENTER_FREE, task MTX, address 0x3DEA2D80-0x00000000
Line 2339: (295) heap_history[787] action HEAP_ENTER_FREE, task MTX, address 0x3DEA2D3C-0x00000000
Line 2341: (293) heap_history[789] action HEAP_ENTER_FREE, task MRX, address 0x3DEA2D80-0x00000000
Line 2343: (291) heap_history[791] action HEAP_ENTER_FREE, task MRX, address 0x3DEA2D3C-0x00000000
Line 2353: (281) heap_history[801] action HEAP_ENTER_FREE, task task_mesh_rx, address 0x3DEA2D3C-0x00000000
Line 2384: (250) heap_history[832] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2401: (233) heap_history[849] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2414: (220) heap_history[862] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2435: (199) heap_history[883] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2448: (186) heap_history[896] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2449: (185) heap_history[897] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2464: (170) heap_history[912] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2477: (157) heap_history[925] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2500: (134) heap_history[948] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2509: (125) heap_history[957] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2530: (104) heap_history[978] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2543: (91) heap_history[991] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2566: (68) heap_history[1014] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2567: (67) heap_history[1015] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2596: (38) heap_history[20] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2609: (25) heap_history[33] action HEAP_ENTER_FREE, task MTXON, address 0x3DEA2D3C-0x00000000
Line 2632: (2) heap_history[56] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEA2D3C-0x3DEA2D9C
Line 2835: CORRUPT HEAP: Bad head at 0x3deaf2cc. Expected 0xabba1234 got 0x3de00018
Line 3508: (352) heap_history[17] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEAEDD8-0x3DEAF2EB
Line 3732: (128) heap_history[241] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEAEDD8-0x3DEAF230
Line 3738: (122) heap_history[247] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEAF2EC-0x3DEAF31C
Line 3763: (97) heap_history[272] action HEAP_ENTER_FREE, task MTX, address 0x3DEAF2EC-0x00000000
Line 3812: (48) heap_history[321] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEAEDD8-0x3DEAF230
Line 3818: (42) heap_history[327] action HEAP_EXIT_MALLOC, task tiT, address 0x3DEAF2D8-0x3DEAF308
Line 3851: (9) heap_history[360] action HEAP_ENTER_FREE, task MTX, address 0x3DEAF2D8-0x00000000
0x4038aeb9: free at C:/src/esp-idf/components/newlib/heap.c:39
0x42115543: mesh_free at ??:?
0x421155f1: esp_mesh_free_context at ??:?
0x42112368: esp_mesh_discard_context at ??:?
0x42113225: mesh_tx_task_main at ??:?
0x4038668a: vPortTaskWrapper at C:/src/esp-idf/components/freertos/port/xtensa/port.c:142
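A history like the one above can be scanned mechanically for frees that hit an address which was already freed and not handed out again. The sketch below works on hand-transcribed events; `heap_ev_t` and `count_repeated_frees` are hypothetical names, and the log format belongs to the reporter's custom hook, not to ESP-IDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_MALLOC, EV_FREE } ev_kind_t;

typedef struct {
    ev_kind_t kind;
    const char *task;   /* task name as printed in the log */
    uint32_t addr;      /* start address of the block */
} heap_ev_t;

/* Count frees whose most recent earlier event on the same address was
 * also a free. A free with no earlier event on its address is not
 * counted, because its history is unknown. */
static size_t count_repeated_frees(const heap_ev_t *ev, size_t n)
{
    size_t suspicious = 0;
    assert(ev != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind != EV_FREE) {
            continue;
        }
        /* Walk back to the most recent event on the same address. */
        for (size_t j = i; j-- > 0; ) {
            if (ev[j].addr == ev[i].addr) {
                if (ev[j].kind == EV_FREE) {
                    suspicious++;
                }
                break;
            }
        }
    }
    return suspicious;
}
```

Applied to the entries from log lines 2290 to 2401 of the first example, it flags three frees: 0x3DEA2D80 by MRX, and 0x3DEA2D3C by MRX and by task_mesh_rx. The excerpt is filtered and the task names in these logs may be unreliable, so it cannot be told from the excerpt whether these are genuine double frees or an artifact of the logging.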
@KonssnoK I will check the code
@zhangyanjiaoesp would it be possible to get the fix on top of 4c7d97e2bdbd26b1ad6adc6de8051888e1feec10 ? This way we can check also latest wifi fixes on the field, since we have devices isolating themselves (still investigating on why)
@KonssnoK the wifi lib based on https://github.com/espressif/esp-idf/commit/4c7d97e2bdbd26b1ad6adc6de8051888e1feec10 is here: wifi_lib_0406.zip wifi firmware version: 85cf6e5
To trigger them, instead of generating high traffic, we run parallel firmware updates: take 3 devices, trigger a firmware update on all 3 devices at the same time, and be lucky ( 100 ) ... Triggering crashes is now quite difficult. We are investigating in order to replicate these crashes.
This issue is still being debugged.
@zRedShift Since your issue is not related to wifi mesh, please create a new ticket to report your issue
Meaning you were able to replicate some other crashes? We are still trying to understand how to trigger them. The new library is much more stable!
@KonssnoK No, I didn't reproduce any crash; I just looked at your logs for some ideas. If you find a good way to reproduce the problem, it will speed up the debugging process.
@KonssnoK Since the fix has solved the original crash issue, and the new crash is hard to reproduce, we will merge the fix into release/v4.4 first. Is that OK?
@zhangyanjiaoesp that would be great, so we can move inside esp-idf without having to change wifi-lib with a custom one
i will close this ticket and open a new one when we manage to replicate other crashes
Answers checklist.
IDF version.
v4.4.4-116-g00bb43ff24
Operating System used.
Windows
How did you build your project?
VS Code IDE
If you are using Windows, please specify command line type.
PowerShell
Development Kit.
ESP32S3-WROOM N8R2
Power Supply used.
USB
What is the expected behavior?
As reported in https://github.com/espressif/esp-idf/issues/10992
we are investigating crashes that happen to our devices.
We managed to reduce the triggering conditions to the following:
have 2 mesh devices connecting to MQTT with dynamic settings
Disable all watchdogs, cores + interrupts
make the child device send 1 packet per second with 1024 payload
reset the root
the child device will crash
We enabled comprehensive HEAP tracing
We added a piece of code in heap_caps that logs each time a malloc/free is made
After multiple retries, we see that the address that is corrupted is touched mainly by MTXON
We isolated the logs to MTXON mallocs.
Please note that we don't know what other tasks MTXON works with, so there might be FREEs of blocks that were allocated by other tasks (maybe TCPIP)!
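The kind of malloc/free logging described above could be kept in a fixed-size ring buffer, which matches the wrapping `heap_history[...]` indices in the logs. This is a sketch under assumptions, not the reporter's actual code: the capacity of 1024 and all field and function names are guesses, with only the action names taken from the log lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HISTORY_LEN 1024u   /* assumed capacity */

typedef enum { HEAP_EXIT_MALLOC, HEAP_ENTER_FREE } heap_action_t;

typedef struct {
    heap_action_t action;
    char task[16];       /* copy of the calling task's name */
    uint32_t start;      /* block start address */
    uint32_t end;        /* block end address, 0 for a free */
} heap_record_t;

static heap_record_t heap_history[HISTORY_LEN];
static uint32_t heap_history_next;   /* total records written so far */

/* Append one record, overwriting the oldest once the buffer is full. */
static void heap_history_log(heap_action_t action, const char *task,
                             uint32_t start, uint32_t end)
{
    assert(task != NULL);
    heap_record_t *r = &heap_history[heap_history_next % HISTORY_LEN];
    r->action = action;
    strncpy(r->task, task, sizeof r->task - 1);
    r->task[sizeof r->task - 1] = '\0';
    r->start = start;
    r->end = end;
    heap_history_next++;
}
```

On target, such a hook would have to be protected against concurrent callers on both cores and must not allocate memory itself, since it runs inside malloc and free.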
Here is the result: