Open andylinpersonal opened 9 months ago
Extra log files: logs.tar.gz
esp_rom_printf, it seems like the S3 hung near esp_core_dump_replace_sp() and xthal_window_spill(). At least esp_rom_printf cannot print anything anymore after xthal_window_spill().
Any update? Thanks.
@andylinpersonal Thanks for reporting the issue. We have some findings regarding stack replacement on both xtensa and riscv chips. I will wrap up what we did and explain here next week.
Regarding the overflow scenarios, I am sorry, I haven't found time to investigate yet. I will check soon.
@erhankur Any follow up of https://github.com/espressif/esp-idf/issues/13219#issuecomment-2027991578 ?
Title
Core dump failed on overflowed tasks due to excessive stack usage of panic handler
IDF version.
release/v5.1 (7380f96), release/v5.2 (93ea06f), master (c460e1c)
Espressif SoC revision.
esp32s3: v0.1, esp32c3: v0.4, esp32c6: v0.0
Operating System used.
Linux
How did you build your project?
Command line with idf.py
Development Kit.
esp32c3: LuatOS CORE-ESP32-C3 (custom esp32-c3 with DIO mode flash); esp32c6: esp32-c6-devkitc-1-n8; esp32s3: esp32-s3-devkitc-1-n32r8v
Power Supply used.
USB
What is the expected behavior?
Core dump should be reliably performed, even for the overflowed tasks.
The following is the normal core dump flow:
What is the actual behavior?
espcoredump crashed randomly when the original task had overflowed its stack.
After reserving a large space before the stack, most configurations can perform the core dump successfully.
To prove this and to check which memory region is polluted by the overflowed panic handler and the core dump function, I placed a second canary watchpoint below the first one. Reserving some space below the end of the stack appears to be a possible workaround: the second watchpoint is not triggered during the core dump if the reserved space is sufficiently large.
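As a rough illustration of how the size of the polluted region could be measured, the sketch below fills a reserved region with a known pattern and later reports how deep the overflow reached. It is a host-side sketch, not code from this report: the names GUARD_FILL, guard_fill and guard_dirty_depth are hypothetical, and it assumes the stack grows downward so that the overflow enters the reserved region from its highest address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill pattern chosen for this sketch; not an ESP-IDF constant. */
#define GUARD_FILL 0xA5u

/* Fill the reserved region with the pattern before the victim task runs. */
void guard_fill(uint8_t *guard, size_t len)
{
    assert(guard != NULL);
    memset(guard, GUARD_FILL, len);
}

/*
 * Return the number of bytes between the lowest modified byte and the top
 * of the reserved region (0 if the region is untouched). Assuming a
 * downward-growing stack, this is how far the overflow reached.
 */
size_t guard_dirty_depth(const uint8_t *guard, size_t len)
{
    assert(guard != NULL);
    for (size_t i = 0; i < len; i++) {
        if (guard[i] != GUARD_FILL) {
            return len - i; /* lowest dirty byte is at index i */
        }
    }
    return 0;
}
```

A byte that happens to be overwritten with the same value as the pattern goes undetected, so the result is a lower bound on the space that was actually used.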
If the reserved space is way too small, we'll see this:
If the reserved space is slightly smaller than required space, we'll see this:
After loading the core dump data with idf.py coredump-debug and examining it with gdb, we can see that the reserved space was modified by the panic handler and the core dump code.
Possible fix:
_xt_panic and _panic_handler should only do the minimum work and run the real panic handler on a dedicated emergency stack, just like the core dump does.
Additional note for S3, -O0:
Steps to reproduce.
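The possible fix described above, running the real panic handler on a dedicated emergency stack, can be illustrated on a host machine. The sketch below is not ESP-IDF code: it uses POSIX ucontext (getcontext, makecontext, swapcontext) as a stand-in for the stack switch that the panic entry would have to do in assembly, and emergency_stack, real_panic_handler and run_on_emergency_stack are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

/* Dedicated stack for the handler; the size is an arbitrary choice for this sketch. */
uint8_t emergency_stack[64 * 1024];

static ucontext_t caller_ctx;
static ucontext_t handler_ctx;
static int handler_on_emergency_stack;

/* Stand-in for the "real" panic handler: records whether its own stack
 * frame lies inside emergency_stack. */
void real_panic_handler(void)
{
    volatile uint8_t local = 0;
    uintptr_t sp = (uintptr_t)&local;
    uintptr_t lo = (uintptr_t)emergency_stack;
    handler_on_emergency_stack =
        (sp >= lo && sp < lo + sizeof emergency_stack);
}

/*
 * Minimal entry path: switch to the emergency stack, run the handler
 * there, then resume the caller. Returns 1 if real_panic_handler saw
 * its frame on the emergency stack, 0 if not, -1 on error.
 */
int run_on_emergency_stack(void (*handler)(void))
{
    assert(handler != NULL);
    handler_on_emergency_stack = 0;
    if (getcontext(&handler_ctx) != 0) {
        return -1;
    }
    handler_ctx.uc_stack.ss_sp = emergency_stack;
    handler_ctx.uc_stack.ss_size = sizeof emergency_stack;
    handler_ctx.uc_link = &caller_ctx; /* resume the caller when the handler returns */
    makecontext(&handler_ctx, handler, 0);
    if (swapcontext(&caller_ctx, &handler_ctx) != 0) {
        return -1;
    }
    return handler_on_emergency_stack;
}
```

Calling run_on_emergency_stack(real_panic_handler) keeps the work done on the original (possibly overflowed) stack to the context switch itself; everything the handler does afterwards lands on the emergency stack.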
Reproduce:
sdkconfig.defaults:
CONFIG_ESP32_ENABLE_COREDUMP_TO_FLASH=y
CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
CONFIG_ESP_COREDUMP_STACK_SIZE=1536
CONFIG_ESP32_CORE_DUMP_STACK_SIZE=1536
crash and carefully overflow the victim task by recursively calling a small function.
espcoredump crashed again :(
Examine:
Add #define ENABLE_RESERVED_SPACE 1 to main/main.c
crash
ESP_ROM_ELF_DIR=~/.espressif/tools/esp-rom-elfs/20230320/ idf.py coredump-debug
p task_stack and see the content of the stack in gdb.
task_stack
esp_core_dump_write_internal()
Debug Logs.
Log messages without workaround (excerpted from *without-workaround.log):
v5.1, S3, -Og:
v5.2, C6, -Og:
v5.2, S3, -Os:
v5.3, C6, -O0:
v5.3, C6, -Og:
v5.3, S3, -O0:
Log messages with workaround:
with-workaround/${IDF_VER}-${TARGET}-${OPT}-log-with-workaround.log
Stack content of overflowed task with workaround applied:
v5.3, C6, -O0
Other dumped stacks have been excerpted to with-workaround/${IDF_VER}-${TARGET}-${OPT}-corredump-brief.log
Logs for C6, release/v5.2 are missing due to https://github.com/espressif/esp-idf/issues/13197.
More Information.
Related bug:
espcoredump regardless of ESP-IDF version when processing the overflowed task. The possible cause has not been found yet.
The following is the test code.
main/main.c
main/CMakeLists.txt
CMakeLists.txt
sdkconfig.defaults