chipweinberger opened this issue 1 year ago
I see there have been a lot of changes to void task_wdt_isr(void *arg)
in v5.0, so perhaps this work is already done?
see: https://github.com/espressif/esp-idf/commit/e25cda2c400b28362436b3ae01b969386892ab8e
For anyone else on v4.4, a simple workaround is to disable CONFIG_ESP_TASK_WDT_PANIC in Kconfig.
This will get you a stacktrace if you really need one. You could also look at the coredump.
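For reference, a minimal sdkconfig fragment for this workaround might look like the following (a sketch; CONFIG_ESP_TASK_WDT_PANIC is the option mentioned above, and the exact menuconfig path may differ between IDF versions):

```
# sdkconfig.defaults fragment: keep the task WDT enabled, but log a warning
# with a backtrace instead of panicking when it fires
CONFIG_ESP_TASK_WDT=y
CONFIG_ESP_TASK_WDT_PANIC=n
```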
I still have the same problem with v5.1
===============================================================
==================== ESP32 CORE DUMP START ====================
Crashed task handle: 0x3ffc3ce4, name: 'IDLE', GDB name: 'process 1073495268'
================== CURRENT THREAD REGISTERS ===================
exccause 0x400833ec (Invalid EXCCAUSE code)
excvaddr 0x4008f3b6
epc1 0x400e2dbd
epc2 0x0
epc3 0x0
epc4 0x0
epc5 0x4008f2ba
epc6 0x0
eps2 0x0
eps3 0x0
eps4 0x0
eps5 0x60023
eps6 0x0
pc 0x4000bff0 0x4000bff0
lbeg 0x0 0
lend 0x0 0
lcount 0x0 0
sar 0x1d 29
ps 0x60020 393248
threadptr <unavailable>
br <unavailable>
scompare1 <unavailable>
acclo <unavailable>
acchi <unavailable>
m0 <unavailable>
m1 <unavailable>
m2 <unavailable>
m3 <unavailable>
expstate <unavailable>
f64r_lo <unavailable>
f64r_hi <unavailable>
f64s <unavailable>
fcr <unavailable>
fsr <unavailable>
a0 0x80094dcb -2146873909
a1 0x3ffc3b80 1073494912
a2 0x0 0
a3 0x60023 393251
a4 0x60020 393248
a5 0x3f403828 1061173288
a6 0xb33fffff -1287651329
a7 0xb33fffff -1287651329
a8 0x0 0
a9 0x1 1
a10 0x7b096c 8063340
a11 0x3ffb096c 1073416556
a12 0x60423 394275
a13 0x3f403828 1061173288
a14 0xb33fffff -1287651329
a15 0xb33fffff -1287651329
==================== CURRENT THREAD STACK =====================
#0 0x4000bff0 in ?? ()
#1 0x40094dcb in vApplicationGetIdleTaskMemory (ppxIdleTaskTCBBuffer=0x3ffb5fac <port_uxOldInterruptState+4>, ppxIdleTaskStackBuffer=0x3ffc3bc0, pulIdleTaskStackSize=0x4) at /home/dizcza/tools/esp-idf/components/freertos/port_common.c:52
#2 0x400920b4 in find_entry_from_task_handle_and_check_all_reset (handle=0x0, all_reset=0x60023) at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:146
#3 0x4009213a in prvCopyDataToQueue (pxQueue=0x40092134 <prvCopyDataToQueue>, pvItemToQueue=0x60020, xPosition=4) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:2356
#4 0x40092019 in esp_task_wdt_reset () at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:783
#5 0x40093093 in prvCheckTasksWaitingTermination () at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4567
#6 prvIdleTask (pvParameters=<optimized out>) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4231
#7 0x40094b4d in uxInitialiseStackTLS (ret_threadptr_reg_init=<optimized out>, uxStackPointer=<optimized out>) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:263
#8 pxPortInitialiseStack (pxTopOfStack=<optimized out>, pxCode=0x0, pvParameters=0x0) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:450
======================== THREADS INFO =========================
Id Target Id Frame
* 1 process 1073495268 0x4000bff0 in ?? ()
2 process 1073491500 0x4016c7ef in _strtod_l (ptr=0x0, s00=<optimized out>, se=0x3ffc2ab0, loc=<optimized out>) at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/strtod.c:811
3 process 1073493384 0x4000bff0 in ?? ()
4 process 1073544832 0x4000bff0 in ?? ()
5 process 1073521872 0x40087f84 in ipc_task (arg=<optimized out>) at /home/dizcza/tools/esp-idf/components/esp_system/esp_ipc.c:76
6 process 1073411116 0x4000bff0 in ?? ()
7 process 1073412488 0x4000bff0 in ?? ()
8 process 1073548800 0x4000bff0 in ?? ()
9 process 1073537224 0x4000bff0 in ?? ()
10 process 1073527212 0x4000bff0 in ?? ()
11 process 1073412832 0x4000bff0 in ?? ()
12 process 1073558820 0x4000bff0 in ?? ()
==================== THREAD 1 (TCB: 0x3ffc3ce4, name: 'IDLE') =====================
#0 0x4000bff0 in ?? ()
#1 0x40094dcb in vApplicationGetIdleTaskMemory (ppxIdleTaskTCBBuffer=0x3ffb5fac <port_uxOldInterruptState+4>, ppxIdleTaskStackBuffer=0x3ffc3bc0, pulIdleTaskStackSize=0x4) at /home/dizcza/tools/esp-idf/components/freertos/port_common.c:52
#2 0x400920b4 in find_entry_from_task_handle_and_check_all_reset (handle=0x0, all_reset=0x60023) at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:146
#3 0x4009213a in prvCopyDataToQueue (pxQueue=0x40092134 <prvCopyDataToQueue>, pvItemToQueue=0x60020, xPosition=4) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:2356
#4 0x40092019 in esp_task_wdt_reset () at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:783
#5 0x40093093 in prvCheckTasksWaitingTermination () at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4567
#6 prvIdleTask (pvParameters=<optimized out>) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4231
#7 0x40094b4d in uxInitialiseStackTLS (ret_threadptr_reg_init=<optimized out>, uxStackPointer=<optimized out>) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:263
#8 pxPortInitialiseStack (pxTopOfStack=<optimized out>, pxCode=0x0, pvParameters=0x0) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:450
The backtrace is not informative
I expected to see that the WDT was triggered for the sdprecord_read_sensor_task at the line ulTaskNotifyTake(pdTRUE, portMAX_DELAY); or at least the name of the task, not IDLE.
In the coredump I showed, there is no such information whatsoever.
Interesting. Looking at the code, I thought there were improvements in v5.1.
I've reopened. That's too bad.
@dizcza could you please check the ELF file SHA256 printed by your app at startup, and compare it to the SHA256 of the ELF file passed to espcoredump.py?
Unreasonable backtraces like the one you have posted (e.g. prvCheckTasksWaitingTermination most definitely doesn't call esp_task_wdt_reset) are most often the result of a mismatch between the ELF file used to produce the application binary and the ELF file used when decoding the core dump.
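One way to compare the two (a sketch; build/my_app.elf is a placeholder path, and the app's startup log prints only a truncated prefix of the full digest, so compare prefixes):

```shell
# Compute the SHA-256 of the ELF that was flashed; the startup log line
# "cpu_start: ELF file SHA256: ..." shows only the first characters of it.
sha256sum build/my_app.elf | cut -c1-16
```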
Indeed, the SHA differs:
I (1458) cpu_start: ELF file SHA256: 99af209085100ddf...
I (08:50:19.357) coredump: App SHA:
I (08:50:19.360) coredump: 33 65 33 62 64 65 36 30 66 36 30 64 32 30 39 65
The coredump app SHA is taken from the app_elf_sha256 of the coredump summary.
After I erased the flash entirely and did nothing to the app I got yet another coredump SHA:
I (00:00:01.116) coredump: App SHA:
I (00:00:01.120) coredump: 39 39 61 66 32 30 39 30 38 35 31 30 30 64 64 66
In both cases, the coredump info is like I've shown above.
My coredump procedure is the following. First, I set up the partitions like so:
nvs,      data, nvs,      , 0x4000,
otadata,  data, ota,      , 0x2000,
phy_init, data, phy,      , 0x1000,
factory,  app,  factory,  , 0x127000,
ota_0,    app,  ota_0,    , 0x127000,
ota_1,    app,  ota_1,    , 0x127000,
coredump, data, coredump, , 64K,
Then I configured the app to dump backtraces to flash. Upon reset, I read the coredump partition and copy the raw bytes to an SD card mounted on my ESP32 board. Finally, I download this binary file and run
idf.py coredump-info -c COREDUMP.BIN -p /dev/ttyACM0
The code to read a coredump and write it to an SD card:
void coredump_log_sdcard() {
    coredump_print_summary();
    size_t out_cd_addr = 0;
    size_t out_cd_size = 0;
    esp_err_t err = esp_core_dump_image_get(&out_cd_addr, &out_cd_size);
    bool erase = false;
    if (err != ESP_OK) {
        ESP_LOGI(TAG, "esp_core_dump_image_get: %s", esp_err_to_name(err));
        erase = true;
    } else {
        ESP_LOGI(TAG, "core dump addr: %u", (unsigned) out_cd_addr);
        ESP_LOGI(TAG, "core dump size: %u", (unsigned) out_cd_size);
    }
    const char *label = COREDUMP_PARTITION_NAME;
    const esp_partition_t *partition = esp_partition_find_first(
        ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_COREDUMP, label);
    if (partition == NULL) {
        ESP_LOGI(TAG, "partition '%s' not found", label);
        return;
    }
    if (erase) {
        // No valid image: out_cd_size is 0 here, so erase the whole
        // partition rather than an empty range, then exit.
        esp_partition_erase_range(partition, 0, partition->size);
        return;
    }
    if (out_cd_size == 0) {
        ESP_LOGW(TAG, "partition '%s' size: 0", label);
        return;
    }
    FILE *f = sdcard_record_open_file_once("COREDUMP.BIN", "w");
    if (f == NULL) {
        return;
    }
    char *buffer = (char *) malloc(COREDUMP_BUFF_SIZE);
    if (buffer == NULL) {
        fclose(f);
        return;
    }
    // Copy the coredump partition to the SD card in chunks.
    long offset = 0;
    while (out_cd_size > 0) {
        size_t chunk_size = out_cd_size > COREDUMP_BUFF_SIZE ? COREDUMP_BUFF_SIZE : out_cd_size;
        esp_partition_read(partition, offset, buffer, chunk_size);
        fwrite(buffer, sizeof(char), chunk_size, f);
        offset += chunk_size;
        out_cd_size -= chunk_size;
    }
    fclose(f);
    free(buffer);
}
ESP-IDF v5.1-dev-4528-g420ebd208a
Here is the output in the terminal:
E (32754) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (32754) task_wdt: - sdp_read (CPU 1)
E (32754) task_wdt: Tasks currently running:
E (32754) task_wdt: CPU 0: IDLE
E (32754) task_wdt: CPU 1: IDLE
E (32754) task_wdt: Aborting.
E (32754) task_wdt: Print CPU 1 backtrace
Backtrace: 0x4000bfed:0x3ffc3b60 0x4009402c:0x3ffc3b70 0x4009146d:0x3ffc3b90 0x400914f3:0x3ffc3bd0 0x400913d2:0x3ffc3bf0 0x4009244c:0x3ffc3c10 0x40093dae:0x3ffc3c30
0x4009402c: vPortClearInterruptMaskFromISR at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:568
(inlined by) vPortExitCritical at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:532
0x4009146d: esp_task_wdt_reset at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:791
0x400914f3: idle_hook_cb at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:490
0x400913d2: esp_vApplicationIdleHook at /home/dizcza/tools/esp-idf/components/esp_system/freertos_hooks.c:47 (discriminator 1)
0x4009244c: prvIdleTask at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4280
0x40093dae: vPortTaskWrapper at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162
ELF file SHA256: 99af209085100ddf
I (5018) esp_core_dump_flash: Save core dump to flash...
I (5024) esp_core_dump_common: Backing up stack @ 0x3ffb27f0 and use core dump stack @ 0x3ffb9930
I (5034) esp_core_dump_flash: Erase flash 20480 bytes @ 0x397000
I (5197) esp_core_dump_flash: Write end offset 0x4d84, check sum length 4
I (5197) esp_core_dump_common: Core dump used 944 bytes on stack. 644 bytes left free.
I (5201) esp_core_dump_common: Restoring stack @ 0x3ffb27f0
I (5208) esp_core_dump_flash: Core dump has been saved to flash.
Rebooting...
Could you please try reading the core dump directly from flash (with idf.py coredump-info), bypassing the SD card step?
Same story.
Could you try to make a version of the program which you can share, that still reproduces the issue?
I just ran the test application — https://github.com/espressif/esp-idf/tree/master/tools/test_apps/system/panic — with the config from sdkconfig.ci.coredump_flash_elf_sha, and ran the task_wdt test in that application. The backtrace was decoded correctly.
Change the infinite_loop function to
static void infinite_loop(void* arg) {
    (void) arg;
    esp_task_wdt_add(NULL);
    ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
    while (1) {
        ;
    }
}
Run the test_task_wdt_cpu1 test and you'll see output similar to what I've shown above. Here is what I get:
Crashed task handle: 0x3ffafd60, name: 'IDLE', GDB name: 'process 1073413472'
==================== CURRENT THREAD STACK =====================
#0 0x4000bff0 in ?? ()
#1 0x4008b0ee in vPortClearInterruptMaskFromISR (prev_level=<optimized out>) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:568
#2 vPortExitCritical (mux=<optimized out>) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:532
#3 0x400dd33a in esp_task_wdt_reset () at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:791
#4 0x400dd342 in idle_hook_cb () at /home/dizcza/tools/esp-idf/components/esp_system/task_wdt/task_wdt.c:490
#5 0x400d4c47 in esp_vApplicationIdleHook () at /home/dizcza/tools/esp-idf/components/esp_system/freertos_hooks.c:47
#6 0x400ebb7d in prvIdleTask (pvParameters=0x0) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4280
#7 0x4008aaf0 in vPortTaskWrapper (pxCode=0x400ebb74 <prvIdleTask>, pvParameters=0x0) at /home/dizcza/tools/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162
Or is this the expected output, and there's really no way to tell which task hasn't yielded to the WDT? Because I have lots of tasks, each guarded by a WDT...
The last backtrace you have posted looks reasonable if you are registering the TWDT for another task: the Idle task was running (idling) and at some point got interrupted by the TWDT interrupt. In this case it's indeed not possible to say, just from the backtrace, which of the tasks has not fed the watchdog.
Sadly indeed.
Well, that was the reason I posted a response here, thinking that cases like the one I described should be more informative.
If it's a won't-fix, so be it.
Or mark it as a feature request - I don't know if implementing it is possible.
We have just discussed this with the team, it is probably possible to include additional information into the core dump to make such cases easier to analyze. Basically, this part of the text currently printed to console
Task watchdog got triggered. The following tasks did not reset the watchdog in time:
- IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: btApiTask
Aborting.
could also be included in the coredump, and extracted by the coredump utility on the host side. In your case, the "following tasks did not reset the watchdog in time" list would include the task which you have registered the task watchdog for.
We probably won't implement this immediately, but let's keep this as an open feature request.
the "following tasks did not reset the watchdog in time" list would include the task which you have registered the task watchdog for.
Nice idea, I like it.
All right.
Is your feature request related to a problem?
ESP-IDF: release/v4.4
Debugging Idle TWDT crashes is harder than it needs to be.
In the example below, we should print the backtrace of btApiTask, but instead we get the backtrace of the idle task (much less useful). I think we could do this with a task snapshot. See: https://github.com/espressif/esp-idf/issues/9708