espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Watchdog report error triggered on OTA (IDFGH-11879) #12965

Open Pail19 opened 8 months ago

Pail19 commented 8 months ago

IDF version.

release/v5.0.4

Espressif SoC revision.

ESP32-S3

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-S3-WROOM-N8R2

Power Supply used.

External 3.3V

What is the expected behavior?

A watchdog error is triggered during OTA.

What is the actual behavior?

The customer downloads the firmware data over HTTPS and writes it to the OTA partition with esp_ota_write. During the write, the watchdog is triggered and reports an error.

Steps to reproduce.

/

Debug Logs.

The decoded backtrace is as follows:
0x403885e7: spinlock_acquire at D:/work/esp_idf/5.0.4/esp-idf/components/esp_hw_support/include/spinlock.h:112
(inlined by) xPortEnterCriticalTimeout at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:306
0x4038b3c4: vPortEnterCritical at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:573
(inlined by) xTaskPriorityDisinherit at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4969
0x40388c0d: prvCopyDataToQueue at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:2252
0x403891cc: xQueueGenericSend at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:861
0x4038937e: xQueueGiveMutexRecursive at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:692
0x4211291f: spi_flash_op_unlock at D:/work/esp_idf/5.0.4/esp-idf/components/spi_flash/cache_utils.c:96
0x403814f8: spi_flash_enable_interrupts_caches_and_other_cpu at D:/work/esp_idf/5.0.4/esp-idf/components/spi_flash/cache_utils.c:246
0x4038255f: cache_enable at D:/work/esp_idf/5.0.4/esp-idf/components/spi_flash/spi_flash_os_func_app.c:69
0x4038256a: spi1_end at D:/work/esp_idf/5.0.4/esp-idf/components/spi_flash/spi_flash_os_func_app.c:132
0x403850fd: spiflash_end_default at D:/work/esp_idf/5.0.4/esp-idf/components/spi_flash/esp_flash_api.c:138
0x4038225b: esp_flash_write at D:/work/esp_idf/5.0.4/esp-idf/components/spi_flash/esp_flash_api.c:933
0x42171309: esp_partition_write at D:/work/esp_idf/5.0.4/esp-idf/components/esp_partition/partition_target.c:75
0x4211253f: esp_ota_write at D:/work/esp_idf/5.0.4/esp-idf/components/app_update/esp_ota_ops.c:252
0x420296d1: OtaTask at D:/work/code/wirelessthermometerbase/gattc_gatts_coex/components/wifi_process/src/wifi_process.c:4329
0x40388439: vPortTaskWrapper at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:149

More Information.

No response

KonstantinKondrashov commented 8 months ago

Hi @Pail19! We have a fix for master that will be merged soon, and I will provide a backport for 5.0 as well. The fix needs to be adapted slightly for 5.0.

KonstantinKondrashov commented 8 months ago

Hi @Pail19! Could you try this patch 28447.zip?

zrb2002 commented 8 months ago

I applied the patch file you provided, but the problem still exists. The specific logs are attached:

打补丁后升级Crash.txt

Jacques-Zhao commented 8 months ago

@zrb2002 Can you provide us with a demo that reproduces the problem? We need it for further analysis.

zrb2002 commented 8 months ago

@Jacques-Zhao Sorry, at present the problem only reproduces within the full product firmware, and the OTA function also needs to connect to our server, so it is hard to provide a standalone example that reproduces it. I don't know which combination of features causes the restart. Does this patch file work normally for you?

KonstantinKondrashov commented 8 months ago

Hi @Pail19 @zrb2002! My initial thought on reading the log (https://github.com/espressif/esp-idf/issues/12965#issue-2077957976) was that this was an IPC-related issue. But after you applied the fix, I checked the log once more and came to the conclusion that there is no such issue here, so the fix could not help in your case.

Could you run your code (without the fix that I gave) with CONFIG_INT_WDT=n? We think that, for some reason, interrupts are disabled on the cores for a long time, which is why we get the Interrupt wdt timeout. This test can help rule out some other causes that we are considering. By the way, do you use any high-level ISRs, or does your code disable interrupts?

zrb2002 commented 8 months ago

Hi @KonstantinKondrashov, I set CONFIG_INT_WDT=n. With the interrupt watchdog turned off, the device still restarted:

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x7 (TG0WDT_SYS_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x4038c25e
0x4038c25e: vListInsert at D:/work/esp_idf/5.0.4/esp-idf/components/freertos/FreeRTOS-Kernel/list.c:173 (discriminator 1)

Yes, I do use ISRs.

zrb2002 commented 8 months ago

@KonstantinKondrashov This is the configuration file after I turned off the interrupt watchdog:

sdkconfig.txt

AxelLin commented 4 months ago

@zrb2002 Do you still hit the issue with a recent esp-idf version? @KonstantinKondrashov What is the status of this issue now? (The issue is still open but marked as Status: Done?)

KonstantinKondrashov commented 4 months ago

@AxelLin This issue is still open.

@zrb2002 The original log does not give much information about why this happens. The log in the file 打补丁后升级Crash.txt says a bit more. It seems you need to increase the size of the IPC task stack (CONFIG_ESP_IPC_TASK_STACK_SIZE), because Core 0 panicked (Interrupt wdt timeout on CPU0) here: https://github.com/espressif/esp-idf/blob/v5.0.4/components/freertos/FreeRTOS-Kernel/list.c#L148-L177. I guess the reason is a stack overflow.
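
As a rough illustration of that suggestion, the option could be raised in the project's sdkconfig along these lines. The value 4096 is an assumption chosen only for the sketch; the thread does not state a target size.

```
# Illustrative value only (assumption): the thread does not specify a size.
# Raises the stack of the per-core IPC tasks, in bytes.
CONFIG_ESP_IPC_TASK_STACK_SIZE=4096
```

The same option can also be changed through `idf.py menuconfig`. If the crash persists with a clearly larger stack, that would point away from the IPC task stack as the cause.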

AxelLin commented 2 months ago

> @zrb2002 The original log does not give much information about why this happens. The log in the file 打补丁后升级Crash.txt says a bit more. It seems you need to increase the size of the IPC task stack (CONFIG_ESP_IPC_TASK_STACK_SIZE), because Core 0 panicked (Interrupt wdt timeout on CPU0) here: https://github.com/espressif/esp-idf/blob/v5.0.4/components/freertos/FreeRTOS-Kernel/list.c#L148-L177. I guess the reason is a stack overflow.

I don't get it. If ESP_IPC_TASK_STACK_SIZE were too small, the result should be a stack overflow error rather than an Interrupt wdt timeout on CPU0. In addition, ESP_IPC_TASK_STACK_SIZE=2048 is the default setting, so it should be fine.
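
One way to test the stack-overflow hypothesis directly is to enable ESP-IDF's FreeRTOS stack overflow checks, so that an overflow is more likely to be reported as such instead of surfacing as a secondary fault. The options below are not mentioned anywhere in the thread; they are suggested here only as a possible diagnostic aid, in a minimal sdkconfig sketch:

```
# Diagnostic settings (assumption: not taken from the thread).
# Check a canary at the end of each task stack on context switch.
CONFIG_FREERTOS_CHECK_STACKOVERFLOW_CANARY=y
# Place a hardware watchpoint at the end of the running task's stack.
CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
```

For context, the FreeRTOS source comment in vListInsert (the function named in the Saved PC line) lists stack overflow among the likely causes when execution gets stuck in its insertion loop, since a corrupted list may never terminate the search. That could explain how an overflow might appear as a watchdog timeout, though the thread does not confirm it.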