espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.7k stars 7.29k forks source link

tight loop in spi_flash triggering WDT (IDFGH-4238) #6096

Open mzimmers opened 3 years ago

mzimmers commented 3 years ago

Environment

Problem Description

A tight loop in spi_flash component triggers WDT. The loop can be found at: C:\esp-idf\components\spi_flash\cache_utils.c:141-144.

Expected Behavior

Should return before WDT triggers. I'd recommend a small delay in this loop. (vTaskDelay(1);)

Actual Behavior

Triggers WDT on occasion.

Steps to reproduce

// If possible, attach a picture of your setup/wiring here.

Code to reproduce this issue

Debug Logs

E (23767) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (23767) task_wdt:  - IDLE0 (CPU 0)
E (23767) task_wdt: Tasks currently running:
E (23767) task_wdt: CPU 0: Wifi
E (23767) task_wdt: CPU 1: IDLE1
E (23767) task_wdt: Aborting.
abort() was called at PC 0x400dce37 on core 0

ELF file SHA256: 1b5c05c3dc5a8fcf

Backtrace: 0x4008a1bd:0x3ffb0660 0x4008a579:0x3ffb0680 0x400dce37:0x3ffb06a0 0x4008507d:0x3ffb06c0 0x400856f3:0x3ffe1fa0 0x40086571:0x3ffe1fc0 0x40089f71:0x3ffe1fe0 0x4008637b:0x3ffe2000 0x4008655d:0x3ffe2030 0x4011b9d5:0x3ffe2050 0x4011a3d1:0x3ffe2070 0x4011a7a9:0x3ffe2090 0x401171cd:0x3ffe2100 0x40119960:0x3ffe2150 0x4011b8b3:0x3ffe21b0 0x40116719:0x3ffe21d0 0x40116ac9:0x3ffe2200 0x400e43c1:0x3ffe2220 0x400f8846:0x3ffe2370 0x400f8ab5:0x3ffe23c0 0x4008b831:0x3ffe23e0

Other items if possible

Alvin1Zhang commented 3 years ago

Thanks for reporting, we will look into.

vanwykc commented 3 years ago

Any update / workaround on this issue?

I'm experiencing this same issue. This exact same loop is the reason for random BtController / WiFi / fopen / fclose etc Task WDT issues prohibiting me from getting a product production ready for over a week now.

I'm using ESP-IDF 4.2.1.

This issue is occurring very frequently under the following conditions: Due to un-alterable high internal RAM usage, I was forced to choose the option to run WiFi and BLE in external RAM as can be seen in this sdkconfig.zip. I thought that this might happen due to bad SPI lock handling between the two CPUs so I tried to force better lock control in the esp-idf code by changing the SDK config such that SPI1 is also used by some other external SPI device (which it physically isn't), but that also didn't reduce the frequency of this specific Task WDT issue.

See this following example CPU halt printout:

W (67767) heap_watch_task: heap low 4137592 free: 4140812 dif -108
GATT procedure initiated: notify; att_handle=20
GATT procedure initiated: notify; att_handle=30
GATT procedure initiated: notify; att_handle=17
GATT procedure initiated: notify; att_handle=20
GATT procedure initiated: notify; att_handle=30
GATT procedure initiated: notify; att_handle=17
I (69357) No_RTC_Ttemp_Task: resume at 832
W (69767) heap_watch_task: heap low 4137592 free: 4140088 dif -724
GATT procedure initiated: notify; att_handle=20
I (73248) NewBtnState::  1
I (73248) rest_connection_stat: press 255
GATT procedure initiated: notify; att_handle=44
I (73518) NewBtnState::  0
E (74788) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (74788) task_wdt:  - IDLE0 (CPU 0)
E (74788) task_wdt: Tasks currently running:
E (74788) task_wdt: CPU 0: No_RTC_Ttemp_Ta
E (74788) task_wdt: CPU 1: IDLE1
E (74788) task_wdt: Print CPU 0 (current core) backtrace

Backtrace:0x400D5AEC:0x3FFBEE70 0x400841B1:0x3FFBEE90 0x4008B7C3:0x3FFEDCF0 0x4008C5F9:0x3FFEDD10 0x4008C605:0x3FFEDD30 0x400910DD:0x3FFEDD50 0x4008C451:0x3FFEDD70 0x4011651F:0x3FFEDDA0 0x400EECAD:0x3FFEDDD0 0x400EF1CD:0x3FFEDE00 0x400EF41A:0x3FFEDE30 0x400EF4F1:0x3FFEDEA0 0x400F10B5:0x3FFEDF10 0x400F0DF2:0x3FFEDFD0 0x400F15D5:0x3FFEE090 0x400F2074:0x3FFEE0D0 0x400EDFCE:0x3FFEE0F0 0x400D3813:0x3FFEE160 0x4018FB84:0x3FFEE180 0x4018FC2D:0x3FFEE1B0 0x400DF8D9:0x3FFEE1D0 0x400E0C03:0x3FFEE210 0x400D8B41:0x3FFEE300 
0x400d5aec: task_wdt_isr at /home/charl/esp/esp-idf/components/esp_common/src/task_wdt.c:187

0x400841b1: _xt_lowint1 at /home/charl/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105

0x4008b7c3: spi_flash_disable_interrupts_caches_and_other_cpu at /home/charl/esp/esp-idf/components/spi_flash/cache_utils.c:147 (discriminator 1)

0x4008c5f9: cache_disable at /home/charl/esp/esp-idf/components/spi_flash/spi_flash_os_func_app.c:57

0x4008c605: spi1_start at /home/charl/esp/esp-idf/components/spi_flash/spi_flash_os_func_app.c:85

0x400910dd: spiflash_start at /home/charl/esp/esp-idf/components/spi_flash/esp_flash_api.c:77

0x4008c451: esp_flash_read at /home/charl/esp/esp-idf/components/spi_flash/esp_flash_api.c:546

0x4011651f: esp_partition_read at /home/charl/esp/esp-idf/components/spi_flash/partition.c:412

0x400eecad: littlefs_api_read at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs_api.c:22

0x400ef1cd: lfs_bd_read at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:99
 (inlined by) lfs_bd_read at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:27

0x400ef41a: lfs_dir_traverse at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:756

0x400ef4f1: lfs_dir_traverse at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:786

0x400f10b5: lfs_dir_compact at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:1579

0x400f0df2: lfs_dir_commit at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:1959

0x400f15d5: lfs_file_rawsync at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:2850
 (inlined by) lfs_file_rawsync at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:2814

0x400f2074: lfs_file_sync at /home/charl/esp/esp-idf/components/esp_littlefs/src/littlefs/lfs.c:5145

0x400edfce: vfs_littlefs_open at /home/charl/esp/esp-idf/components/esp_littlefs/src/esp_littlefs.c:914

0x400d3813: esp_vfs_open at /home/charl/esp/esp-idf/components/vfs/vfs.c:388 (discriminator 3)

0x4018fb84: _fopen_r at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fopen.c:129

0x4018fc2d: fopen at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fopen.c:168

0x400df8d9: file_sys_generic_write at /home/charl/Documents/Work/Git/Motorscope/MS4_ESP32_Full/build/../main/scr/module/file_sys.cpp:2065
 (inlined by) file_sys_generic_write at /home/charl/Documents/Work/Git/Motorscope/MS4_ESP32_Full/build/../main/scr/module/file_sys.cpp:2045

0x400e0c03: file_sys_write_generic_redundant at /home/charl/Documents/Work/Git/Motorscope/MS4_ESP32_Full/build/../main/scr/module/file_sys.cpp:2206

I am very frustrated by this as it is randomly either fclose, wifi, fopen, btcore etc, but every time the core tight loop is at cache_utils.c:147

Please help asap

AxelLin commented 2 years ago

Thanks for reporting, we will look into.

@Alvin1Zhang Any progress?

fromeijn commented 1 year ago

im still seeing the same problem in v4.3.4 when opening a (second) file with fopen

0x40081b9e: panic_abort at ~/esp-idf/components/esp_system/panic.c:404
0x40092d45: esp_system_abort at ~/esp-idf/components/esp_system/system_api.c:112
0x4009844a: abort at ~/esp-idf/components/newlib/abort.c:46
0x40159010: task_wdt_isr at ~/esp-idf/components/esp_common/src/task_wdt.c:182 (discriminator 1)
0x40083db1: _xt_lowint1 at ~/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1105
0x4008b932: spi_flash_disable_interrupts_caches_and_other_cpu at ~/esp-idf/components/spi_flash/cache_utils.c:160 (discriminator 1)
0x4008c825: cache_disable at ~/esp-idf/components/spi_flash/spi_flash_os_func_app.c:70
0x4008c831: spi1_start at ~/esp-idf/components/spi_flash/spi_flash_os_func_app.c:99
0x40090cbd: spiflash_start_default at ~/esp-idf/components/spi_flash/esp_flash_api.c:111
0x4008c641: esp_flash_read at ~/esp-idf/components/spi_flash/esp_flash_api.c:691
0x40150e5f: esp_partition_read at ~/esp-idf/components/spi_flash/partition.c:419
0x400ffaa3: spiffs_api_read at ~/esp-idf/components/spiffs/spiffs_api.c:36
0x40102ee9: spiffs_phys_rd at ~/esp-idf/components/spiffs/spiffs/src/spiffs_cache.c:161
0x40100ec3: spiffs_obj_lu_find_entry_visitor at ~/esp-idf/components/spiffs/spiffs/src/spiffs_nucleus.c:166
0x40102c15: spiffs_obj_lu_find_free_obj_id at ~/esp-idf/components/spiffs/spiffs/src/spiffs_nucleus.c:2212
0x40100006: SPIFFS_open at ~/esp-idf/components/spiffs/spiffs/src/spiffs_hydrogen.c:250
0x400ff80d: vfs_spiffs_open at ~/esp-idf/components/spiffs/esp_spiffs.c:508
0x400d6153: esp_vfs_open at ~/esp-idf/components/vfs/vfs.c:376 (discriminator 3)
0x401c629c: _fopen_r at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fopen.c:129
0x401c6345: fopen at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fopen.c:168
igrr commented 1 year ago

@fromeijn could you please attach your sdkconfig file?

fromeijn commented 1 year ago

Thanks igrr for looking in to this, see attached sdkconfig.log

igrr commented 1 year ago
CONFIG_SPI_FLASH_YIELD_DURING_ERASE=y
CONFIG_SPI_FLASH_ERASE_YIELD_DURATION_MS=20
CONFIG_SPI_FLASH_ERASE_YIELD_TICKS=1
CONFIG_SPI_FLASH_WRITE_CHUNK_SIZE=8192

It seems that the options required to yield from flash operations are enabled, so it's odd that the task watchdog is triggering.

Do you think you could reduce your application to a smaller example which reproduces this issue? This would help us recreate the issue on our end. Also, how frequently does the issue happen, or how much time it takes to reproduce it?

AxelLin commented 1 year ago
CONFIG_SPI_FLASH_YIELD_DURING_ERASE=y
CONFIG_SPI_FLASH_ERASE_YIELD_DURATION_MS=20
CONFIG_SPI_FLASH_ERASE_YIELD_TICKS=1
CONFIG_SPI_FLASH_WRITE_CHUNK_SIZE=8192

It seems that the options required to yield from flash operations are enabled, so it's odd that the task watchdog is triggering.

The original issue report (https://github.com/espressif/esp-idf/issues/6096#issue-739336614) already has above settings (see the files.zip). This issue has been there for years.

AxelLin commented 1 year ago

https://github.com/espressif/esp-idf/issues/4984#issuecomment-1156445364

AxelLin commented 1 year ago
CONFIG_SPI_FLASH_YIELD_DURING_ERASE=y
CONFIG_SPI_FLASH_ERASE_YIELD_DURATION_MS=20
CONFIG_SPI_FLASH_ERASE_YIELD_TICKS=1
CONFIG_SPI_FLASH_WRITE_CHUNK_SIZE=8192

It seems that the options required to yield from flash operations are enabled, so it's odd that the task watchdog is triggering.

FYI, https://github.com/espressif/esp-idf/issues/11032#issuecomment-1560531918 has the same settings as above, but still hit WDT.

AxelLin commented 1 year ago

@igrr @Alvin1Zhang Do you have any progress for this issue? (This was reported since 2020, and there are several related reports linked in above comments)

grginho commented 1 year ago

Hello everyone,

I'm also experiencing the watchdog timer getting triggered in read and fopen functions, using the LittleFS file system. Backtrace says that the code is stuck at esp-idf/components/spi_flash/cache_utils.c::167 while (!s_flash_op_can_start) { // Busy loop and wait for spi_flash_op_block_func to disable cache // on the other CPU } Regards,

packn366 commented 1 year ago

I had a similar experience. Reading NVS causes WDT elapsed at "while (!s_flash_op_can_start)". What I found is it happens much less times after I updated IDF from 4.4 to 5.1.