espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.63k stars 7.28k forks source link

nvs_flash_init() causes watchdog timeout (IDFGH-13458) #14362

Open owen63 opened 2 months ago

owen63 commented 2 months ago

Answers checklist.

IDF version.

v5.2.1

Espressif SoC revision.

ESP32-S3 v0.2

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

CMD

Development Kit.

Custom Board

Power Supply used.

External 3.3V

What is the expected behavior?

nvs_flash_init() should not cause watchdog timeout

What is the actual behavior?

nvs_flash_init() intermittently causes watchdog timeout

Steps to reproduce.

  1. Power up the board
  2. Watchdog fires and dumps backtrace

Debug Logs.

E (326) boot: ota data partition invalid, falling back to factory
I (678) cpu_start: Multicore app
I (687) cpu_start: Pro cpu start user code
I (688) cpu_start: cpu freq: 160000000 Hz
I (688) cpu_start: Application information:
I (691) cpu_start: Project name:     ccu
I (696) cpu_start: App version:      v4.0.0
I (701) cpu_start: Compile time:     Aug 12 2024 11:50:07
I (707) cpu_start: ELF file SHA256:  f280936d011d44ec...
I (713) cpu_start: ESP-IDF:          v5.2.1
I (718) cpu_start: Min chip rev:     v0.0
I (722) cpu_start: Max chip rev:     v0.99
I (727) cpu_start: Chip rev:         v0.2
I (732) heap_init: Initializing. RAM available for dynamic allocation:
I (739) heap_init: At 3FCA76E8 len 00042028 (264 KiB): RAM
I (745) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (752) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (758) heap_init: At 600FE7E8 len 00001800 (6 KiB): RTCRAM
I (765) spi_flash: detected chip: gd
I (768) spi_flash: flash io: dio
W (772) flash_encrypt: Flash encryption mode is DEVELOPMENT (not secure)
W (780) i2c: This driver is an old driver, please migrate your application code to adapt `driver/i2c_master.h`
I (792) i2c_manager: Starting I2C master at port 0.
I (796) i2c_manager: Initialised port 0 (SDA: 14, SCL: 13, speed: 100000 Hz.)
I (804) gpio: GPIO[47]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
I (813) gpio: GPIO[48]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
I (823) gpio: GPIO[2]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0
I (832) nvs_sec_provider: NVS Encryption - Registering Flash encryption-based scheme...
I (841) sleep: Configure to isolate all GPIO pins in sleep state
I (847) sleep: Enable automatic switching of GPIO sleep configuration
I (854) main_task: Started on CPU0
I (864) main_task: Calling app_main()
E (5864) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (5864) task_wdt:  - IDLE0 (CPU 0)
E (5864) task_wdt: Tasks currently running:
E (5864) task_wdt: CPU 0: main
E (5864) task_wdt: CPU 1: IDLE1
E (5864) task_wdt: Print CPU 0 (current core) backtrace
Backtrace: 0x42048713:0x3FC9DFA0 0x403779B1:0x3FC9DFD0 0x42098234:0x3FCACB10 0x4206CD13:0x3FCACB30 0x4206CE1A:0x3FCACB70 0x4206C7EE:0x3FCACB90 0x4204E4AF:0x3FCACBF0 0x420FA8CB:0x3FCACC30 0x4204DD40:0x3FCACC50 0x4204DF56:0x3FCACCE0 0x4204E1BA:0x3FCACD20 0x4204C783:0x3FCACD90 0x4204CEC3:0x3FCACDF0 0x4204CF6E:0x3FCACE20 0x4204ADED:0x3FCACE50 0x4204AE01:0x3FCACE70 0x4204B6CB:0x3FCACE90 0x4200B057:0x3FCACF00 0x42104C07:0x3FCAD040
E (10864) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (10864) task_wdt:  - IDLE0 (CPU 0)
E (10864) task_wdt: Tasks currently running:
E (10864) task_wdt: CPU 0: main
E (10864) task_wdt: CPU 1: IDLE1
E (10864) task_wdt: Print CPU 0 (current core) backtrace
Backtrace: 0x42048713:0x3FC9DFA0 0x403779B1:0x3FC9DFD0 0x42098234:0x3FCACB10 0x4206CD13:0x3FCACB30 0x4206CE1A:0x3FCACB70 0x4206C7EE:0x3FCACB90 0x4204E4AF:0x3FCACBF0 0x420FA8CB:0x3FCACC30 0x4204DD40:0x3FCACC50 0x4204DF56:0x3FCACCE0 0x4204E1BA:0x3FCACD20 0x4204C783:0x3FCACD90 0x4204CEC3:0x3FCACDF0 0x4204CF6E:0x3FCACE20 0x4204ADED:0x3FCACE50 0x4204AE01:0x3FCACE70 0x4204B6CB:0x3FCACE90 0x4200B057:0x3FCACF00 0x42104C07:0x3FCAD040

(.venv) path\sandpit\SD_3.0_v1>xtensa-esp32-elf-addr2line -pfiaC -e ./ccu.elf 0x42048713:0x3FC9DFA0 0x403779B1:0x3FC9DFD0 0x42098234:0x3FCACB10 0x4206CD13:0x3FCACB30 0x4206CE1A:0x3FCACB70 0x4206C7EE:0x3FCACB90 0x4204E4AF:0x3FCACBF0 0x420FA8CB:0x3FCACC30 0x4204DD40:0x3FCACC50 0x4204DF56:0x3FCACCE0 0x4204E1BA:0x3FCACD20 0x4204C783:0x3FCACD90 0x4204CEC3:0x3FCACDF0 0x4204CF6E:0x3FCACE20 0x4204ADED:0x3FCACE50 0x4204AE01:0x3FCACE70 0x4204B6CB:0x3FCACE90 0x4200B057:0x3FCACF00 0x42104C07:0x3FCAD040

0x42048713: task_wdt_timeout_handling at path/Espressif/esp-idf-v5.2.1/components/esp_system/task_wdt/task_wdt.c:441
 (inlined by) task_wdt_isr at path/Espressif/esp-idf-v5.2.1/components/esp_system/task_wdt/task_wdt.c:515
0x403779b1: _xt_lowint1 at path/Espressif/esp-idf-v5.2.1/components/xtensa/xtensa_vectors.S:1240
0x42098234: aes_ll_get_state at path/Espressif/esp-idf-v5.2.1/components/hal/esp32s3/include/hal/aes_ll.h:115
 (inlined by) aes_hal_wait_done at path/Espressif/esp-idf-v5.2.1/components/hal/aes_hal.c:84
0x4206cd13: esp_aes_dma_wait_complete at path/Espressif/esp-idf-v5.2.1/components/mbedtls/port/aes/dma/esp_aes.c:241
 (inlined by) esp_aes_process_dma at path/Espressif/esp-idf-v5.2.1/components/mbedtls/port/aes/dma/esp_aes.c:458
0x4206ce1a: esp_aes_crypt_ecb at path/Espressif/esp-idf-v5.2.1/components/mbedtls/port/aes/dma/esp_aes.c:735
0x4206c7ee: esp_aes_crypt_xts at path/Espressif/esp-idf-v5.2.1/components/mbedtls/port/aes/esp_aes_xts.c:207
0x4204e4af: nvs::NVSEncryptedPartition::read(unsigned int, void*, unsigned int) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_encrypted_partition.cpp:58
 (inlined by) nvs::NVSEncryptedPartition::read(unsigned int, void*, unsigned int) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_encrypted_partition.cpp:34
0x420fa8cb: nvs::Page::readEntry(unsigned int, nvs::Item&) const at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_page.cpp:854
0x4204dd40: nvs::Page::mLoadEntryTable() at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_page.cpp:639
0x4204df56: nvs::Page::load(nvs::Partition*, unsigned long) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_page.cpp:84
0x4204e1ba: nvs::PageManager::load(nvs::Partition*, unsigned long, unsigned long) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_pagemanager.cpp:25 (discriminator 1)
0x4204c783: nvs::Storage::init(unsigned long, unsigned long) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_storage.cpp:95
0x4204cec3: nvs::NVSPartitionManager::init_custom(nvs::Partition*, unsigned long, unsigned long) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_partition_manager.cpp:92
0x4204cf6e: nvs::NVSPartitionManager::secure_init_partition(char const*, nvs_sec_cfg_t*) at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_partition_manager.cpp:131
0x4204aded: nvs_flash_secure_init_partition at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_api.cpp:175 (discriminator 1)
 (inlined by) nvs_flash_secure_init_partition at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_api.cpp:167 (discriminator 1)
0x4204ae01: nvs_flash_secure_init at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_api.cpp:180
0x4204b6cb: nvs_flash_init at path/Espressif/esp-idf-v5.2.1/components/nvs_flash/src/nvs_api.cpp:155
0x4200b057: app_main at path/ccu/main/main.cpp:933
0x42104c07: main_task at path/Espressif/esp-idf-v5.2.1/components/freertos/app_startup.c:208

More Information.

This has occurred unexpectedly on one device. Am not yet in a position to test on many devices but have not seen this behaviour on my main test board. It happens intermittently. It seems unlikely to be a chip issue but can't replicate it on other chips. It is nvs_flash_init() on encrypted nvs partition. Startup code is:

`extern "C" void app_main(void) {

if CONFIG_RESTART_DEBUG_INFO

// Record stack on regular shutdown - panic handling is done via
// --wrap=esp_panic_handler in CMakeLists.txt
esp_register_shutdown_handler(debugUpdate);
#endif

RestoreFromDeepSleep();

sleepSemaphore = xSemaphoreCreateBinary();
if (sleepSemaphore == NULL) {
    Error(CCUError::AllocError);
}
xSemaphoreGive(sleepSemaphore);

// Initialise nvram partition
esp_err_t err = nvs_flash_init();

`

owen63 commented 2 months ago

Bug happens 1 in 10 times when woken from deep sleep by IO pin. Flash and nvs are encrypted. Minimal reproducible code is:

extern "C" void app_main(void)
{
    // Restore RTC wakeup pins to GPIO
    rtc_gpio_deinit(GPIO_NUM_6);

    // Initialise nvram partition
    // WATCHDOG TIMEOUT HERE SOMETIMES AFTER WAKE FROM SLEEP
    nvs_flash_init();

    // Configure wakeup io pin
    gpio_reset_pin(GPIO_NUM_6);
    gpio_set_direction(GPIO_NUM_6, GPIO_MODE_INPUT);
    gpio_set_pull_mode(GPIO_NUM_6, GPIO_PULLDOWN_ONLY);
    gpio_set_intr_type(GPIO_NUM_6, GPIO_INTR_ANYEDGE);

    // Back to sleep
    uint64_t wakeup_mask = (1UL << GPIO_NUM_6);
    // Reduce deep sleep current
    rtc_gpio_isolate(GPIO_NUM_6);
    esp_sleep_enable_ext1_wakeup(wakeup_mask, ESP_EXT1_WAKEUP_ANY_HIGH);
    ESP_LOGI(TAG, "zzz");
    esp_deep_sleep_start();
}
owen63 commented 2 months ago

Not reproducible after these steps:

  1. Created a test app with app_main() per above. Flashed as unencrypted app on encrypted chip = Invalid header
  2. Enabled flash and nvs encryption on test app and re-flashed. Could not reproduce.
  3. Re-flashed original minimal app. Could not reproduce.
  4. Erased entire flash. Re-flashed original minimal app. Could not reproduce.

Working on reproducing it...

Harshal5 commented 1 month ago

Hi @owen63, were you able to create a reproducible application? I tried the above code snippet but it was not reproducible on my side as well.