espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.35k stars 7.21k forks source link

NVS will not init; WDT kickes in before it finishes (IDFGH-5728) #7447

Closed npotts closed 2 years ago

npotts commented 3 years ago

Environment

Problem Description

Given the following, super simple program"

The device blocks at esp_err_t ret = nvs_flash_init(); and the WDT kicks and reset the device later.

Expected Behavior

nvs_flash_init(); to finish quickly and progress

Actual Behavior

System blocks, yield the following logs

Steps to reproduce

  1. make flash monitor

// If possible, attach a picture of your setup/wiring here.

Code to reproduce this issue

#include <string.h>
#include "nvs_flash.h"
#include "esp_log.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/event_groups.h"
#include "esp_task_wdt.h"

static const char *TAG = "main";

void init_nvs(void *parameter) {
  // esp_phy_erase_cal_data_in_nvs();
  ESP_LOGI(TAG, "Since the normal routine wont do this... ");
  esp_err_t ret = nvs_flash_init();
  if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
    ESP_ERROR_CHECK(nvs_flash_erase());
    ret = nvs_flash_init();
  }
  ESP_ERROR_CHECK(ret);
  ESP_LOGI(TAG, "NVS inited");  
}

void setup() {
init_nvs(0);

void app_main(void) {
  setup();
  ESP_LOGI(TAG, "[APP] Startup..");
  ESP_LOGI(TAG, "[APP] Free memory: %d bytes", esp_get_free_heap_size());
  ESP_LOGI(TAG, "[APP] IDF version: %s", esp_get_idf_version());

  for(;;) {
      vTaskDelay(1000);
    }
}

// If your code is longer than 30 lines, GIST is preferred.

Debug Logs

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:7224
load:0x40078000,len:14980
load:0x40080400,len:4444
entry 0x400806a4
I (54) boot: ESP-IDF v4.4-dev-2594-ga20df743f1 2nd stage bootloader
I (54) boot: compile time 23:32:47
I (54) boot: chip revision: 1
I (59) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (66) qio_mode: Enabling default flash chip QIO
I (71) boot.esp32: SPI Speed      : 80MHz
I (76) boot.esp32: SPI Mode       : QIO
I (80) boot.esp32: SPI Flash Size : 4MB
W (85) boot.esp32: PRO CPU has been reset by WDT.
W (90) boot.esp32: WDT reset info: PRO CPU PC=0x400841d0
0x400841d0: esp_ipc_isr_call_and_wait at /home/n/Work/tp/esp-idf/components/esp_ipc/src/esp_ipc_isr/esp_ipc_isr.c:208 (discriminator 1)

W (96) boot.esp32: WDT reset info: APP CPU PC=0x40080200
0x40080200: _Level4Vector at /home/n/Work/tp/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1623

I (102) boot: Enabling RNG early entropy source...
I (108) boot: Partition Table:
I (112) boot: ## Label            Usage          Type ST Offset   Length
I (119) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (126) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (134) boot:  2 factory          factory app      00 00 00010000 00100000
I (142) boot: End of partition table
I (146) boot_comm: chip revision: 1, min. application chip revision: 0
I (153) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=0872ch ( 34604) map
I (171) esp_image: segment 1: paddr=00018754 vaddr=3ffb0000 size=023dch (  9180) load
I (174) esp_image: segment 2: paddr=0001ab38 vaddr=40080000 size=054e0h ( 21728) load
I (186) esp_image: segment 3: paddr=00020020 vaddr=400d0020 size=1779ch ( 96156) map
I (213) esp_image: segment 4: paddr=000377c4 vaddr=400854e0 size=05e38h ( 24120) load
I (222) esp_image: segment 5: paddr=0003d604 vaddr=50000000 size=00010h (    16) load
I (227) boot: Loaded app from partition at offset 0x10000
I (228) boot: Disabling RNG early entropy source...
I (242) cpu_start: Pro cpu up.
I (242) cpu_start: Starting app cpu, entry point is 0x40080fb8
0x40080fb8: call_start_cpu1 at /home/n/Work/tp/esp-idf/components/esp_system/port/cpu_start.c:156

I (0) cpu_start: App cpu up.
I (256) cpu_start: Pro cpu start user code
I (256) cpu_start: cpu freq: 160000000
I (256) cpu_start: Application information:
I (261) cpu_start: Project name:     app-template
I (266) cpu_start: App version:      v0.0.1-5-g503cf48-dirty
I (273) cpu_start: Compile time:     Aug 19 2021 23:33:23
I (279) cpu_start: ELF file SHA256:  6f3b657249103e41...
I (285) cpu_start: ESP-IDF:          v4.4-dev-2594-ga20df743f1
I (291) heap_init: Initializing. RAM available for dynamic allocation:
I (299) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (305) heap_init: At 3FFB2CE0 len 0002D320 (180 KiB): DRAM
I (311) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (317) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (324) heap_init: At 4008B318 len 00014CE8 (83 KiB): IRAM
I (330) spi_flash: detected chip: generic
I (335) spi_flash: flash io: qio
I (339) sleep: Configure to isolate all GPIO pins in sleep state
I (345) sleep: Enable automatic switching of GPIO sleep configuration
I (353) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (363) main: Since the normal routine wont do this...

<snip: boot loop>

Other items if possible

npotts commented 3 years ago

All this code has worked (for years) previously. Im sure I am doing somethign wrong, but I am not sure what

0xjakob commented 3 years ago

Hi @npotts. How large is your NVS partition? Could you please try to turn on power management (idf.py menuconfig, then navigate to Component configPower Management) and let us know if this helps or not?

0xjakob commented 3 years ago

Also, what's your configuration like? Could you post the sdkconfig file which your application build has generated (the checkbox from the template)?

npotts commented 3 years ago

I have not willingly setup the NVS partition, so Im not really sure on its size.

Before I modify the power management stuff:
sdkconfig.txt

npotts commented 3 years ago

Could you please try to turn on power management (idf.py menuconfig, then navigate to Component configPower Management) and let us know if this helps or not?

I did that and there is no change. It still blocks and the WDT kicks in.

0xjakob commented 3 years ago

@npotts Sorry for the long delay answering... If you delete the build folder by using idf.py fullclean and do a complete build, then idf.py build will print the partition table like this:

[23/1022] Generating ../../partition_table/partition-table.bin
Partition table binary generated. Contents:
*******************************************************************************
# ESP-IDF Partition Table
# Name, Type, SubType, Offset, Size, Flags
nvs,data,nvs,0xb000,20K,
otadata,data,ota,0x10000,8K,
phy_init,data,phy,0x12000,4K,
factory,app,factory,0x20000,2432K,
ota_0,app,ota_0,0x280000,64K,
ota_1,app,ota_1,0x290000,64K,
flash_test,data,fat,0x2a0000,528K,
nvs_key,data,nvs_keys,0x324000,4K,encrypted
*******************************************************************************

Does this error happen only on one particular device or on all devices you use, also e.g. brand-new ones (if you have them)?

Is it really an infinite loop or does something happen after five minutes (We've seen a similar issue with slow execution before)?

Is it possible for you to read out the nvs flash partition and send it to us? Then we can try it on our devices and see if the issue reproduces. This is just a question, please note that there might be credentials from WiFi and other confidential information in it, so please don't upload it here directly.

tuupola commented 3 years ago

I have same problem with code that has been working before. Tried to compile with recent esp-idf and the code crashes when calling nvs_flash_init();. If I comment the nvs initialisation out code does not crash. Tried with three different boards.

Environment

Note that the problem does not exist with ESP-IDF v4.3.

Debug Logs

ets Jun  8 2016 00:22:57

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6896
load:0x40078000,len:14288
load:0x40080400,len:3744
entry 0x40080684
I (55) boot: ESP-IDF v4.4-dev-2594-ga20df743f1 2nd stage bootloader
I (55) boot: compile time 15:21:20
I (55) boot: chip revision: 1
I (60) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (67) boot.esp32: SPI Speed      : 40MHz
I (72) boot.esp32: SPI Mode       : DIO
I (77) boot.esp32: SPI Flash Size : 4MB
W (81) boot.esp32: PRO CPU has been reset by WDT.
W (86) boot.esp32: WDT reset info: PRO CPU PC=0x4008590f
0x4008590f: _xt_highint4 at /Users/tuupola/esp/esp-idf/components/freertos/port/xtensa/xtensa_vector_defaults.S:101

W (92) boot.esp32: WDT reset info: APP CPU PC=0x4008590c
0x4008590c: _xt_highint4 at /Users/tuupola/esp/esp-idf/components/freertos/port/xtensa/xtensa_vector_defaults.S:100

I (98) boot: Enabling RNG early entropy source...
I (104) boot: Partition Table:
I (108) boot: ## Label            Usage          Type ST Offset   Length
I (115) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (122) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (130) boot:  2 factory          factory app      00 00 00010000 00100000
I (138) boot: End of partition table
I (142) boot_comm: chip revision: 1, min. application chip revision: 0
I (149) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=0f0cch ( 61644) map
I (180) esp_image: segment 1: paddr=0001f0f4 vaddr=3ffb0000 size=00f24h (  3876) load
I (182) esp_image: segment 2: paddr=00020020 vaddr=400d0020 size=217f4h (137204) map
I (235) esp_image: segment 3: paddr=0004181c vaddr=3ffb0f24 size=01698h (  5784) load
I (238) esp_image: segment 4: paddr=00042ebc vaddr=40080000 size=0e354h ( 58196) load
I (265) esp_image: segment 5: paddr=00051218 vaddr=50000000 size=00010h (    16) load
I (272) boot: Loaded app from partition at offset 0x10000
I (272) boot: Disabling RNG early entropy source...
I (285) cpu_start: Pro cpu up.
I (286) cpu_start: Starting app cpu, entry point is 0x400811b8
0x400811b8: call_start_cpu1 at /Users/tuupola/esp/esp-idf/components/esp_system/port/cpu_start.c:156

I (0) cpu_start: App cpu up.
I (300) cpu_start: Pro cpu start user code
I (300) cpu_start: cpu freq: 160000000
I (300) cpu_start: Application information:
I (304) cpu_start: Project name:     esp-m5stick
I (309) cpu_start: App version:      a62612c-dirty
I (315) cpu_start: Compile time:     Sep  2 2021 21:15:43
I (321) cpu_start: ELF file SHA256:  e63e4a5f71821764...
I (327) cpu_start: ESP-IDF:          v4.4-dev-2594-ga20df743f1
I (334) heap_init: Initializing. RAM available for dynamic allocation:
I (341) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (347) heap_init: At 3FFB30C8 len 0002CF38 (179 KiB): DRAM
I (353) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (359) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (366) heap_init: At 4008E354 len 00011CAC (71 KiB): IRAM
I (373) spi_flash: detected chip: gd
I (376) spi_flash: flash io: dio
I (381) sleep: Configure to isolate all GPIO pins in sleep state
I (387) sleep: Enable automatic switching of GPIO sleep configuration
I (394) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (405) main: SDK version: v4.4-dev-2594-ga20df743f1
I (415) main: Heap when starting: 294272
I (415) main: Initializing I2C
I (415) i2c_helper: Starting I2C master at port 0.
I (425) main: Initializing AXP192
I (435) main: Initializing BM8563
I (435) main: Setting BM8563 alarm
I (435) main: Setting BM8563 timer
I (445) main: Initializing display
I (1345) mipi_display: Display initialized.
I (1355) main: Initializing non volatile storage
ets Jun  8 2016 00:22:57

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6896
load:0x40078000,len:14288
load:0x40080400,len:3744
entry 0x40080684
I (55) boot: ESP-IDF v4.4-dev-2594-ga20df743f1 2nd stage bootloader
I (55) boot: compile time 15:21:20
I (55) boot: chip revision: 1
0xjakob commented 2 years ago

@tuupola @npotts Please try the attached patch and let me know if it mitigates your problem: 0001-flash-permance.zip

zhuchen99899 commented 2 years ago

I meet the same issue... Module or chip used: ESP32-PICO-D4 IDF version: v4.4 Power Supply: USB and i try the dc-dc supply ,it can't work too. Operating System: ubuntu 18.04 Using an IDE?: vscode but i don't try another IDF version. i will try IDF v4.3 tomorrow.

0xjakob commented 2 years ago

@zhuchen99899 Did you try the patch mentioned in my last post?

ataweg commented 2 years ago

I have a similar problem. I have extended the blink example with nvs_flash_init () for analysis. If the example is built with CMake, blink will work. If it is built with make, the part after nvs_flash_init () is not executed and a PRO CPU watchdog reset occurs. I used the current esf-idf from today.

0xjakob commented 2 years ago

@ataweg Could you please try this patch which I posted above and let me know if it solves or mitigate the problem?

AxelLin commented 2 years ago

@ataweg What's the "git describe --tags" output? (current esf-idf from today does not tell which branch is used)

@0xjakob Why build with cmake v.s. make has different result?

0xjakob commented 2 years ago

@AxelLin

Why build with cmake v.s. make has different result?

Honestly, I don't know, might be simple non-deterministic build (make always pulls in all dependencies).

Regarding the issue: we have a problem with slow flash in some circumstances which is often noticed when applications try NVS at the startup. Hence the patch. So far, it's been difficult for us to reproduce the problem locally.

ataweg commented 2 years ago

@AxelLin v4.4-dev-3235-g3e370c4296

@0xjakob patch didn't help build_log.2021-09-29_08-02.txt.zip

it is fine also for esp-idf v4.3.1-29-ge493a4c30e

0xjakob commented 2 years ago

@ataweg Sorry to hear that. Could you post the sdkconfig file from your application's main directory here? That would help us reproducing and debugging the issue.

ataweg commented 2 years ago

@0xjakob blink_nvs.2021-09-29_14-25.zip

vinifr commented 2 years ago

Hello. The last code commit in which nvs_flash_init is working good is this:

commit 4dd88329c1e8b6fb3afc0355b1d5fad0d7a624e9 Author: Jakob Hasse jakob.hasse@espressif.com Date: Thu Jun 3 11:59:00 2021 +0800

ataweg commented 2 years ago

@vinifr I can confirm this for my project.

revk commented 2 years ago

Same here, the code and partition table I have been using has worked on previous builds and stable with no problems, but on latest it watchdogs in nvs init, just the same.

0xjakob commented 2 years ago

There seems to be an issue with our old make build system. Once I build with make, I also encounter this issue. If you have a chance, please try to build with cmake (idf.py build) as a temporary workaround.

revk commented 2 years ago

That is weird, but yes, using idf.py build and it no longer breaks. How can the make system do this?

ataweg commented 2 years ago

Yes, with the cmake toolchain the sample code from above works fine, even with the last commit of the master branch.

I setup a new make enviroment according the recipe at the end of (https://github.com/espressif/esp-idf/issues/7655.) My project works fine with commit 4dd8832, but not with the last one.

So if the last commit works for CMake, it shouldn't be such a big problem that it also works with make. I can understand that Espressif has reduced the quality assurance for make. The community will already find the bugs, but Espressif should already put them out.

0xjakob commented 2 years ago

I can understand that Espressif has reduced the quality assurance for make.

We officially still support Make in all 4.x releases. So if stuff doesn't work out like in this MR, it will be treated as a bug and will be fixed unless the support period for that release is over.

That being said, we usually use the new cmake build system, aka idf.py build internally and it will be the only available build system from release v5.0. This is the reason why some bugs may not be noticed by us at first. Our internal CI also runs a complete make testing cycle, though, which catches most make build errors.

We are working on a fix for this already and will keep you updated here.

Edit: In our programming guide there is a page regarding the support period, including the actual start and end dates for each release.