SmingHub / Sming

Sming - powerful open source framework simplifying the creation of embedded C++ applications.
https://sming.readthedocs.io
GNU Lesser General Public License v3.0
1.47k stars 347 forks source link

ESP32, IDF 5.0: error with nvs_flash_init #2653

Open mr-sven opened 1 year ago

mr-sven commented 1 year ago

I have an issue with initialization of Wifi, I used a clean machine and try to deploy the Basic_Wifi sample. The ESP freezes at boot and the WDT is resetting the chip.

commands:

$ git clone https://github.com/SmingHub/Sming /opt/sming
$ INSTALL_IDF_VER=5.0 $SMING_HOME/../Tools/install.sh esp32
$ source /opt/sming/Sming/../Tools/export.sh
$ cd /opt/sming/samples/Basic_WiFi/
$ make SMING_ARCH=Esp32 flash

output of WDT and addresses

W (76) boot.esp32: PRO CPU has been reset by WDT.
W (82) boot.esp32: WDT reset info: PRO CPU PC=0x40082230
W (88) boot.esp32: WDT reset info: APP CPU PC=0x40080240

$ xtensa-esp32-elf-addr2line -fe out/Esp32/esp32/debug/build/app.out 0x40082230
_xt_highint5
/opt/esp-idf-5.0/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vector_defaults.S:192

$ xtensa-esp32-elf-addr2line -fe out/Esp32/esp32/debug/build/app.out 0x40080240
_Level5Vector
/opt/esp-idf-5.0/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:1647

In the end I drilled it down to the function nvs_flash_init. I used the Basic_Blink sample and added the function and it freezes.

#include <nvs_flash.h>

void init()
{
    pinMode(LED_PIN, OUTPUT);
    procTimer.initializeMs(1000, blink).start();
    esp_err_t ret = nvs_flash_init();
}

The crazy thing is, if I use the nvs_rw_value example of the installed IDF it is working.

commands:

$ cd /opt/esp-idf/examples/storage/nvs_rw_value
$ ../../../tools/idf.py set-target esp32
$ ../../../tools/idf.py build
$ ../../../tools/idf.py -p /dev/ttyUSB0 flash monitor
...
I (0) cpu_start: Starting scheduler on APP CPU.

Opening Non-Volatile Storage (NVS) handle... Done
Reading restart counter from NVS ... Done
Restart counter = 1
Updating restart counter in NVS ... Done
Committing updates in NVS ... Done
...
slaff commented 1 year ago

Just for my understanding: Is the Basic_Blink application working for you? Also why are you adding platform/arch/soc specific code in an application? It will make it hard to use in another architecture. And finally the nvs initialization for ESP32 is already done behind the scenes for you in Sming Components/Network/Arch/Esp32/Platform/init.cpp?

mr-sven commented 1 year ago

Ahm, Basic_Blink is working, Basic_Wifi is not working. As soon as I remove the DISABLE_NETWORK switch form the Basic_Blink, it is also not working. So I dig into. In Sming/Arch/Esp32/Components/esp32/src/startup.cpp there is esp_network_initialise which is executed as soon as network is on. In Sming/Components/Network/Arch/Esp32/Platform/init.cpp the first action is to initialize the NVS, and exactly there it freezes.

So for simple test you can execute the nvs_flash_init in the Basic_Blink and it freezes also.

slaff commented 1 year ago

@mr-sven Thank you.

So for simple test you can execute the nvs_flash_init in the Basic_Blink and it freezes also.

@mikee47 any idea what can cause this issue with IDF 5.0 ?

mikee47 commented 1 year ago

I've traced it through to the spi_flash_disable_interrupts_caches_and_other_cpu() function (called during loading of the partition table) which hangs on the first spi_flash_disable_cache() call. I have a suspicion there's an interrupt service routine running out of flash, which shouldn't be.

mikee47 commented 1 year ago

I've attempted to load the partition table as early as possible by calling esp_partition_find directly from app_main, but just hangs there instead.

A temporary fix is to run make sdk-menuconfig and set FREERTOS_UNICORE=y.

mr-sven commented 1 year ago

Yes, that works for the moment. That was also my idea that anything stuck at task level or an ISR, because the WDT PC addresses are always at some interrupts.

mr-sven commented 1 year ago

I've diffed the sdkconfig from IDF and Sming. The only settings what points me are CONFIG_ESP_SYSTEM_CHECK_INT_LEVEL_4 and CONFIG_ESP_SYSTEM_CHECK_INT_LEVEL_5, some WDT and FREERTOS Flags.

--- idf.sdkconfig       2023-07-25 11:25:11.701113817 +0200
+++ sming.sdkconfig     2023-07-25 11:29:56.472197308 +0200
@@ -16,4 +15,0 @@
-CONFIG_APPTRACE_DEST_NONE=y
-CONFIG_APPTRACE_DEST_UART_NONE=y
-CONFIG_APPTRACE_LOCK_ENABLE=y
-CONFIG_APPTRACE_UART_TASK_PRIO=1
@@ -37,2 +33,2 @@
-CONFIG_COMPILER_OPTIMIZATION_DEFAULT=y
-CONFIG_COMPILER_OPTIMIZATION_LEVEL_DEBUG=y
+CONFIG_COMPILER_OPTIMIZATION_LEVEL_RELEASE=y
+CONFIG_COMPILER_OPTIMIZATION_SIZE=y
@@ -46,2 +41,0 @@
-CONFIG_ESP32_APPTRACE_DEST_NONE=y
-CONFIG_ESP32_APPTRACE_LOCK_ENABLE=y
@@ -115 +108,0 @@
-CONFIG_ESP_HTTP_CLIENT_ENABLE_HTTPS=y
@@ -136,3 +128,0 @@
-CONFIG_ESP_PROTOCOMM_SUPPORT_SECURITY_VERSION_0=y
-CONFIG_ESP_PROTOCOMM_SUPPORT_SECURITY_VERSION_1=y
-CONFIG_ESP_PROTOCOMM_SUPPORT_SECURITY_VERSION_2=y
@@ -152 +142 @@
-CONFIG_ESP_SYSTEM_EVENT_TASK_STACK_SIZE=2304
+CONFIG_ESP_SYSTEM_EVENT_TASK_STACK_SIZE=16384
@@ -165 +154,0 @@
-CONFIG_ESP_TLS_USING_MBEDTLS=y
@@ -181,19 +169,0 @@
-CONFIG_ETH_DMA_BUFFER_SIZE=512
-CONFIG_ETH_DMA_RX_BUFFER_NUM=10
-CONFIG_ETH_DMA_TX_BUFFER_NUM=10
-CONFIG_ETH_ENABLED=y
-CONFIG_ETH_PHY_INTERFACE_RMII=y
-CONFIG_ETH_RMII_CLK_IN_GPIO=0
-CONFIG_ETH_RMII_CLK_INPUT=y
-CONFIG_ETH_USE_ESP32_EMAC=y
-CONFIG_ETH_USE_SPI_ETHERNET=y
-CONFIG_FATFS_AUTO_TYPE=y
-CONFIG_FATFS_CODEPAGE=437
-CONFIG_FATFS_CODEPAGE_437=y
-CONFIG_FATFS_FS_LOCK=0
-CONFIG_FATFS_LFN_NONE=y
-CONFIG_FATFS_PER_FILE_CACHE=y
-CONFIG_FATFS_SECTOR_4096=y
-CONFIG_FATFS_SECTORS_PER_CLUSTER_1=y
-CONFIG_FATFS_TIMEOUT_MS=10000
-CONFIG_FATFS_VOLUME_COUNT=2
@@ -207,0 +178 @@
+CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS=y
@@ -214,0 +186 @@
+CONFIG_FREERTOS_RUN_TIME_STATS_USING_ESP_TIMER=y
@@ -217 +188,0 @@
-CONFIG_FREERTOS_TASK_FUNCTION_WRAPPER=y
@@ -222,0 +194,4 @@
+CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=y
+CONFIG_FREERTOS_USE_TRACE_FACILITY=y
+CONFIG_FREERTOS_VTASKLIST_INCLUDE_COREID=y
+CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
@@ -228,4 +202,0 @@
-CONFIG_HTTPD_ERR_RESP_NO_DELAY=y
-CONFIG_HTTPD_MAX_REQ_HDR_LEN=512
-CONFIG_HTTPD_MAX_URI_LEN=512
-CONFIG_HTTPD_PURGE_BUF_LEN=32
@@ -242 +212,0 @@
-CONFIG_LCD_PANEL_IO_FORMAT_BUF_SIZE=32
@@ -293 +263 @@
-CONFIG_LWIP_TCPIP_TASK_STACK_SIZE=3072
+CONFIG_LWIP_TCPIP_TASK_STACK_SIZE=8192
@@ -311,3 +280,0 @@
-CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_DEFAULT_FULL=y
-CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_MAX_CERTS=200
-CONFIG_MBEDTLS_CERTIFICATE_BUNDLE=y
@@ -365,4 +331,0 @@
-CONFIG_MQTT_PROTOCOL_311=y
-CONFIG_MQTT_TRANSPORT_SSL=y
-CONFIG_MQTT_TRANSPORT_WEBSOCKET_SECURE=y
-CONFIG_MQTT_TRANSPORT_WEBSOCKET=y
@@ -375 +338 @@
-CONFIG_OPTIMIZATION_LEVEL_DEBUG=y
+CONFIG_OPTIMIZATION_LEVEL_RELEASE=y
@@ -393 +355,0 @@
-CONFIG_SEMIHOSTFS_MAX_MOUNT_POINTS=1
@@ -573,11 +534,0 @@
-CONFIG_SPIFFS_CACHE_WR=y
-CONFIG_SPIFFS_CACHE=y
-CONFIG_SPIFFS_GC_MAX_RUNS=10
-CONFIG_SPIFFS_MAX_PARTITIONS=3
-CONFIG_SPIFFS_META_LENGTH=4
-CONFIG_SPIFFS_OBJ_NAME_LEN=32
-CONFIG_SPIFFS_PAGE_CHECK=y
-CONFIG_SPIFFS_PAGE_SIZE=256
-CONFIG_SPIFFS_USE_MAGIC_LENGTH=y
-CONFIG_SPIFFS_USE_MAGIC=y
-CONFIG_SPIFFS_USE_MTIME=y
@@ -597,0 +549 @@
+CONFIG_SPI_MASTER_IN_IRAM=y
@@ -601,2 +552,0 @@
-CONFIG_SUPPORT_TERMIOS=y
-CONFIG_SUPPRESS_SELECT_DEBUG_OUTPUT=y
@@ -604 +554 @@
-CONFIG_SYSTEM_EVENT_TASK_STACK_SIZE=2304
+CONFIG_SYSTEM_EVENT_TASK_STACK_SIZE=16384
@@ -611 +561 @@
-CONFIG_TCPIP_TASK_STACK_SIZE=3072
+CONFIG_TCPIP_TASK_STACK_SIZE=8192
@@ -631,14 +580,0 @@
-CONFIG_UNITY_ENABLE_DOUBLE=y
-CONFIG_UNITY_ENABLE_FLOAT=y
-CONFIG_UNITY_ENABLE_IDF_TEST_RUNNER=y
-CONFIG_VFS_SEMIHOSTFS_MAX_MOUNT_POINTS=1
-CONFIG_VFS_SUPPORT_DIR=y
-CONFIG_VFS_SUPPORT_IO=y
-CONFIG_VFS_SUPPORT_SELECT=y
-CONFIG_VFS_SUPPORT_TERMIOS=y
-CONFIG_VFS_SUPPRESS_SELECT_DEBUG_OUTPUT=y
-CONFIG_WIFI_PROV_AUTOSTOP_TIMEOUT=30
-CONFIG_WIFI_PROV_SCAN_MAX_ENTRIES=16
-CONFIG_WIFI_PROV_STA_ALL_CHANNEL_SCAN=y
-CONFIG_WL_SECTOR_SIZE=4096
-CONFIG_WL_SECTOR_SIZE_4096=y
@@ -647,2 +582,0 @@
-CONFIG_WS_BUFFER_SIZE=1024
-CONFIG_WS_TRANSPORT=y

Edit: I removed the WDT Initialization from startup.cpp and enabled the IDF WDT in config, but the bug is still there.

mr-sven commented 1 year ago

I don't know but at the moment I have no clue. One difference I found is, that the IDF Sample bootloader output ends with the following:

I (29) boot: ESP-IDF v5.0-547-g20e8e86c80 2nd stage bootloader
I (29) boot: compile time 11:47:34
I (29) boot: chip revision: v1.0
I (33) boot.esp32: SPI Speed      : 40MHz
I (38) boot.esp32: SPI Mode       : DIO
I (42) boot.esp32: SPI Flash Size : 4MB
I (47) boot: Enabling RNG early entropy source...
I (52) boot: Partition Table:
I (56) boot: ## Label            Usage          Type ST Offset   Length
I (63) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (71) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (78) boot:  2 factory          factory app      00 00 00010000 00100000
I (86) boot: End of partition table
I (90) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=08edch ( 36572) map
I (112) esp_image: segment 1: paddr=00018f04 vaddr=3ffb0000 size=01e8ch (  7820) load
I (115) esp_image: segment 2: paddr=0001ad98 vaddr=40080000 size=05280h ( 21120) load
I (126) esp_image: segment 3: paddr=00020020 vaddr=400d0020 size=19e60h (106080) map
I (165) esp_image: segment 4: paddr=00039e88 vaddr=40085280 size=06db4h ( 28084) load
I (183) boot: Loaded app from partition at offset 0x10000
I (183) boot: Disabling RNG early entropy source...
I (194) cpu_start: Pro cpu up.
I (194) cpu_start: Starting app cpu, entry point is 0x40081160
I (180) cpu_start: App cpu up.
I (209) cpu_start: Pro cpu start user code
I (209) cpu_start: cpu freq: 160000000 Hz
I (209) cpu_start: Application information:
I (213) cpu_start: Project name:     nvs_rw_value
I (219) cpu_start: App version:      v5.0-547-g20e8e86c80
I (225) cpu_start: Compile time:     Jul 25 2023 11:47:30
I (231) cpu_start: ELF file SHA256:  2f3b43f724c4c6c4...
I (237) cpu_start: ESP-IDF:          v5.0-547-g20e8e86c80
I (243) cpu_start: Min chip rev:     v0.0
I (248) cpu_start: Max chip rev:     v3.99
I (253) cpu_start: Chip rev:         v1.0
I (258) heap_init: Initializing. RAM available for dynamic allocation:
I (265) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (271) heap_init: At 3FFB27C0 len 0002D840 (182 KiB): DRAM
I (277) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (283) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (290) heap_init: At 4008C034 len 00013FCC (79 KiB): IRAM
I (297) spi_flash: detected chip: generic
I (301) spi_flash: flash io: dio
I (305) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.

The Sming sample bootloader output:

I (57) boot: ESP-IDF v5.0-547-g20e8e86c80 2nd stage bootloader
I (58) boot: compile time 11:46:08
I (58) boot: chip revision: v1.0
I (62) boot.esp32: SPI Speed      : 40MHz
I (67) boot.esp32: SPI Mode       : DIO
I (72) boot.esp32: SPI Flash Size : 4MB
I (94) boot: Enabling RNG early entropy source...
I (99) boot: Partition Table:
I (103) boot: ## Label            Usage          Type ST Offset   Length
I (110) boot:  0 spiFlash         unknown          02 01 00000000 00400000
I (117) boot:  1 nvs              WiFi data        01 02 00009000 00006000
I (125) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (133) boot:  3 factory          factory app      00 00 00010000 000f0000
I (140) boot: End of partition table
I (144) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=0804ch ( 32844) map
I (165) esp_image: segment 1: paddr=00018074 vaddr=3ffb0000 size=01bech (  7148) load
I (168) esp_image: segment 2: paddr=00019c68 vaddr=40080000 size=063b0h ( 25520) load
I (183) esp_image: segment 3: paddr=00020020 vaddr=400d0020 size=09f64h ( 40804) map
I (198) esp_image: segment 4: paddr=00029f8c vaddr=400863b0 size=04c28h ( 19496) load
I (211) boot: Loaded app from partition at offset 0x10000
I (212) boot: Disabling RNG early entropy source...
I (223) cpu_start: Pro cpu up.
I (223) cpu_start: Starting app cpu, entry point is 0x40080f0c
I (0) cpu_start: App cpu up.
I (237) cpu_start: Pro cpu start user code
I (237) cpu_start: cpu freq: 160000000 Hz
I (237) cpu_start: Application information:
I (242) cpu_start: Project name:     Sming
I (247) cpu_start: App version:      4.0.0-rc4-539-ge1b0f2b2-dirty
I (254) cpu_start: Compile time:     Jul 25 2023 11:46:11
I (260) cpu_start: ELF file SHA256:  7240a6988c72f902...
I (266) cpu_start: ESP-IDF:          v5.0-547-g20e8e86c80
I (272) cpu_start: Min chip rev:     v0.0
I (277) cpu_start: Max chip rev:     v3.99
I (282) cpu_start: Chip rev:         v1.0
I (286) heap_init: Initializing. RAM available for dynamic allocation:
I (294) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (300) heap_init: At 3FFB2570 len 0002DA90 (182 KiB): DRAM
I (306) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (312) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (319) heap_init: At 4008AFD8 len 00015028 (84 KiB): IRAM
I (0) cpu_start: Starting scheduler on APP CPU.

So there is the spi_flash output missing.

mr-sven commented 1 year ago

Ok, that was also not the problem. Its only caused because the message is enclosed in ESP_LOGI and not ESP_EARLY_LOGI.

mr-sven commented 1 year ago

I've tried hard on setting sdkconfig switches, but nothing helped. It looks like there are some compiler switches missing or any other thing. I'm totally lost at the moment.

mikee47 commented 1 year ago

Do you need both CPUs? If not, stick to unicore for now.

mikee47 commented 1 year ago

Getting rid of FreeRTOS entirely would be ideal, but the WiFi stack is closed-source and has some embedded dependencies on RTOS operation.

Consipiracy edit: Strategy by Espressif to ensure they have a closed-source BLOB which gets them full control if required.

profjmer commented 1 year ago

Other samples reset the ESP32 too. Those using WiFi. I had a functional web site running on ESP32, but the new version brakes the code and ESP32 is always resetting. How can we identify Sming version from the uart log ?

mr-sven commented 1 year ago

Afaik. the problem is located in the SPI flash access. I currently had no chance to test against IDF 4.4, I think it is caused by IDF 5.

mikee47 commented 8 months ago

Afaik. the problem is located in the SPI flash access. I currently had no chance to test against IDF 4.4, I think it is caused by IDF 5.

I've updated sming IDF 4.4 to latest Espressif release (date 3/1/24) and can confirm that the above issue is still present, however the same fix applies and Basic_Ssl sample is working.

To update existing IDF installation manually:

cd /opt/esp-idf-4.4
git pull
git submodule update --init --recursive
python3 tools/idf_tools.py --non-interactive install-python-env