SmingHub / Sming

Sming - powerful open source framework simplifying the creation of embedded C++ applications.
https://sming.readthedocs.io
GNU Lesser General Public License v3.0
1.48k stars 347 forks source link

Properly System.restart() esp32 #2500

Closed kmihaylov closed 7 months ago

kmihaylov commented 2 years ago

Hello. When ESP utilizes WiFi it couldn't properly disconnect or restart the device.

E (18626) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (18626) task_wdt:  - Sming (CPU 1)
E (18626) task_wdt: Tasks currently running:
E (18626) task_wdt: CPU 0: IDLE
E (18626) task_wdt: CPU 1: IDLE
E (18626) task_wdt: Aborting.

abort() was called at PC 0x400de9fc on core 0

Backtrace:0x40087b6a:0x3ffb0580 0x400882b5:0x3ffb05a0 0x4008e466:0x3ffb05c0 0x400de9fc:0x3ffb0630 0x40081d49:0x3ffb0650 0x4012fae7:0x3ffbaee0 0x400de80a:0x3ffbaf00 0x40089408:0x3ffbaf20

ELF file SHA256: 5e395847006b60a9

Rebooting...

I use the Basic_WiFi example and I've added System.restart(10000); at the end of the init() function.

Also the same applies if I try to use Wifi.disconnect(); or WifiStation.enable(false);.

kmihaylov commented 2 years ago

Hm, some answers here are missing.

Anyway, I'm watching esp-idf's function esp_wifi_stop(). How do I properly call it from within Sming?

kmihaylov commented 2 years ago

I'm answering myself. the esp_wifi_stop() and esp_wifi_deinit() functions can easily be accessed once the user adds the #include <esp_wifi.h> header to the application.cpp file.

I found that System.restart() in the Basic_Wifi example will work if the Wifi Access Point is disabled (I removed/commended out the stuff related to the AP). It seems that some task continues to work if the app isn't prepared for restart leading to WDT reset dropping to GDB.

Therefore I'm closing my issue now. The restart process seems to need more attention by the developer.

mikee47 commented 2 years ago

The restart process seems to need more attention by the developer

If you'd like to suggest improvements please feel free to open a PR!

mikee47 commented 2 years ago

@kmihaylov If it's OK with you we should leave this issue open as it definitely needs a fix.

kmihaylov commented 2 years ago

Still trying to figure out why the system panics. I was quick to say the Access Point feature of the Basic_Wifi sample was the reason for the panic.

Later I found that the panic still occurs, despite the AP feature was turned off.

Actually if I use System.onReady() to turn on the Wifi (WifiStation.enable(true)) the panic occurs. But if the code is consecutively executed:

    System.onReady(ready);

    WifiStation.enable(true);
    WifiStation.config(_F(WIFI_SSID), _F(WIFI_PWD));

    WifiStation.setIP(IpAddress(192, 168, 1, 171));

    // Set callback that should be triggered when we have assigned IP
    WifiEvents.onStationGotIP(connectOk);

    // Set callback that should be triggered if we are disconnected or connection attempt failed
    WifiEvents.onStationDisconnect(connectFail);

    esp_wifi_stop();
    esp_wifi_deinit();
    System.restart(5000);

The panic is missing. One reason to believe this works is that the wifi is stopped before it was fully initialized. I also tried with WifiStation.enable(true) in the init() and then to turn it off with a 2 second delay with a timer with no luck.


One way to mitigate this is to change the CONFIG_ESP_SYSTEM_PANIC config variable. It resides in Component config -> ESP System Settings

https://docs.espressif.com/projects/esp-idf/en/v4.4/esp32/api-guides/fatal-errors.html

make SMING_ARCH=Esp32 sdk-menuconfig
...
make SMING_ARCH=Esp32 Sming-build all

(see https://sming.readthedocs.io/en/latest/_inc/Sming/Arch/Esp32/README.html?highlight=menuconfig )

Obviously the default is CONFIG_ESP_SYSTEM_PANIC_GDBSTUB. I changed it to CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT and now at least the system reboots after panicing.

This is important, otherwise I was unable to properly OTA update my devices.

kmihaylov commented 2 years ago

@mikee47

My Esp32 code used to do WifiStation.enable(true) before System.onReady(). This way the Wifi is working.

I found that if I try to enable the wifi with System.onReady, then it brakes.

See the code below:

#include <SmingCore.h>
#include "esp_wifi.h"

void wifiInit() {
    esp_wifi_deinit();
    esp_netif_create_default_wifi_sta();
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));
    esp_wifi_set_mode(WIFI_MODE_STA);
    wifi_config_t wifiCfg;
    esp_wifi_set_config(WIFI_IF_STA, &wifiCfg);
    esp_wifi_start();
}

void init()
{
    wifiInit();
    //System.onReady(wifiInit);
}

This is the modified Basic_Blink (w/ networking enabled). If you comment out the wifiInit() function and use the onReady handler instead, the chip enters in endless restart (or drops to GDP shell, depending on IDF's build options).

I followed this example: https://github.com/espressif/esp-idf/blob/7c2b137ffe8a91ef0967cd5da1fe330ff5912399/examples/wifi/getting_started/station/main/station_example_main.c

And the above mentioned code is the extracted minimum of functions. I thik the system freezes exactly on esp_wifi_start().

esp_netif_init() is executed in Sming/Components/Network/Arch/Esp32/Platform/init.cpp so I omit it.

The only missing function from this initialization code is esp_event_loop_create_default() and Sming uses its own sming_create_event_loop(). RTOS and events are outside my knowledge so I don't know if this could cause the problem.


I may be wrong to use onReady event to initialize Wifi. As I wrote, w/o it the app loads properly. Even System.restart() works well, but I don't know if WiFi is properly initialized now. I'll go to check it.


Ideas for this message: E (296445) wifi:wifi ipc: failed to post wifi task ?

mikee47 commented 2 years ago

I suspect one problem is that when onReady() gets to run the WiFi stack is being 'automatically' initialised. To diagnose what's happening with the endless rebooting we need to be able to see the bootloader diagnostics by setting BOOTLOADER_LOG_LEVEL appropriately in the SDK config. The Sming WiFi initialisation code can be found in Components/Network/Esp32/Platform/init.cpp, and the station/access code in StationImpl.cpp and AccessPointImpl.cpp accordingly.

kmihaylov commented 2 years ago

I set the BOOTLOADER_LOG_LEVEL to Debug.

However now some overlap occurs:

$ make flash

Basic_Blink: Invoking 'flash' for Esp32/esp32 (debug) architecture
ninja: Entering directory `/home/opt/Sming/Sming/out/Esp32/esp32/debug/build/esp32/sdk'
[1/3] Performing build step for 'bootloader'
ninja: no work to do.
Copying generated SDK libraries
make components application
make[1]: Entering directory '/home/opt/Sming/samples/Basic_Blink'

Building /home/opt/Sming/samples/Basic_Blink/out/Esp32/esp32/debug/lib/clib-App-98050d16e24ff8720e998f71db21cffd.a
make[2]: Nothing to be done for 'build'.
Validating hardware config 'standard'
Verifying partition table...
Writing to 'out/Esp32/esp32/debug/firmware/partitions.bin'
SSL support disabled
Basic_Blink: Linking out/Esp32/esp32/debug/build/app.out
   Section|                   Description| Start (hex)|   End (hex)|Used space
------------------------------------------------------------------------------
      data|        Initialized Data (RAM)|    3FFBDB60|    3FFC174C|   15340
    rodata|           ReadOnly Data (SPI)|    3F400020|    3F412130|   74000
       bss|      Uninitialized Data (RAM)|    3FFC1750|    3FFC5590|   15936
 iram_text|            Cached Code (IRAM)|    40080400|    40093C60|   79968
      text|           Uncached Code (SPI)|    400D0020|    4012D1D3|  381363
Total Used RAM : 31276
Free RAM : 304596
Free IRam : 51104
esptool.py v3.2-dev
Merged 2 ELF sections
make[1]: Leaving directory '/home/opt/Sming/samples/Basic_Blink'
Killing Terminal to free /dev/ttyUSB0
make: [/home/opt/Sming/Sming/Components/terminal/component.mk:32: kill_term] Killed (ignored)
WriteFlash 0x1000=/home/opt/Sming/Sming/out/Esp32/esp32/debug/build/esp32/sdk/bootloader/bootloader.bin 0x00008000=out/Esp32/esp32/debug/firmware/partitions.bin   0x00010000=out/Esp32/esp32/debug/firmware/app.bin
/usr/bin/python3 /home/opt/Sming/Sming/Components/esptool/esptool/esptool.py -p /dev/ttyUSB0 -b 115200 --chip esp32 --before default_reset --after hard_reset write_flash -z -fs 4MB -ff 40m -fm dio 0x1000 /home/opt/Sming/Sming/out/Esp32/esp32/debug/build/esp32/sdk/bootloader/bootloader.bin 0x00008000 out/Esp32/esp32/debug/firmware/partitions.bin   0x00010000 out/Esp32/esp32/debug/firmware/app.bin 
usage: esptool write_flash [-h] [--erase-all] [--flash_freq {keep,40m,26m,20m,80m}] [--flash_mode {keep,qio,qout,dio,dout}] [--flash_size FLASH_SIZE] [--spi-connection SPI_CONNECTION]
                           [--no-progress] [--verify] [--encrypt] [--encrypt-files <address> <filename> [<address> <filename> ...]] [--ignore-flash-encryption-efuse-setting]
                           [--compress | --no-compress]
                           <address> <filename> [<address> <filename> ...]
esptool write_flash: error: argument <address> <filename>: Detected overlap at address: 0x8000 for file: out/Esp32/esp32/debug/firmware/partitions.bin
make: *** [/home/opt/Sming/Sming/component.mk:137: flash] Error 2
 $ make map

Basic_Blink: Invoking 'map' for Esp32/esp32 (debug) architecture
Partition map: standard
options: 
Device            Start       End         Size        Type      SubType   Name              Filename
----------------  ----------  ----------  ----------  --------  --------  ----------------  ------------
spiFlash          0x00000000  0x00007fff         32K                      Boot Sector       
spiFlash          0x00008000  0x00008fff          4K                      Partition Table   
spiFlash          0x00009000  0x0000efff         24K  data      nvs       nvs               
spiFlash          0x0000f000  0x0000ffff          4K  data      phy       phy_init          
spiFlash          0x00010000  0x000fffff        960K  app       factory   factory           $(TARGET_BIN)
spiFlash          0x00100000  0x003fffff          3M                      (unused)          
kmihaylov commented 2 years ago

Pretty good bug to bring me the knowledge to make arbitrary partition tables and esp-idf specific options!

I managed to make custom partition table and I learned how to change the partition table offset. Unfortunately I don't see any useful information in the debug messages.

Here I'll post all the options I changed and all the output.

Application code:

#include <SmingCore.h>
#include "esp_wifi.h"

Timer testTimer;

void wifiInit() {
    esp_wifi_deinit();
    esp_netif_create_default_wifi_sta();
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));
    esp_wifi_set_mode(WIFI_MODE_STA);
    wifi_config_t wifiCfg;
    esp_wifi_set_config(WIFI_IF_STA, &wifiCfg);
    esp_wifi_start();
}

void shutdown() {
    esp_wifi_stop();
    esp_restart();
}

void init()
{
    wifiInit();
    testTimer.initializeMs(5000, shutdown).startOnce();
}
$ make map

Basic_Blink: Invoking 'map' for Esp32/esp32 (debug) architecture
Partition map: bigboot
options: 4m
Device            Start       End         Size        Type      SubType   Name              Filename
----------------  ----------  ----------  ----------  --------  --------  ----------------  ------------
spiFlash          0x00000000  0x0000ffff         64K                      Boot Sector       
spiFlash          0x00010000  0x00010fff          4K                      Partition Table   
spiFlash          0x00011000  0x00016fff         24K  data      nvs       nvs               
spiFlash          0x00017000  0x00017fff          4K  data      phy       phy_init          
spiFlash          0x00018000  0x0001ffff         32K                      (unused)          
spiFlash          0x00020000  0x0010ffff        960K  app       factory   factory           out/Esp32/esp32/debug/firmware/app.bin
spiFlash          0x00110000  0x003fffff       3008K                      (unused)          

I changed the boorloader verbosity level from sdk-menuconfig -> Bootloader config -> Bootloader log verbosity to Debug. (I also changed the Bootloader optimization Level to Debug).

Also I moved the partition table offset in sdk-menuconfig -> Partition Table -> Offset of partition table to 0x10000. (Depends on the custom HWCONFIG).

And here is the output:

--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:9740
ho 0 tail 12 room 4
load:0x40078000,len:18064
load:0x40080400,len:4544
entry 0x4008071c
I (28) boot: ESP-IDF v4.3-1675-g4a43d72911 2nd stage bootloader
I (28) boot: compile time 20:16:49
D (28) bootloader_flash: XMC chip detected by RDID (00204018), skip.
D (35) bootloader_flash: mmu set block paddr=0x00000000 (was 0xffffffff)
I (42) boot: chip revision: 1
I (45) boot_comm: chip revision: 1, min. bootloader chip revision: 0
D (52) boot.esp32: magic e9
D (55) boot.esp32: segments 03
D (58) boot.esp32: spi_mode 02
D (61) boot.esp32: spi_speed 00
D (65) boot.esp32: spi_size 02
I (68) boot.esp32: SPI Speed      : 40MHz
I (72) boot.esp32: SPI Mode       : DIO
I (77) boot.esp32: SPI Flash Size : 4MB
D (81) boot: Enabling RTCWDT(9000 ms)
I (85) boot: Enabling RNG early entropy source...
D (91) bootloader_flash: mmu set paddr=00010000 count=1 size=c00 src_addr=10000 src_addr_aligned=10000
D (100) boot: mapped partition table 0x10000 at 0x3f400000
D (106) flash_parts: partition table verified, 5 entries
I (111) boot: Partition Table:
I (115) boot: ## Label            Usage          Type ST Offset   Length
D (122) boot: load partition table entry 0x3f400000
D (127) boot: type=2 subtype=1
I (130) boot:  0 spiFlash         unknown          02 01 00000000 00400000
D (138) boot: load partition table entry 0x3f400020
D (143) boot: type=1 subtype=2
I (146) boot:  1 nvs              WiFi data        01 02 00011000 00006000
D (153) boot: load partition table entry 0x3f400040
D (158) boot: type=1 subtype=1
I (161) boot:  2 phy_init         RF data          01 01 00017000 00001000
D (169) boot: load partition table entry 0x3f400060
D (174) boot: type=0 subtype=0
I (177) boot:  3 factory          factory app      00 00 00020000 000f0000
I (185) boot: End of partition table
D (189) boot: Trying partition index -1 offs 0x20000 size 0xf0000
D (195) esp_image: reading image header @ 0x20000
D (200) bootloader_flash: mmu set block paddr=0x00020000 (was 0xffffffff)
D (207) esp_image: image header: 0xe9 0x06 0x02 0x02 40081334
I (212) boot_comm: chip revision: 1, min. application chip revision: 0
I (220) esp_image: segment 0: paddr=00020020 vaddr=3f400020 size=12138h ( 74040) map
D (228) esp_image: free data page_count 0x00000032
D (233) bootloader_flash: mmu set paddr=00020000 count=2 size=12138 src_addr=20020 src_addr_aligned=20000
D (269) bootloader_flash: mmu set block paddr=0x00030000 (was 0xffffffff)
I (269) esp_image: segment 1: paddr=00032160 vaddr=3ffbdb60 size=03bech ( 15340) load
D (273) esp_image: free data page_count 0x00000032
D (278) bootloader_flash: mmu set paddr=00030000 count=1 size=3bec src_addr=32160 src_addr_aligned=30000
D (294) bootloader_flash: mmu set block paddr=0x00030000 (was 0xffffffff)
I (295) esp_image: segment 2: paddr=00035d54 vaddr=40080000 size=0a2c4h ( 41668) load
D (303) esp_image: free data page_count 0x00000032
D (308) bootloader_flash: mmu set paddr=00030000 count=2 size=a2c4 src_addr=35d54 src_addr_aligned=30000
D (335) bootloader_flash: mmu set block paddr=0x00040000 (was 0xffffffff)
I (335) esp_image: segment 3: paddr=00040020 vaddr=400d0020 size=5d1c0h (381376) map
D (339) esp_image: free data page_count 0x00000032
D (344) bootloader_flash: mmu set paddr=00040000 count=6 size=5d1c0 src_addr=40020 src_addr_aligned=40000
D (490) bootloader_flash: mmu set block paddr=0x00090000 (was 0xffffffff)
I (490) esp_image: segment 4: paddr=0009d1e8 vaddr=4008a2c4 size=0999ch ( 39324) load
D (494) esp_image: free data page_count 0x00000032
D (499) bootloader_flash: mmu set paddr=00090000 count=2 size=999c src_addr=9d1e8 src_addr_aligned=90000
D (525) bootloader_flash: mmu set block paddr=0x000a0000 (was 0xffffffff)
I (525) esp_image: segment 5: paddr=000a6b8c vaddr=50000000 size=00010h (    16) load
D (529) esp_image: free data page_count 0x00000032
D (534) bootloader_flash: mmu set paddr=000a0000 count=1 size=10 src_addr=a6b8c src_addr_aligned=a0000
D (543) bootloader_flash: mmu set block paddr=0x000a0000 (was 0xffffffff)
D (550) boot: Calculated hash: d2ce3ff52585bfcf26061df2dd840ae0aebb8d0fb7c282cc8f90aae06d8a770e
I (571) boot: Loaded app from partition at offset 0x20000
I (571) boot: Disabling RNG early entropy source...
D (573) boot: Mapping segment 0 as DROM
D (577) boot: Mapping segment 3 as IROM
D (581) boot: calling set_cache_and_start_app
D (585) boot: configure drom and irom and start
D (590) boot: start: 0x40081334
I (604) cpu_start: Pro cpu up.
I (604) cpu_start: Starting app cpu, entry point is 0x4008128c
I (0) cpu_start: App cpu up.
I (618) cpu_start: Pro cpu start user code
I (618) cpu_start: cpu freq: 160000000
I (618) cpu_start: Application information:
I (623) cpu_start: Project name:     Sming
I (628) cpu_start: App version:      4.0.0-rc4-460-g279460f9-dirty
I (634) cpu_start: Compile time:     Aug 23 2022 20:16:42
I (641) cpu_start: ELF file SHA256:  60b55032d45898b0...
I (647) cpu_start: ESP-IDF:          v4.3-1675-g4a43d72911
I (653) heap_init: Initializing. RAM available for dynamic allocation:
I (660) heap_init: At 3FFAE6E0 len 0000F480 (61 KiB): DRAM
I (666) heap_init: At 3FFC5590 len 0001AA70 (106 KiB): DRAM
I (672) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (679) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (685) heap_init: At 40093C60 len 0000C3A0 (48 KiB): IRAM
W (692) spi_flash: Detected size(16384k) larger than the size in the binary image header(4096k). Using the size in the binary image header.
I (0) cpu_start: Starting scheduler on APP CPU.
E (12980) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (12980) task_wdt:  - Sming (CPU 1)
E (12980) task_wdt: Tasks currently running:
E (12980) task_wdt: CPU 0: IDLE
E (12980) task_wdt: CPU 1: IDLE
E (12980) task_wdt: Aborting.

abort() was called at PC 0x400da9e4 on core 0

Backtrace:0x40081682:0x3ffbe0e0 0x400883f9:0x3ffbe100 0x4008e4fe:0x3ffbe120 0x400da9e4:0x3ffbe190 0x40081dad:0x3ffbe1b0 0x401291fb:0x3ffb2fd0 0x400da7f2:0x3ffb2ff0 0x40089544:0x3ffb3010

ELF file SHA256: 60b55032d45898b0

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:9740
ho 0 tail 12 room 4
load:0x40078000,len:18064
load:0x40080400,len:4544
entry 0x4008071c
I (29) boot: ESP-IDF v4.3-1675-g4a43d72911 2nd stage bootloader
I (29) boot: compile time 20:16:49

The decoded stack:

0x40081682: panic_abort at /home/opt/esp-idf-4.3/components/esp_system/panic.c:404
0x400883f9: esp_system_abort at /home/opt/esp-idf-4.3/components/esp_system/system_api.c:112
0x4008e4fe: abort at /home/opt/esp-idf-4.3/components/newlib/abort.c:46
0x400da9e4: task_wdt_isr at /home/opt/esp-idf-4.3/components/esp_common/src/task_wdt.c:182 (discriminator 1)
0x40081dad: _xt_lowint1 at /home/opt/esp-idf-4.3/components/freertos/port/xtensa/xtensa_vectors.S:1105
0x401291fb: cpu_ll_waiti at /home/opt/esp-idf-4.3/components/hal/esp32/include/hal/cpu_ll.h:183
 (inlined by) esp_pm_impl_waiti at /home/opt/esp-idf-4.3/components/esp_pm/pm_impl.c:824
0x400da7f2: esp_vApplicationIdleHook at /home/opt/esp-idf-4.3/components/esp_common/src/freertos_hooks.c:63
0x40089544: prvIdleTask at /home/opt/esp-idf-4.3/components/freertos/tasks.c:3825

I'll write again the files containing the Wifi initialization code:

Sming/Arch/Esp32/Components/esp32/src/startup.cpp
Sming/Components/Network/Arch/Esp32/Platform/init.cpp
Sming/Components/Network/Arch/Esp32/Platform/StationImpl.cpp
mikee47 commented 2 years ago

@kmihaylov The debugging I was after is actually LOG_DEFAULT_LEVEL=5 (verbose) - sorry for misdirecting you, but glad you learned something from it!

With your sample, the WDT reset looks like this:

V (4470) esp_adapter: thread sem get: sem=3fc96e54
E (11440) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (11440) task_wdt:  - Sming (CPU 0)
E (11440) task_wdt: Tasks currently running:
E (11440) task_wdt: CPU 0: IDLE
E (11440) task_wdt: Aborting.

Looks like a semaphore is involved, so probably a deadlock somewhere. Turns out it's the event handling loop, not the networking at all! Sming doesn't use the default event loop but creates its own so that all software timers and queued callbacks run in the Sming thread context. See https://github.com/SmingHub/Sming/pull/2371 for details.

It could be that the IDF expects the event loop to be running in a separate thread, still investigating the exact reason.

I've opened PR #2543 (work in progress) so you can try make CREATE_EVENT_TASK=1. This is NOT a solution!

Asanga-Viraj commented 1 year ago

Hi All,

I fixed this issue... (I have no idea whether this issue was fixed or not) It is the event loop issue. So have to apply changes into event_loop.cpp.

esp_event_loop_args_t loop_args = {
    .queue_size = CONFIG_ESP_SYSTEM_EVENT_QUEUE_SIZE,
    .task_name = "sys_evt",        
    .task_priority = ESP_TASKD_EVENT_PRIO,
    .task_stack_size = ESP_TASKD_EVENT_STACK,
    .task_core_id = 0
};
mikee47 commented 1 year ago

Yes, this issue was fixed in #2543 - you can find an explanation of the cause there.

Asanga-Viraj commented 1 year ago

@mikee47 Thanks.

mikee47 commented 7 months ago

Closing as fixed by #2543