espressif / esp-mdf

Espressif Mesh Development Framework, limited maintain, recommend to use https://github.com/espressif/esp-mesh-lite
Other
766 stars 249 forks source link

ESP32 doesn't boot with mupgrade fallback version flag with two cores after IDF 4.4.2 update (IDFGH-7996) #303

Open VitorBFreitas opened 1 year ago

VitorBFreitas commented 1 year ago

Environment

Problem Description

ESP32 doesn't complete boot with mupgrade fallback version flag with two cores enabled after update from idf 4.4.1 to 4.4.2

Expected Behavior

ESP32 boot normally

Actual Behavior

ESP32 freeze while booting

Steps to reproduce

  1. create a project with mupgrade example
  2. open menuconfig
  3. disable "Run FreeRTOS only on first core" flag
  4. enable "Fall back to the previous version" flag
  5. Build and Flash

Code to reproduce this issue

Mupgrade example at esp-mdf\examples\function_demo\mupgrade

Debug Logs

ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6664
load:0x40078000,len:14848
load:0x40080400,len:3792
0x40080400: _init at ??:?

entry 0x40080694
I (27) boot: ESP-IDF v4.4.2-dirty 2nd stage bootloader
I (27) boot: compile time 13:25:51
I (27) boot: chip revision: 3
I (31) boot_comm: chip revision: 3, min. bootloader chip revision: 0
I (38) boot.esp32: SPI Speed      : 40MHz
I (42) boot.esp32: SPI Mode       : DIO
I (47) boot.esp32: SPI Flash Size : 4MB
I (51) boot: Enabling RNG early entropy source...
I (57) boot: Partition Table:
I (60) boot: ## Label            Usage          Type ST Offset   Length
I (68) boot:  0 nvs              WiFi data        01 02 00009000 00005000
I (75) boot:  1 otadata          OTA data         01 00 0000e000 00002000
I (83) boot:  2 app0             OTA app          00 10 00010000 001b0000
I (90) boot:  3 app1             OTA app          00 11 001c0000 001b0000
I (98) boot:  4 spiffs           Unknown data     01 82 00370000 00060000
I (105) boot:  5 coredump         Unknown data     01 03 003d0000 00010000
I (113) boot:  6 reserved         Unknown data     01 fe 003e0000 00020000
I (120) boot: End of partition table
I (124) boot: No factory image, trying OTA 0
I (129) boot_comm: chip revision: 3, min. application chip revision: 0
I (137) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=38a60h (232032) map
I (229) esp_image: segment 1: paddr=00048a88 vaddr=3ffb0000 size=03c60h ( 15456) load
I (235) esp_image: segment 2: paddr=0004c6f0 vaddr=40080000 size=03928h ( 14632) load
I (242) esp_image: segment 3: paddr=00050020 vaddr=400d0020 size=dd4d0h (906448) map
I (571) esp_image: segment 4: paddr=0012d4f8 vaddr=40083928 size=11bd4h ( 72660) load
I (601) esp_image: segment 5: paddr=0013f0d4 vaddr=50000000 size=00010h (    16) load
I (612) boot: Loaded app from partition at offset 0x10000
I (613) boot: Set actual ota_seq=1 in otadata[0]
I (613) boot: Disabling RNG early entropy source...
I (629) cpu_start: Pro cpu up.
D (629) efuse: In EFUSE_BLK0__DATA3_REG is used 1 bits starting with 15 bit
D (630) efuse: In EFUSE_BLK0__DATA5_REG is used 1 bits starting with 20 bit
D (638) efuse: In EFUSE_BLK0__DATA3_REG is used 3 bits starting with 9 bit
D (645) efuse: In EFUSE_BLK0__DATA3_REG is used 1 bits starting with 2 bit
I (652) cpu_start: Starting app cpu, entry point is 0x4008119c
0x4008119c: call_start_cpu1 at C:/esp/esp-idf/components/esp_system/port/cpu_start.c:160

I (0) cpu_start: App cpu up.
D (666) clk: RTC_SLOW_CLK calibration value: 3282445
I (679) cpu_start: Pro cpu start user code
I (679) cpu_start: cpu freq: 160000000
I (679) cpu_start: Application information:
I (683) cpu_start: Project name:     mesh-shopBA
I (688) cpu_start: App version:      2.1.13
I (693) cpu_start: Compile time:     Aug  5 2022 13:25:51
I (699) cpu_start: ELF file SHA256:  b8d1da9f0e1e8a22...
I (705) cpu_start: ESP-IDF:          v4.4.2-dirty
D (711) memory_layout: Checking 7 reserved memory ranges:
D (716) memory_layout: Reserved memory range 0x3ffae000 - 0x3ffae6e0
D (723) memory_layout: Reserved memory range 0x3ffb0000 - 0x3ffb8818
D (729) memory_layout: Reserved memory range 0x3ffe0000 - 0x3ffe0440
D (735) memory_layout: Reserved memory range 0x3ffe3f20 - 0x3ffe4350
D (742) memory_layout: Reserved memory range 0x40070000 - 0x40078000
D (748) memory_layout: Reserved memory range 0x40078000 - 0x40080000
0x40080000: _WindowOverflow4 at C:/esp/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1736

D (755) memory_layout: Reserved memory range 0x40080000 - 0x400954fc
0x40080000: _WindowOverflow4 at C:/esp/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1736

D (761) memory_layout: Building list of available memory regions:
D (767) memory_layout: Available memory region 0x3ffae6e0 - 0x3ffb0000
D (774) memory_layout: Available memory region 0x3ffb8818 - 0x3ffc0000
D (781) memory_layout: Available memory region 0x3ffc0000 - 0x3ffc2000
D (787) memory_layout: Available memory region 0x3ffc2000 - 0x3ffc4000
D (794) memory_layout: Available memory region 0x3ffc4000 - 0x3ffc6000
D (800) memory_layout: Available memory region 0x3ffc6000 - 0x3ffc8000
D (807) memory_layout: Available memory region 0x3ffc8000 - 0x3ffca000
D (814) memory_layout: Available memory region 0x3ffca000 - 0x3ffcc000
D (820) memory_layout: Available memory region 0x3ffcc000 - 0x3ffce000
D (827) memory_layout: Available memory region 0x3ffce000 - 0x3ffd0000
D (833) memory_layout: Available memory region 0x3ffd0000 - 0x3ffd2000
D (840) memory_layout: Available memory region 0x3ffd2000 - 0x3ffd4000
D (846) memory_layout: Available memory region 0x3ffd4000 - 0x3ffd6000
D (853) memory_layout: Available memory region 0x3ffd6000 - 0x3ffd8000
D (860) memory_layout: Available memory region 0x3ffd8000 - 0x3ffda000
D (866) memory_layout: Available memory region 0x3ffda000 - 0x3ffdc000
D (873) memory_layout: Available memory region 0x3ffdc000 - 0x3ffde000
D (879) memory_layout: Available memory region 0x3ffde000 - 0x3ffe0000
D (886) memory_layout: Available memory region 0x3ffe0440 - 0x3ffe3f20
D (893) memory_layout: Available memory region 0x3ffe4350 - 0x3ffe8000
D (899) memory_layout: Available memory region 0x3ffe8000 - 0x3fff0000
D (906) memory_layout: Available memory region 0x3fff0000 - 0x3fff8000
D (912) memory_layout: Available memory region 0x3fff8000 - 0x3fffc000
D (919) memory_layout: Available memory region 0x3fffc000 - 0x40000000
D (926) memory_layout: Available memory region 0x400954fc - 0x40096000
D (932) memory_layout: Available memory region 0x40096000 - 0x40098000
D (939) memory_layout: Available memory region 0x40098000 - 0x4009a000
D (945) memory_layout: Available memory region 0x4009a000 - 0x4009c000
D (952) memory_layout: Available memory region 0x4009c000 - 0x4009e000
D (959) memory_layout: Available memory region 0x4009e000 - 0x400a0000
I (965) heap_init: Initializing. RAM available for dynamic allocation:
D (973) heap_init: New heap initialised at 0x3ffae6e0
I (978) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
D (984) heap_init: New heap initialised at 0x3ffb8818
I (989) heap_init: At 3FFB8818 len 000277E8 (157 KiB): DRAM
I (995) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (1001) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
D (1008) heap_init: New heap initialised at 0x400954fc
I (1013) heap_init: At 400954FC len 0000AB04 (42 KiB): IRAM
D (1020) intr_alloc: Connected src 46 to int 2 (cpu 0)
D (1025) FLASH_HAL: extra_dummy: 1
D (1028) spi_flash: trying chip: issi
D (1032) spi_flash: trying chip: gd
D (1035) spi_flash: trying chip: mxic
D (1039) spi_flash: trying chip: winbond
D (1043) spi_flash: trying chip: generic
I (1047) spi_flash: detected chip: generic
I (1052) spi_flash: flash io: dio
D (1056) cpu_start: calling init function: 0x40192c20
0x40192c20: _GLOBAL__sub_I___cxa_get_globals_fast at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_globals.cc:145

D (1061) cpu_start: calling init function: 0x40103694
0x40103694: s_set_default_wifi_log_level at C:/esp/esp-idf/components/esp_wifi/src/wifi_init.c:63

D (1066) cpu_start: calling init function: 0x400f7704
0x400f7704: esp_ipc_init at C:/esp/esp-idf/components/esp_ipc/src/esp_ipc.c:105

D (1072) cpu_start: calling init function: 0x400ea0b0
0x400ea0b0: mupgrade_partition_switch at C:/esp/esp-mdf/components/mupgrade/mupgrade_check.c:134

D (1076) [mupgrade_check, 138]: Add an identifier to the firmware: ** MUPGRADE_FIRMWARE_FLAG **
D (1085) cpu_start: calling init function: 0x400d4164
0x400d4164: esp_reset_reason_init at C:/esp/esp-idf/components/esp_system/port/soc/esp32/reset_reason.c:68

D (1090) cpu_start: calling init function: 0x400d355c
0x400d355c: esp_ota_init_app_elf_sha256 at C:/esp/esp-idf/components/app_update/esp_app_desc.c:68

D (1095) intr_alloc: Connected src 17 to int 3 (cpu 0)
D (1101) intr_alloc: Connected src 24 to int 9 (cpu 0)
I (1106) cpu_start: Starting scheduler on PRO CPU.
D (1112)D (1112) intr_alloc: Connected src 25 to int 2 (cpu 1)
I (1117) cpu_start: Starting scheduler on APP CPU.
 partition: Loading the partition table
D (1127) partition: Partition table MD5 verified
Jiangyafeng commented 1 year ago

@VitorBFreitas Thank you very much for this question. It is true that the latest idf v4.4.2 has this problem in adapting to mdf, and this problem will be fixed as soon as possible. If there is any new progress, it will be updated as soon as possible.

Jiangyafeng commented 1 year ago

@VitorBFreitas You can not enable Component config -> MDF Mupgrade -> Fall back to the previous version. There is no problem.

VitorBFreitas commented 1 year ago

@Jiangyafeng Unfortunetly this is a feature that I'm using in my actual project to prevent loose communication with MQTT broker case a bad firmware is flashed.

mmrein commented 1 year ago

Is there some fix for this? Does anyone know what causes this? Does v4.4.3 have same problem?

I'm planning to make some changes in my project which would require some safety measures with possibility to revert to previous firmware in case wrong image has been flashed.

VitorBFreitas commented 1 year ago

@mmrein It's occurs with esp-idf v4.4.3 as well.

mmrein commented 1 year ago

I didn't realize this occurs even after first flash, not just after OTA. So I can confirm this bug is still present in v4.4.4

mmrein commented 1 year ago

Found some kind of clue - with debul log level set to Verbose the app seems to be starting, but on level Debug or less it freezes:

--- LOG OUTPUT DEFAULT VERBOSITY: DEBUG
D (1564) intr_alloc: Connected src 24 to int 3 (cpu 0) 
I (1569) cpu_start: Starting scheduler on PRO CPU. 
D (1574) intr_alloc: Connected src 25 to int 2 (cpu 1) 
I (1579) cpu_start: Starting scheduler on APP CPU. 
D (1574) partition: Loading the partition table 
D (1594) partition: Partition table MD5 verified
--- FREEZE
--- LOG OUTPUT DEFAULT VERBOSITY: VERBOSE
D (2099) intr_alloc: Connected src 24 to int 3 (cpu 0) 
I (2104) cpu_start: Starting scheduler on PRO CPU. 
V (0) intr_alloc: esp_intr_alloc_intrstatus (cpu 1): checking args 
V (2116) intr_alloc: esp_intr_alloc_intrstatus (cpu 1): Args okay. Resulting flags 0x40E 
D (2124) intr_alloc: Connected src 25 to int 2 (cpu 1) 
I (2129) cpu_start: Starting scheduler on APP CPU. 
D (2110) partition: Loading the partition table 
V (2145) calculated md5: 0x3ffb7cb8   86 8a 0e ec e7 f1 b4 8a  6f 58 5b ed dd 1e ed e3  |........oX[.....| 
V (2149) stored md5: 0x3f45e190   86 8a 0e ec e7 f1 b4 8a  6f 58 5b ed dd 1e ed e3  |........oX[.....| 
D (2158) partition: Partition table MD5 verified 
D (2167) nvs: nvs_open_from_partition ESP-MDF 1 
D (2168) nvs: nvs_set_blob mupgrade_count 4 
D (2172) nvs: nvs_close 1 
D (2174) [mupgrade_check, 127]: version_fallback_task exit 
D (2180) heap_init: New heap initialised at 0x3ffe0440 
D (2185) heap_init: New heap initialised at 0x3ffe4350 
I (2190) spiram: Reserving pool of 32K of internal memory for DMA/internal allocations 
D (2199) spiram: Allocating block of size 32768 bytes 
V (2204) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args 
V (2204) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0xE 
D (2204) intr_alloc: Connected src 16 to int 9 (cpu 0) 
--- GOING...

I tried to setup proper JTAG GDB debug in Eclipse to try and see where exactly it gets stuck, but as this happens before entering app_main I could not debug anything.

So far it seems like it is getting stuck somewhere in mupgrade_version_fallback_task. I'm not sure about inner workings of that task, but what also got my suspicion (aside from missing brackets in the comparison) was that the task is created without specifying core. So I tried doing just that and so far it seems to be starting normally:

xTaskCreatePinnedToCore(mupgrade_version_fallback_task, "mupgrade_version_fallback", 4 * 1024,
                NULL, CONFIG_MDF_TASK_DEFAULT_PRIOTY + 5, NULL, CONFIG_MDF_TASK_PINNED_TO_CORE);

It worth noting that it started when I used both 0 or 1 instead of CONFIG_MDF_TASK_PINNED_TO_CORE macro, so this may as well be just coincidence. I did not try if the actual fallback process works and as I see it I probably never will and rather implement it using IDF OTA fallback functions.

VitorBFreitas commented 1 year ago

Thanks for share your tests! I'm sure this will help those who get stuck on it.