espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

OTA downgrade of 3.0 firmware to 3.1 or 3.2 bootloader causes boot loop / brick. (IDFGH-1692) #3932

Closed by vonnieda 3 years ago

vonnieda commented 5 years ago

Environment

Problem Description

I am in the process of trying to upgrade my app from 3.0 to 3.2. I have done a make erase_flash followed by a make flash to wipe the bootloader, all NVS, and all firmware. The 3.2 firmware installs and works fine. Then, if I perform an OTA downgrade to my previous 3.0 firmware the board goes into a boot loop and is unrecoverable without a flash erase. This also happens for a 3.1 to 3.0 update.

rst:0xc (SW_CPU_RESET),boot:0x3f (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:6264
load:0x40078000,len:10496
ho 0 tail 12 room 4
load:0x40080400,len:6660
entry 0x40080768
I (31) boot: ESP-IDF v3.2.2-120-g90747cc8b 2nd stage bootloader
I (31) boot: compile time 15:56:35
I (31) boot: Enabling RNG early entropy source...
I (37) boot: SPI Speed      : 40MHz
I (41) boot: SPI Mode       : DIO
I (45) boot: SPI Flash Size : 8MB
I (49) boot: Partition Table:
I (53) boot: ## Label            Usage          Type ST Offset   Length
I (60) boot:  0 ota_0            OTA app          00 10 00010000 001db000
I (67) boot:  1 ota_1            OTA app          00 11 001f0000 001db000
I (75) boot:  2 nvs              WiFi data        01 02 003cb000 00004000
I (82) boot:  3 otadata          OTA data         01 00 003cf000 00002000
I (90) boot:  4 phy_init         RF data          01 01 003d1000 00002000
I (97) boot:  5 coredump         Unknown data     01 03 003d3000 00010000
I (105) boot: End of partition table
I (109) esp_image: segment 0: paddr=0x001f0020 vaddr=0x3f400020 size=0x37994 (227732) map
I (198) esp_image: segment 1: paddr=0x002279bc vaddr=0x3ffc0000 size=0x0367c ( 13948) load
I (204) esp_image: segment 2: paddr=0x0022b040 vaddr=0x40080000 size=0x00400 (  1024) load
0x40080000: _WindowOverflow4 at /Users/jason/esp/esp-idf/components/freertos/xtensa_vectors.S:1779

I (205) esp_image: segment 3: paddr=0x0022b448 vaddr=0x40080400 size=0x04bc8 ( 19400) load
I (222) esp_image: segment 4: paddr=0x00230018 vaddr=0x400d0018 size=0xef73c (980796) map
0x400d0018: _flash_cache_start at ??:?

I (567) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396) load
0x40084fc8: coex_arbit_insert at ??:?

I (598) esp_image: segment 6: paddr=0x00331a00 vaddr=0x400c0000 size=0x00000 (     0) load
I (611) boot: Loaded app from partition at offset 0x1f0000
I (611) boot: Disabling RNG early entropy source...
Guru Meditation Error: Core  0 panic'ed (LoadProhibited)
. Exception was unhandled.
Register dump:
PC      : 0x4008289f  PS      : 0x00060730  A0      : 0x80082f2c  A1      : 0x3ffe3b90  
0x4008289f: DPORT_READ_PERI_REG at /Users/jason/esp/esp-idf/components/soc/esp32/include/soc/dport_access.h:170
 (inlined by) psram_cache_init at /Users/jason/esp/esp-idf/components/esp32/spiram_psram.c:839

A2      : 0xf7430f09  A3      : 0x000000e1  A4      : 0x0ffd114c  A5      : 0x00000000  
A6      : 0x0000003f  A7      : 0x00000200  A8      : 0x00000800  A9      : 0xfffff3ff  
A10     : 0x3ff49060  A11     : 0x0ffd104f  A12     : 0x00000034  A13     : 0x00000000  
A14     : 0x00000000  A15     : 0x00000004  SAR     : 0x00000017  EXCCAUSE: 0x0000001c  
EXCVADDR: 0xf7430f09  LBEG    : 0x4009642c  LEND    : 0x40096437  LCOUNT  : 0x00000000  
0x4009642c: lmacTxFrame at ??:?

0x40096437: lmacTxFrame at ??:?

Backtrace: 0x4008289f:0x3ffe3b90 0x40082f29:0x3ffe3bb0 0x400d1731:0x3ffe3bd0 0x40081353:0x3ffe3bf0 0x40079267:0x3ffe3c30 0x40079319:0x3ffe3c60 0x40079337:0x3ffe3ca0 0x40079665:0x3ffe3cc0 0x4008079a:0x3ffe3df0 0x40007c31:0x3ffe3eb0 0x4000073d:0x3ffe3f20
0x4008289f: DPORT_READ_PERI_REG at /Users/jason/esp/esp-idf/components/soc/esp32/include/soc/dport_access.h:170
 (inlined by) psram_cache_init at /Users/jason/esp/esp-idf/components/esp32/spiram_psram.c:839

0x40082f29: psram_gpio_config at /Users/jason/esp/esp-idf/components/esp32/spiram_psram.c:581

0x400d1731: esp_fill_random at /Users/jason/esp/esp-idf/components/esp32/hw_random.c:61 (discriminator 1)

0x40081353: start_cpu0_default at /Users/jason/esp/esp-idf/components/esp32/cpu_start.c:340

E (690) esp_core_dump: Core dump flash config is corrupted! CRC=0xffffffff instead of 0x0
Rebooting...
ets Jun  8 2016 00:22:57
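
The bootloader version in play can be read straight from the second-stage banner in a log like the one above. The following is a minimal sketch, not part of ESP-IDF: `bootloader_version` is a hypothetical helper, and the regular expression assumes the banner keeps the format shown in this log.

```python
import re

# Matches the second-stage bootloader banner, e.g.
# "I (31) boot: ESP-IDF v3.2.2-120-g90747cc8b 2nd stage bootloader"
# (format assumed from the log in this issue).
BANNER = re.compile(
    r"boot: ESP-IDF v(\d+)\.(\d+)(?:\.(\d+))?\S* 2nd stage bootloader"
)


def bootloader_version(log_text):
    """Return (major, minor, patch) from a boot log, or None if no banner is found."""
    match = BANNER.search(log_text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing bugfix field (e.g. "v3.0") is treated as patch level 0.
    return (int(major), int(minor), int(patch or 0))


log = "I (31) boot: ESP-IDF v3.2.2-120-g90747cc8b 2nd stage bootloader"
print(bootloader_version(log))  # (3, 2, 2)
```

Capturing this value in a factory or field test log makes it easier to tell which bootloader a failing device was actually running.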

One thing I noticed of interest is that when my OTA from 3.0 to 3.0 finishes, I see these messages:

I (42851) esp_image: segment 0: paddr=0x001f0020 vaddr=0x3f400020 size=0x37994 (227732) map
I (43001) esp_image: segment 1: paddr=0x002279bc vaddr=0x3ffc0000 size=0x0367c ( 13948) 
I (43011) esp_image: segment 2: paddr=0x0022b040 vaddr=0x40080000 size=0x00400 (  1024) 
0x40080000: _WindowOverflow4 at /Users/jason/esp/esp-idf/components/freertos/./xtensa_vectors.S:1685

I (43031) esp_image: segment 3: paddr=0x0022b448 vaddr=0x40080400 size=0x04bc8 ( 19400) 
I (43051) esp_image: segment 4: paddr=0x00230018 vaddr=0x400d0018 size=0xef73c (980796) map
0x400d0018: _flash_cache_start at ??:?

I (43611) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396) 
0x40084fc8: _xt_medint3 at /Users/jason/esp/esp-idf/components/freertos/./xtensa_vectors.S:1256

I (43661) esp_image: segment 6: paddr=0x00331a00 vaddr=0x400c0000 size=0x00000 (     0) 
I (43681) esp_image: segment 0: paddr=0x001f0020 vaddr=0x3f400020 size=0x37994 (227732) map
I (43811) esp_image: segment 1: paddr=0x002279bc vaddr=0x3ffc0000 size=0x0367c ( 13948) 
I (43831) esp_image: segment 2: paddr=0x0022b040 vaddr=0x40080000 size=0x00400 (  1024) 
0x40080000: _WindowOverflow4 at /Users/jason/esp/esp-idf/components/freertos/./xtensa_vectors.S:1685

I (43851) esp_image: segment 3: paddr=0x0022b448 vaddr=0x40080400 size=0x04bc8 ( 19400) 
I (43871) esp_image: segment 4: paddr=0x00230018 vaddr=0x400d0018 size=0xef73c (980796) map
0x400d0018: _flash_cache_start at ??:?

I (44401) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396) 
0x40084fc8: _xt_medint3 at /Users/jason/esp/esp-idf/components/freertos/./xtensa_vectors.S:1256

I (44451) esp_image: segment 6: paddr=0x00331a00 vaddr=0x400c0000 size=0x00000 (     0) 

And when the OTA from 3.1 to 3.0 finished, I saw these:

I (52914) esp_image: segment 0: paddr=0x001f0020 vaddr=0x3f400020 size=0x37994 (227732) map
I (53054) esp_image: segment 1: paddr=0x002279bc vaddr=0x3ffc0000 size=0x0367c ( 13948) 
I (53084) esp_image: segment 2: paddr=0x0022b040 vaddr=0x40080000 size=0x00400 (  1024) 
0x40080000: _WindowOverflow4 at /Users/jason/esp/esp-idf/components/freertos/xtensa_vectors.S:1779

I (53094) esp_image: segment 3: paddr=0x0022b448 vaddr=0x40080400 size=0x04bc8 ( 19400) 
I (53114) esp_image: segment 4: paddr=0x00230018 vaddr=0x400d0018 size=0xef73c (980796) map
0x400d0018: _flash_cache_start at ??:?

I (53614) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396) 
0x40084fc8: DPORT_SEQUENCE_REG_READ at /Users/jason/esp/esp-idf/components/spi_flash/flash_mmap.c:319
 (inlined by) spi_flash_mmap_pages at /Users/jason/esp/esp-idf/components/spi_flash/flash_mmap.c:193

I (53654) esp_image: segment 6: paddr=0x00331a00 vaddr=0x400c0000 size=0x00000 (     0) 
I (53674) esp_image: segment 0: paddr=0x001f0020 vaddr=0x3f400020 size=0x37994 (227732) map
I (53824) esp_image: segment 1: paddr=0x002279bc vaddr=0x3ffc0000 size=0x0367c ( 13948) 
I (53854) esp_image: segment 2: paddr=0x0022b040 vaddr=0x40080000 size=0x00400 (  1024) 
0x40080000: _WindowOverflow4 at /Users/jason/esp/esp-idf/components/freertos/xtensa_vectors.S:1779

I (53874) esp_image: segment 3: paddr=0x0022b448 vaddr=0x40080400 size=0x04bc8 ( 19400) 
I (53894) esp_image: segment 4: paddr=0x00230018 vaddr=0x400d0018 size=0xef73c (980796) map
0x400d0018: _flash_cache_start at ??:?

I (54404) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396) 
0x40084fc8: DPORT_SEQUENCE_REG_READ at /Users/jason/esp/esp-idf/components/spi_flash/flash_mmap.c:319
 (inlined by) spi_flash_mmap_pages at /Users/jason/esp/esp-idf/components/spi_flash/flash_mmap.c:193

I (54464) esp_image: segment 6: paddr=0x00331a00 vaddr=0x400c0000 size=0x00000 (     0) 

Of interest is the DPORT_SEQUENCE_REG_READ symbol, which is not present in the 3.0 to 3.0 output.
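
One way to check whether the two OTA logs really differ is to compare only the `esp_image` lines and ignore the symbol annotations. The sketch below rests on an assumption: the `0x...: symbol at file:line` lines look like address decoding added by the serial monitor from whichever ELF it was started with, so they may reflect the running app's ELF rather than the image being written. `image_lines` is a hypothetical helper, and the sample lines are copied from the logs above.

```python
import re

# Strips the leading "I (12345) " style level and timestamp prefix from a log line.
PREFIX = re.compile(r"^[IWE] \(\d+\) ")


def image_lines(log_text):
    """Keep only esp_image segment lines, without timestamps or trailing spaces."""
    lines = []
    for line in log_text.splitlines():
        stripped = PREFIX.sub("", line).rstrip()
        if stripped.startswith("esp_image: segment"):
            lines.append(stripped)
    return lines


from_30 = """\
I (43611) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396)
0x40084fc8: _xt_medint3 at /Users/jason/esp/esp-idf/components/freertos/./xtensa_vectors.S:1256
"""
from_31 = """\
I (53614) esp_image: segment 5: paddr=0x0031f75c vaddr=0x40084fc8 size=0x1229c ( 74396)
0x40084fc8: DPORT_SEQUENCE_REG_READ at /Users/jason/esp/esp-idf/components/spi_flash/flash_mmap.c:319
"""

# Only the decoded symbol differs; the segment line itself is the same.
print(image_lines(from_30) == image_lines(from_31))  # True
```

For the excerpts shown, the segment addresses and sizes match, so the differing symbol name alone does not show that a different image was written.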

Additionally, when my OTA completes I call esp_restart(), and that is also failing - it just locks up the task. It's not until I perform a forced reset that the boot loop starts.

To summarize:

In all cases, the serially flashed firmware always works fine. It's not until the OTA completes that the problem happens.

Expected Behavior

Doesn't boot loop.

Actual Behavior

Does boot loop.

Steps to reproduce

  1. Flash a bootloader and firmware built with the 3.2 version referenced above.
  2. Perform an OTA of a firmware built with the 3.0 version referenced above.
  3. Reboot.
  4. Observe boot loop.

Code to reproduce this issue

Code is proprietary, but given @negativekelvin's comment and issue #3865 it does not seem to be code specific.

Other items if possible

sdkconfig.clean.txt

negativekelvin commented 5 years ago

https://github.com/espressif/esp-idf/issues/3865

vonnieda commented 5 years ago

Thanks @negativekelvin. I have just tested the workaround in that issue and it does not fix the problem for me. I've updated the issue above with additional details.

projectgus commented 5 years ago

Hi @vonnieda,

We don't support running an app built with an older ESP-IDF version than the bootloader (it probably works in some cases, but it won't work in all cases). Sorry that this hasn't been documented clearly until now; we will add documentation for it.

(On the other hand, bootloaders are forward compatible - so you should expect that an earlier ESP-IDF bootloader will boot a later ESP-IDF app, although #3865 is currently an exception to this.)

If you need to run an app built from IDF v3.0 alongside one built from a newer version, the best solution is to flash the v3.0 bootloader binary to all the devices.

vonnieda commented 5 years ago

Thanks @projectgus,

Can you clarify what level of granularity is backwards compatible? For instance, are sub-revisions both forwards and backwards compatible, or do I need to ensure that I am always flashing firmware that is older than or equal to the bootloader?

To be clear, I mean like 3.0.x. Am I assured that a 3.0.7 bootloader will boot a 3.0.2 firmware?

Thanks, Jason

projectgus commented 5 years ago

Hi @vonnieda,

The IDF app should always be a newer or equal IDF version to the bootloader, including the bugfix version field. There may be bugfix updates that change bootloader behaviour to fix flash related issues, and we can't guarantee that older ESP-IDF apps will be able to continue booting when a newer bootloader passes over control to them.

(That said, most of the time a newer bugfix release bootloader will be able to boot an app from an older bugfix release, but this isn't guaranteed and we don't test for it.)
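
The rule above can be written as a simple version comparison. This is a minimal sketch of the stated policy only, not an ESP-IDF API: `parse_version` and `app_is_supported` are hypothetical helpers, and version strings are assumed to look like `3.0.7` or `v3.2.2-120-g90747cc8b`.

```python
def parse_version(text):
    """Turn '3.2', 'v3.0.7' or '3.2.2-120-g90747cc8b' into (major, minor, patch)."""
    core = text.lstrip("v").split("-")[0]
    parts = [int(p) for p in core.split(".")]
    # A missing bugfix field counts as 0, so '3.0' compares equal to '3.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def app_is_supported(bootloader, app):
    """True when the app's IDF version is newer than or equal to the bootloader's."""
    return parse_version(app) >= parse_version(bootloader)


# A 3.0.7 bootloader with a 3.0.2 app is outside the supported combinations.
print(app_is_supported("3.0.7", "3.0.2"))  # False
# A 3.0 bootloader with a 3.2 app is the supported direction.
print(app_is_supported("3.0", "3.2"))  # True
```

A `False` result does not mean the combination will fail to boot, only that it falls outside what is guaranteed and tested.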

Our assumption is that in mass production most customers will keep using the same bootloader binary, even if the initial factory app flash is progressively updated to keep up with app updates. If all devices in the field have the same bootloader then this is one less variable for device configurations.

Similarly, if the bootloader does get updated then it would be because of something like a hardware revision or a major configuration change where older app binaries may not continue to work at all, so everything would be updated at once in this case.

If you have a use case that challenges this assumption then please let us know about it and we'll consider if we can relax the requirements. It's difficult though: testing forward-compatibility of bootloaders is already quite complex, and adding both forward- and backward-compatibility would be an order of magnitude increase in complexity.

vonnieda commented 5 years ago

Thanks @projectgus,

My use case doesn't specifically challenge the assumption, but this is a bit of a surprise to me. I don't remember ever having seen anything in the docs that insinuated I needed to continue to use an old bootloader for newer versions of firmware compiled against newer ESP-IDF.

In my case, my product is quite long lived and we provide firmware updates continuously. I've been staying on ESP-IDF 3.0 because upgrading to 3.1 or 3.2 was somewhat problematic, but I decided it was finally time because there are improvements and new features I wanted to use in 3.1 and 3.2.

Given this, it seems like I need to continue to flash my old / original bootloader on new devices, even if I am flashing newer firmware. So, during factory setup I will be flashing, say, bootloader 3.0 with firmware 3.2 and someday 3.3 and maybe eventually 4.0.

Do you foresee any issues with this? Or alternately, is this something you'd recommend against? Would it be better to just stick with the ESP-IDF version that I started with for the lifetime of a product and forego improvements in IDF?

Thanks, Jason

negativekelvin commented 5 years ago

A single bootloader will simplify things as far as testing and support but your other option is to start shipping a newer bootloader and disable downgrades.

projectgus commented 5 years ago

Hi Jason,

> I don't remember ever having seen anything in the docs that insinuated I needed to continue to use an old bootloader for newer versions of firmware compiled against newer ESP-IDF.

No, we definitely haven't explained this in the docs and we'll add an explanation. Sorry again for the hassle.

> So, during factory setup I will be flashing, say, bootloader 3.0 with firmware 3.2 and someday 3.3 and maybe eventually 4.0.
>
> Do you foresee any issues with this? Or alternately, is this something you'd recommend against?

That will work fine. Specifically, we assume that for customers with products in the field they will regularly OTA update to apps built with newer ESP-IDF, while keeping the same older bootloader.

I don't know what your product/hardware lifecycle is like, but at some point you may want to update the bootloader flashed in the factory as well. But you don't have to do this for every IDF version update, and you should ensure that the bootloader is no newer than the IDF used for the oldest app binary that you anticipate flashing to those devices.
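
Put another way, the newest bootloader that stays within this guidance is the one matching the oldest app version you still expect to flash. A minimal sketch under that reading follows; `newest_safe_bootloader` is a hypothetical helper and the version list is illustrative.

```python
def parse_version(text):
    """Turn '3.2' or 'v3.0.7' into a comparable (major, minor, patch) tuple."""
    core = text.lstrip("v").split("-")[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def newest_safe_bootloader(app_versions):
    """Return the oldest app version, which caps the bootloader version."""
    # min() raises ValueError on an empty list, which is left to the caller.
    return min(app_versions, key=parse_version)


# Apps that may still be flashed to the same devices (illustrative values).
print(newest_safe_bootloader(["3.2", "3.0", "3.3"]))  # 3.0
```

If downgrades to the oldest app are later disabled, that version drops out of the list and the cap moves up accordingly.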

As @negativekelvin says, overall this reduces the number of slightly different firmware version combinations you have in the field. So it should make testing simpler, overall.

> Would it be better to just stick with the ESP-IDF version that I started with for the lifetime of a product and forego improvements in IDF?

No, we recommend updating the ESP-IDF version used to build app firmwares periodically to get bug fixes, security fixes, etc. We're finalising a proper support period policy right now, so we'll be announcing this soon. Hopefully this will allow users to plan their ESP-IDF updates with more certainty.

vonnieda commented 5 years ago

@projectgus, @negativekelvin: Thank you both for your responses and for the additional information. This gives me enough to work with and I'll start testing our old bootloader with new firmware this week.

Thanks, Jason

projectgus commented 5 years ago

Thanks @vonnieda.

Have reopened for now because we do plan to fix an issue here (documenting the bootloader version limitations). Will close the issue once we have this in docs.

Alvin1Zhang commented 3 years ago

Thanks for reporting, and sorry for the slow turnaround. The fix is available at https://github.com/espressif/esp-idf/commit/0fb8f86705929926806f2b7402e11773f815c10b. Feel free to reopen.