SmingHub / Sming

Sming - powerful open source framework simplifying the creation of embedded C++ applications.
https://sming.readthedocs.io
GNU Lesser General Public License v3.0
1.48k stars 347 forks source link

Crash after OTA update from 3.8.0 to current develop #1765

Closed verybadsoldier closed 5 years ago

verybadsoldier commented 5 years ago

Well, OTA updates from a 3.8.0-fw to the current version fails. The update works but when the controller boots with the new firmware then it instantly crashes.

I can reproduce like this:

The OTA-update from new-FW to another new-FW works fine.

The problem looks like this (log is quite large so I split it up on pastebin and give comments what happened):

Part 1- 3.8.0-FW is booting: https://pastebin.com/pNkRUqcv

Part 2 - OTA update is started and updates fine: https://pastebin.com/aGrbLE7k

Part 3 - Then the new firmware boots but crashes:


state: 5 -> 0 (0)
rm 0
pm close 7
58418809 TCP connection error: -8
del if0
usl
58419532 disconnect from ssid hsync, reason 8

58419783 AppWIFI::_STADisconnect reason - 8 - counter 0

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x40100000, len 2292, room 16 
tail 4
chksum 0xb7
load 0x3ffe8000, len 772, room 4 
tail 0
chksum 0xe9
csum 0xe9

rBoot v1.4.2 - richardaburton@gmail.com
Flash Size:   32 Mbit
Flash Mode:   QIO
Flash Speed:  40 MHz
rBoot Option: Big flash
rBoot Option: RTC data

Booting temp rom.
Booting rom 1.
Fatal exception (28): 
epc1=0x402137bc, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000300, depc=0x00000000
Fatal exception (28): 
epc1=0x402137bc, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000300, depc=0x00000000
Fatal exception (28): 
epc1=0x402137bc, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000300, depc=0x00000000
Fatal exception (28): 
epc1=0x402137bc, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000300, depc=0x00000000
Fatal exception (28): 
epc1=0x402137bc, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000300, depc=0x00000000
Fatal exception (28): 
epc1=0x402137bc, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000300, depc=0x00000000
Fatal exception (28): 

When doing the same but starting already from current-FW then the OTA works and the controller runs correctly: https://pastebin.com/uhbpUYA4

slaff commented 5 years ago

Did you create both firmwares using the same SDK version?

verybadsoldier commented 5 years ago

I am always using the SDK bundled with Sming. So yes, afaik it didn't change in a while.

verybadsoldier commented 5 years ago

So, I am back to this as I am upgrading the RGBWW project to Sming 4.0.0. And I am again encountering this error.

My latest stable firmware was based on Sming 3.8.0 which I think uses SDK 3.0.0: https://github.com/SmingHub/Sming/blob/3.8.0/Sming/third-party/ESP8266_NONOS_SDK/include/version.h

Now current Sming uses SDK 3.0.1.

So is this error expected and is there a solution? I am afraid not being able to OTA-update this is a showstopper for me :(

mikee47 commented 5 years ago

The dump tells you where it's crashing and, if it's the same code as above (28 LOAD_PROHIBITED) then the bad code is at 0x402137bc in your new image. The easiest way to diagnose that is to build it for GDB - assuming it's getting as far as init() - then you'll get a nice backtrace.

verybadsoldier commented 5 years ago

Ok, I tried that (well, I tried alot actually) but also the version compiled with ENABLE_GDB also crashes in the same way. If I understand correctly than that version was supposed to wait for a gdb to connect before starting the firmware. But it seems to crash for that wait happens.


But, I tried alot of other things to isolate this problem: It seems that this problem is somehow related to the data in the other boot slot: When having the firmware in slot 1 (slot 0 empty) then the crash happens (?!). Then when also filling slot 0 with the same firmware then it works (?!).

It may also be something on my end, maybe I misunderstand something or did some silly mistake. But at the moment I am out of ideas. Maybe you have an idea to push me in the right direction?

I will describe in detail one sequence of flashing/testing:

  1. make flashinit
  2. Controller is empty now. So flash rboot: esptool.py -p /dev/ttyUSB0 -b 500000 write_flash -ff 40m -fm qio -fs 32m 0x00000 out/firmware/rboot.bin
  3. Verify the expectations: rboot is in place but no valid ROM in any slot:

    
    vbs@ubuntu:~/Projects/esp_rgbww_firmware$ python2 -m serial.tools.miniterm /dev/ttyUSB0 38400
    --- Miniterm on /dev/ttyUSB0  38400,8,N,1 ---
    --- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
    
    ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x40100000, len 2292, room 16 tail 4 chksum 0xb7 load 0x3ffe8000, len 772, room 4 tail 0 chksum 0xe9 csum 0xe9

rBoot v1.4.2 - richardaburton@gmail.com Flash Size: 32 Mbit Flash Mode: QIO Flash Speed: 40 MHz rBoot Option: Big flash rBoot Option: RTC data

Rom 0 is bad. Rom 1 is bad. No good rom available. user code done

4. Flash the sming4-firmware to slot _1_ (address `0x202000`)
`esptool.py -p /dev/ttyUSB0 -b 115200 write_flash -ff 40m -fm qio -fs 4MB 0x202000 out/Esp8266/debug/firmware/rom0.bin`

5. So, now in slot 1 we have the firmware and slot 0 is basically empty. Now booting slot 1 shows the error (!):

ets Jan 8 2013,rst cause:2, boot mode:(3,6)

load 0x40100000, len 1596, room 16 tail 12 chksum 0x0e ho 0 tail 12 room 4 load 0x3ffe8000, len 704, room 12 tail 4 chksum 0xe7 csum 0xe7

rBoot v1.4.2 - richardaburton@gmail.com Flash Size: 32 Mbit Flash Mode: QIO Flash Speed: 40 MHz rBoot Option: Big flash rBoot Option: RTC data

Writing default boot config. Rom 0 at 2000 is bad. Booting rom 1 at 202000, load addr 26dff0. Fatal exception (0): epc1=0x402139a4, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000 Fatal exception (0): epc1=0x402139a4, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000 Fatal exception (0):

So `rboot` detected that slot 0 is faulty and booted slot 1 which shows the crash.

6. Now I flash the same firmware *also* in slot 0, address 0x02000, (which is not in use anyway!):
`esptool.py -p /dev/ttyUSB0 -b 115200 write_flash -ff 40m -fm qio -fs 4MB 0x02000 out/Esp8266/debug/firmware/rom0.bin`

7. Now, `rboot` again boots slot 1 but now (after filling slot 0) it suddenly works! (?!)

vbs@ubuntu:~/Projects/esp_rgbww_firmware$ python2 -m serial.tools.miniterm /dev/ttyUSB0 38400 --- Miniterm on /dev/ttyUSB0 38400,8,N,1 --- --- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---

ets Jan 8 2013,rst cause:2, boot mode:(3,6)

load 0x40100000, len 1596, room 16 tail 12 chksum 0x0e ho 0 tail 12 room 4 load 0x3ffe8000, len 704, room 12 tail 4 chksum 0xe7 csum 0xe7

rBoot v1.4.2 - richardaburton@gmail.com Flash Size: 32 Mbit Flash Mode: QIO Flash Speed: 40 MHz rBoot Option: Big flash rBoot Option: RTC data

Booting rom 1 at 202000, load addr 26dff0. ota1 not set ota2 not set rf cal sector: 1019 freq trace enable 0 rf[112] : 00 rf[113] : 00 rf[114] : 01 w_flash

SDK ver: 3.1.0-T��U�JU��QjBQ�Z�TUZ��J�U�R�U�(���U�WT�TU�ŨXTU���խ�␋U�uZUT�u�QTUQU�XTU���խuj+�-�UPK����EUUU�(XUR����U�k��(���Vi+U�k���U-EZUQQ��jV�PŕUUEU␖U�UVK�KZ�U�kU�T]UZJZq␕QQ␕�T�U��UZ␋U��PUT �W�����ū��R��-��E���R�TT�␋U�EQ����␕Q�Ū�� ����,QU��U���U��U����ūU��U�T��+��U��V������␔��U(U��TQ�Q�VU�T�UU�␔U��TU�T�TU��T�U����T --- exit ---


---
When doing the same procedure but beginning with flashing the firmware in slot 0 then it works out of the box. The problem only appears when having the firmware in slot 1. Then it seems to matter what is in slot 0.
verybadsoldier commented 5 years ago

mistakenly closed...

verybadsoldier commented 5 years ago

Sorry for confusion: I just wanted to write a comment but accidently closed the issue. Finished my comment now...

verybadsoldier commented 5 years ago

The exact same thing can be reproduced with e.g. Basic_Blink. Just doing flashinit and then flashing the rom0.bin to 0x202000.

EDIT: So, to be clear: this problem seems to be totally unrelated to OTA updates for upgrading from old to new version (as opposed to the issue title).

verybadsoldier commented 5 years ago

@mikee47 Does anything I said make any sense to you? :)

mikee47 commented 5 years ago

@verybadsoldier Thank you for taking the time to break down this problem, I found it fairly quickly using Basic_Blink. To verify:

make flashinit flashboot

produces

Rom 0 at 2000 is bad.
Rom 1 at 202000 is bad.
No good rom available.
user code done

I then do:

esptool.py -p COM4 -b 921600 write_flash -ff 40m -fm dio -fs 4MB 0x00000 out/Esp8266/debug/firmware/rboot.bin 0x202000 out/Esp8266/debug/firmware/rom0.bin

and get normal operation:

Booting rom 1 at 202000, load addr 230990.
rf cal sector: 1019
verybadsoldier commented 5 years ago

Hm sorry, not sure if I understand it: so the problem did not occur on your end?

mikee47 commented 5 years ago

Yes, I got the same problem. Try this patch, see if it fixes it for you:

diff --git a/Sming/Components/rboot/component.mk b/Sming/Components/rboot/component.mk
index d96533cff..c31350118 100644
--- a/Sming/Components/rboot/component.mk
+++ b/Sming/Components/rboot/component.mk
@@ -117 +117 @@ CUSTOM_TARGETS          += $(LIBMAIN_RBOOT_DST)
-COMPONENT_LDFLAGS      := -u Cache_Read_Enable_New
+EXTRA_LDFLAGS          := -u Cache_Read_Enable_New
verybadsoldier commented 5 years ago

Fixed by #1900