Hieromon / AutoConnect

An Arduino library for ESP8266/ESP32 WLAN configuration at runtime with the Web interface
https://hieromon.github.io/AutoConnect/
MIT License
903 stars 189 forks source link

OTA with firmware.bin not writing #284

Closed brad-colbert closed 3 years ago

brad-colbert commented 3 years ago

I'm trying to use the default _ac and Update menu tab to update the firmware but it's not completing. I'm not sure why. I'm able to update files in SPIFFS using the Update option (thank you for adding that capability). Anything I can do to debug?

Hieromon commented 3 years ago

AutoConnectOTA has been fully tested and should work in a typical environment. You can enable the AC_DEBUG option and monitor what is happening inside AutoConnect with a serial monitor. We may be able to improve your situation with more information. Can you post the information that is needed for problem diagnosis according to the post items in the FAQ of the document as clues?

EDIT I came up with a small thing. It may not be important for your situation, but you might be inspired to improve:

When I was testing the v1.2.0, I uploaded the .bin as a new one, and its updating failed. I created an update test sketch with LittleFS, but the updating .bin was an old one created with SPIFFS. Then, I made an updating .bin with LittleFS again and completed it successfully. Matching the filesystem is important if the problem you encountered is based on the ESP8266 platform. AutoConnect has selected LittleFS as the default filesystem since v.1.2.0. Therefore, .bin created by SPIFFS cannot be OTA updated. If you want to continue using SPIFFS, you need to unify all filesystems to SPIFFS. See the topic in the documentation. Could you validate to match the filesystem both xxx.bin as a new one and the compiled Sketch having AutoConnectOTA?

brad-colbert commented 3 years ago

Hello Hieromon. My apologies for the late reply.

Details and update on status:

  1. MELIFE ESP32 ESP-32S Development Board (WROOM 32) 4MB flash and Adafruit HUZZAH32 – ESP32
  2. I'm using PlatformIO Core 5.0.3·Home 3.3.1 but I'm unsure which arduino packages it is using
  3. MacOS Catalina 10.15.7
  4. Smartphone... not really the issue
  5. Linksys WRT1900AC chan auto
  6. lwIP?
  7. See start of this thread but more info below.
  8. Stack dump. I don't think it's crashing
  9. Not really an issue with the sketch code, I don't think
  10. Working on the debug messages.

My partition table looks like:

Name, Type, SubType, Offset, Size, Flags

nvs, data, nvs, 0x9000, 0x5000, otadata, data, ota, 0xe000, 0x2000, app0, app, ota_0, 0x10000, 0x200000, spiffs, data, spiffs, 0x210000,0xF0000,

I don't think that platformIO is building the firmware.bin as LittleFS but it is building the data partition (named spiffs strangely enough) as LittleFS.

brad-colbert commented 3 years ago

Crashes and reboots. Here is the log:

brad-colbert commented 3 years ago

Starting networking and HTTP server...

[AC] Current:Trebloc
[AC] WiFi.config(IP=0.0.0.0, Gateway=0.0.0.0, Subnetmask=0.0.0.0, DNS1=0.0.0.0, DNS2=0.0.0.0)
[AC] WiFi.begin().....................................................................................................time out IP:0.0.0.0
[AC] autoReconnect, 5 network(s) found, Trebloc(RECENT) loaded
[AC] WiFi.config(IP=0.0.0.0, Gateway=0.0.0.0, Subnetmask=0.0.0.0, DNS1=0.0.0.0, DNS2=0.0.0.0)
[AC] WiFi.begin(Trebloc,**********)..established IP:192.168.1.212
[AC] http server started
WiFi connected: 192.168.1.212 surfomatic
Starting mDNS...
Getting time from NTP servers...
[AC] s_rc placed on /_ac/update
[AC] cap placed on /_ac/update
[AC] bin placed on /_ac/update
[AC] update placed on /_ac/update
[AC] js placed on /_ac/update
[AC] bin placed on /_ac/update_act
[AC] result placed on /_ac/update_act
[AC] dvo placed on /_ac/update_act
[AC] rc placed on /_ac/update_act
[AC] dvc placed on /_ac/update_act
[AC] /_ac/update on hands
[AC] /_ac/update_act on hands
[AC] Host:,/_ac/update,generated:/_ac/update, allocated
[AC] Host:surfomatic.local,/_ac/update,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,generated:/_ac/update_act, allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] /_ac/update requests upload to /_ac/update_act
[AC] ACFile bin accepted firmware.bin, handler enabled
[AC] /firmware.bin updating start
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
abort() was called at PC 0x401623e5 on core 1

Backtrace: 0x4008c674:0x3ffb1890 0x4008c8a5:0x3ffb18b0 0x401623e5:0x3ffb18d0 0x40086e37:0x3ffb18f0 0x400f2085:0x3ffb1910 0x401852ee:0x3ffb1930 0x401854d5:0x3ffb1950 0x400e9f24:0x3ffb1970 0x400defa5:0x3ffb1990 0x4019a766:0x3ffb19d0 0x400e232e:0x3ffb19f0 0x400db34d:0x3ffb1a70 0x4019a03e:0x3ffb1a90 0x40182be1:0x3ffb1ab0 0x400d90ae:0x3ffb1ae0 0x400d9d11:0x3ffb1b10 0x400dac1d:0x3ffb1d00 0x400d8785:0x3ffb1f20 0x400de31b:0x3ffb1f70 0x400d3644:0x3ffb1f90 0x400f48d5:0x3ffb1fb0 0x40088dbd:0x3ffb1fd0

Rebooting...
Hieromon commented 3 years ago

@brad-colbert Thank you for the log. I added a test pattern with reference to your partition scheme and reproduced the issue. However, the module you are using is an ESP32, so choosing a filesystem has no relation to this issue. The ESP32 core is still SPIFFS. Only ESP8266 is prompting the transition to LittleFS.

Analyzing the trace dump, it seems that ESP-IDF's std::regex_match is the cause. It's a new implementation part for the regular file upload to enhanced with v120. #236 This seems to have caused the regression. Sorry.

I'll make a patch soon.

brad-colbert commented 3 years ago

@Hieromon wow! Thank you for the quick results. Fantastic work, fantastic support!

Hieromon commented 3 years ago

@brad-colbert I released the fix as v.1.2.2. In this release, I changed the indication of the file extension that allows OTA from a regular expression to a fixed string. The default is still .bin, but you can't specify it with a regular expression by default. If you want to use a regular expression for the file extension to identify a firmware, need to define AUTOCONNECT_UPLOAD_ASFIRMWARE_USE_REGEXP macro. Please confirm the documentation.

However, it is highly recommended to leave it as a fixed string. I have taken a rather tricky approach to solve this issue. The std::regex implemented by ESP-IDF is unsophisticated.

https://github.com/Hieromon/AutoConnect/blob/0d02b06a3b476371a9f23bae045176af8705fc4b/src/AutoConnectDefs.h#L271-L282

Also, the partition table you used may insufficient for OTA. It seems to have no app area to stage a new one.

brad-colbert commented 3 years ago

Thank you for the quick response.

I still get an immediate crash. Is there something more I can do to help figure it out? I don't have a debugging device.

[AC] Host:,/_ac,generated:/_ac, allocated
[AC] Host:surfomatic.local,/_ac,already allocated
[AC] Host:surfomatic.local,/_ac/update,generated:/_ac/update, allocated
[AC] Host:surfomatic.local,/_ac/update,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,generated:/_ac/update_act, allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] /_ac/update requests upload to /_ac/update_act
[AC] ACFile bin accepted firmware.bin, handler enabled
[AC] OTA:app firmware.bin
[AC] /firmware.bin updating start
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
[AC] Host:surfomatic.local,/_ac/update_act,already allocated
abort() was called at PC 0x40165a65 on core 1

Backtrace: 0x4008c6cc:0x3ffb1880 0x4008c8fd:0x3ffb18a0 0x40165a65:0x3ffb18c0 0x40086e37:0x3ffb18e0 0x400f4a09:0x3ffb1900 0x401892ce:0x3ffb1920 0x401894b5:0x3ffb1940 0x400ec9a8:0x3ffb1960 0x400e7b59:0x3ffb1980 0x401bb7de:0x3ffb19c0 0x400eabf6:0x3ffb19e0 0x400e40a9:0x3ffb1a60 0x401bb582:0x3ffb1a80 0x40186bc1:0x3ffb1aa0 0x400e0a92:0x3ffb1ad0 0x400e16f5:0x3ffb1b00 0x400e2601:0x3ffb1cf0 0x400e0169:0x3ffb1f10 0x400e705b:0x3ffb1f60 0x400d98bb:0x3ffb1f80 0x400f79cd:0x3ffb1fb0 0x40088ddd:0x3ffb1fd0

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5828
entry 0x400806ac
Hieromon commented 3 years ago

@brad-colbert Which partition table are you using? The trace below you provided suggests that the crash was intentionally caused by the Arduino core. The Arduino core had check whether the writing address is safe prior to erasing the flash and determines that there is not the OTA staging area.

abort() was called at PC 0x40165a65 on core 1

Backtrace: 0x4008c674:0x3ffb1870 0x4008c8a5:0x3ffb1890 0x40153b8d:0x3ffb18b0 0x40086e37:0x3ffb18d0 0x400f1c41:0x3ffb18f0 0x40177e32:0x3ffb1910 0x40178015:0x3ffb1930 0x400ee524:0x3ffb1950 0x400dd229:0x3ffb1970 0x40193aa6:0x3ffb19b0 0x400e3c4a:0x3ffb19d0 0x400d9839:0x3ffb1a50 0x40193146:0x3ffb1a70 0x400d8631:0x3ffb1a90 0x400d5e22:0x3ffb1ac0 0x400d6ce9:0x3ffb1af0 0x400d7de2:0x3ffb1cf0 0x400d552d:0x3ffb1f20 0x400dc4bf:0x3ffb1f70 0x400d20c6:0x3ffb1f90 0x400f42a9:0x3ffb1fb0 0x40088dbd:0x3ffb1fd0
  #0  0x4008c674:0x3ffb1870 in invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:707
  #1  0x4008c8a5:0x3ffb1890 in abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:707
  #2  0x40153b8d:0x3ffb18b0 in is_safe_write_address at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/spi_flash/flash_ops.c:126
  #3  0x40086e37:0x3ffb18d0 in spi_flash_erase_sector at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/spi_flash/flash_ops.c:204 (discriminator 2)
  #4  0x400f1c41:0x3ffb18f0 in EspClass::flashEraseSector(unsigned int) at C:\Users\hieromon\.platformio\packages\framework-arduinoespressif32\cores\esp32/Esp.cpp:203
  #5  0x40177e32:0x3ffb1910 in UpdateClass::_writeBuffer() at C:\Users\hieromon\.platformio\packages\framework-arduinoespressif32\libraries\Update\src/Updater.cpp:360
  #6  0x40178015:0x3ffb1930 in UpdateClass::write(unsigned char*, unsigned int) at C:\Users\hieromon\.platformio\packages\framework-arduinoespressif32\libraries\Update\src/Updater.cpp:360
  #7  0x400ee524:0x3ffb1950 in AutoConnectOTA::_write(unsigned char const*, unsigned int) at C:\Users\hieromon\Documents\Arduino\libraries\AutoConnect\src/AutoConnectOTA.cpp:254

Maybe here: ~/.platformio/packages/framework-arduinoespressif32/libraries/Update/src/Updater.cpp

If you keep the partition definition that has posted yet, it is probably the cause of the crash. Strictly speaking, it's not a crash, the Update class (which is in the esp32 Arduino library) has interrupted the writing to the flash because there is no staging area for the OTA, it's an intentional ABORT.

I commented above:

Also, the partition table you used may insufficient for OTA. It seems to have no app area to stage a new one.

At least two app partitions are required for successful OTA. (usually app0 and app1) Could you change the partition table to default.csv or min_spiffs.csv and try again? (Please note that the v.1.2.2 fixed a bug found while diagnosing the traces you provided)

EDIT: Specifying the use of esp32_execption_decoder in platformio.ini as following will help you find the crashing point (which may be aborted or an exception):

monitor_filters=esp32_exception_decoder
brad-colbert commented 3 years ago

My partitions were the final problem. Thank you very much. Below is the partition table that I found to be successful:

nvs,      data, nvs,     0x9000,  0x5000,
otadata,  data, ota,     0xe000,  0x2000,
app0,     app,  ota_0,   , 0x180000,
app1,     app,  ota_1,   , 0x180000,
spiffs,   data, spiffs,  , 0xF0000,
Hieromon commented 3 years ago

Glad to hear that. It solved. Close.