esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
16.03k stars 13.34k forks source link

[Core 2.7.0/2.7.1] ESP.restart() or ESP.reset() cause bootloop #7307

Closed rev1204 closed 4 years ago

rev1204 commented 4 years ago

Basic Infos

Platform

Settings in IDE

Problem Description

Core 2.7.0 and 2.7.1 ESP.restart() and ESP.reset() causes ESP to bootloop (ESP keep restarting, never start the program again). Unfortunately ESP didn't print any exception. In the debug message below, the program ran first time after Uploading. After first restart, ESP not printing serial and keep restarting forever at 1 sec interval. This is not happen in Core 2.6.3.

MCVE Sketch

void setup() {
    delay(500);
    Serial.begin(115200);
    Serial.println("\nSTART");
    delay(5000);
    ESP.restart();
}

void loop() {
}

Debug Messages

START

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
va5432625
~ld

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
va5432625
~ld
d-a-v commented 4 years ago

Unfortunately your sketch is working as expected here (using a d1 mini running latest git master).

Are the IOs connected to something or left unconnected ? Did you try with the full flash erase option ? With the generic board, FW version can be chosen, what did you select ?

CRCinAU commented 4 years ago

Is this the same problem I was seeing with https://github.com/esp8266/Arduino/issues/7306 ?

I note that @rev1204 also mentions:

Flash Frequency: 80Mhz
CPU Frequency: 160MHz

I was using 160Mhz CPU / 80Mhz flash when I first noticed the problem as well...

d-a-v commented 4 years ago

I tried again with cpu@160MHz, flash@80MHz, with "DOUT" and also "QIO". Both are working (using a d1 mini).

CRCinAU commented 4 years ago

When testing with my code, I noticed that I needed to flash a 160Mhz image from being booted within a 160Mhz image to cause it to die every time. At that point reboot and reflashes caused boot loops until power cycled.

If you flashed an 80Mhz image while booted into a 160Mhz image, then it would die until power cycled, but could then be flashed again with either a 80Mhz or 160Mhz image and would be fine until another flash occurs with either a 80 or 160Mhz image.

I note that I'm not resetting the unit via a USB cable - but flashing via the /update URL on the basic web updating framework.

rev1204 commented 4 years ago

Unfortunately your sketch is working as expected here (using a d1 mini running latest git master).

Are the IOs connected to something or left unconnected ? Did you try with the full flash erase option ? With the generic board, FW version can be chosen, what did you select ?

Hi, I tried full flash erase and the problem is gone, now it can restart correctly. I guess something in the flash causing trouble. Thanks for your suggestion.

CRCinAU commented 4 years ago

What option did you use to erase the flash? Does this use esptool.py or esptool?

@rev1204 - Do you get the same successful result after doing an OTA update when set to 160Mhz? I can flash via the USB port many times and it'll load fine - until I do an OTA with the CPU at 160Mhz, or the flash at 80Mhz. At that point, I get the boot loop again.

Doing an OTA update (via the http interface and /update URL) will always cause a boot loop when running at 160Mhz.

EDIT: Logs after doing an OTA via http://$ip/update:

** Normal boot:
*WM: AutoConnect
*WM: Connecting as wifi client...
*WM: Status:
*WM: 0
*WM: Using last saved values, should be faster
*WM: Connection result: 
*WM: 3
*WM: IP Address:
*WM: 10.1.1.12
*WM: freeing allocated params!
Subscribing to: 84:F3:EB:58:3C:E7/output1
Subscribing to: 84:F3:EB:58:3C:E7/output2
Subscribing to: 84:F3:EB:58:3C:E7/output3
Subscribing to: 84:F3:EB:58:3C:E7/output4
Sensor check triggered...
Temperature: 12.85 C
Humidity: 73.00 %
Pressure: 1004.15 hPa
[MQTT RX] Topic: 84:F3:EB:58:3C:E7/output4
[MQTT RX] Payload: off

** Uploaded new firmware here **

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
v4519db85
@cp:0
ld

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
v4519db85
~ld

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
v4519db85
~ld

** repeated again and again and again **
CRCinAU commented 4 years ago

For more debug, I turned on: -DDEBUG_ESP_PORT=Serial -DDEBUG_ESP_HTTP_UPDATE -DDEBUG_ESP_UPDATER -DDEBUG_ESP_OTA

Then built & flashed a 160Mhz image to the D1 Mini, then uploaded a 160Mhz image via the http updater. Below is the log of the entire first boot, then http update via the /update URL:

SDK:2.2.2-dev(38a443e)/Core:3.0.0-dev=30000000/lwIP:IPv6+STABLE-2_1_2_RELEASE/glue:1.2-30-g92add50/BearSSL:5c771be
sleep enable,type: 2
*WM: 
*WM: AutoConnect
*WM: Connecting as wifi client...
*WM: Status:
*WM: 0
*WM: Using last saved values, should be faster
scandone
no MYSSID found, reconnect after 1s
reconnect
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 84
cnt 

connected with MYSSID, channel 11
dhcp client start...
ip:10.1.1.12,mask:255.255.255.0,gw:10.1.1.254
*WM: Connection result: 
*WM: 3
*WM: IP Address:
*WM: 10.1.1.12
*WM: freeing allocated params!
Subscribing to: 84:F3:EB:58:3C:E7/output1
Subscribing to: 84:F3:EB:58:3C:E7/output2
Subscribing to: 84:F3:EB:58:3C:E7/output3
Subscribing to: 84:F3:EB:58:3C:E7/output4
Sensor check triggered...
Temperature: 13.10 C
Humidity: 64.75 %
Pressure: 1004.06 hPa
[MQTT RX] Topic: 84:F3:EB:58:3C:E7/output4
[MQTT RX] Payload: off
ip:10.1.1.12,mask:255.255.255.0,gw:10.1.1.254
ip:10.1.1.12,mask:255.255.255.0,gw:10.1.1.254
pm open,type:2 0
[httpUpdate] Header read fin.
[httpUpdate] Server header:
[httpUpdate]  - code: 304
[httpUpdate]  - len: -1
[httpUpdate] ESP8266 info:
[httpUpdate]  - free Space: 3657728
[httpUpdate]  - current Sketch Size: 515936
err3,exceed max time value
Sensor check triggered...
Temperature: 13.06 C
Humidity: 64.74 %
Pressure: 1004.09 hPa

*** Uploaded new firmware via HTTP here ***

sleep disable
[begin] roundedSize:       0x0037C000 (3653632)
[begin] updateEndAddress:  0x003FB000 (4173824)
[begin] currentSketchSize: 0x0007E000 (516096)
[begin] _startAddress:     0x0007F000 (520192)
[begin] _currentAddress:   0x0007F000 (520192)
[begin] _size:             0x0037C000 (3653632)
Header: 0xE9 1 2 40
Staged: address:0x0007F000, size:0x0007DF60
state: 5 -> 0 (0)
rm 0
pm close 7
del if0
usl

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
v4519db85
@cp:0
ld

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
v4519db85
~ld

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
v4519db85
~ld

*** Repeated forever ***

I believe the problem lies above somewhere - as after doing this OTA update, every ESP.reset(), ESP.restart(), or automatic reboot after new OTA flash fails in the same manner.

I have also confirmed this same problem occurs when keeping the CPU to 80Mhz, but changing the FLASH to 80Mhz...

rev1204 commented 4 years ago

Update @d-a-v : I tried again with my old project and has the problem again. In my example above, such a simple sketch can run ok. Now it's a more complicated sketch, has webserver and many other things. But it can run perfectly on core 2.6.3, now it bootloop on core 2.7.1 even after full erase flash. I don't know which part caused it for now. Maybe I'll try tomorrow.

@CRCinAU : The option is in Tools->Erase Flash->All Flash Contents. I'm not trying OTA update yet, and even without OTA, my ESP is now bootloop.

devyte commented 4 years ago

@rev1204 what is your flash brand? or flashID number.

rev1204 commented 4 years ago

@devyte Flash Chip ID: 0x00164020 Flash Vendor ID: 0x00000020

I think I get it now what causes the bootloop. It because of SPIFFS. My guess is because SPIFFS now has been deprecated. This is my code:

#include <FS.h>

void setup() {
    delay(500);
    Serial.begin(115200);
    Serial.println("\nSTART ");
    if (SPIFFS.begin()) {
        Serial.println("SPIFFS Mounted");
    }
    else {
        Serial.println("SPIFFS failed");
    }
    delay(5000);
    ESP.restart();
}

void loop() {
}

I build and upload using full flash erase. After that I use ESP Sketch Data Upload. It prints SPIFFS Mounted. After restart, ESP just bootloop.

d-a-v commented 4 years ago

It's an XMC flash chip.

edit: It bootloops after ESP Sketch data upload and not before ? Can you double check you have the latest version of this plugin ? Can you try with LittleFS (there's also a an upload plugin) https://arduino-esp8266.readthedocs.io/en/2.7.1/filesystem.html?highlight=spiffs#uploading-files-to-file-system

mhightower83 commented 4 years ago

Looks like you guys are already zeroing in on it. FWIW: I have a DOIT module with 4M XMC flash and was able to recreate the problem as described. Commenting out this line https://github.com/esp8266/Arduino/blob/4519db85e981c6e505828baf94e8107a76db4f1a/cores/esp8266/core_esp8266_main.cpp#L338 made things work as expected. At least with my short testing.

rev1204 commented 4 years ago

@d-a-v Edit: I just updated to latest plugin version. It bootloop in both SPIFFS and LittleFS before and after Sketch data upload. But, after I commenting out the line of code as @mhightower83 , I can confirm (with my simple code above) it work as expected.

d-a-v commented 4 years ago

@rev1204 Thanks for reporting! @mhightower83 Thanks! I guess a fix (fixing the call or removing it) will land very soon for a 2.7.2 release.

@CRCinAU Can you try with the proposed fix ?

carpelux commented 4 years ago

I can also verify that commenting out experimental::initFlashQuirks() does solve the problem. I had the issue once i updated to 2.7.0 but due to time constraints I just went back to 2.6.3. Now it works with 2.7.1 after commenting out that line.

Tech-TX commented 4 years ago

@d-a-v I saw the same issue yesterday, David. XMC chips on my D1 Mini modules, boot loop after ESP. restart() at 160Mhz compile. Dropping the flash freq. to 26Mhz fixed it for me. I'll try commenting out the flash quirks when I get home this evening.

Edit: it worked for me as well on 2 different boards I tested. I'll switch the ESP.restart() out for ESP.reset() and retest. Update: I can't get the boot loop to happen with ESP.reset(), only with ESP.restart(). At least with my boards and code, no boot loop with ESP.reset() and the default D1 Mini board settings, as well as "erase all flash contents" to insure it's a clean upload.

CRCinAU commented 4 years ago

@CRCinAU Can you try with the proposed fix ?

@d-a-v - I can confirm that commenting out the above line resolves all my boot loop issues on my D1 Minis. Tested on 3 different devices (different code), all resolved and working correctly.

devyte commented 4 years ago

@CRCinAU I haven't seen confirmation that you have an xmc flash. I think it's likely, but let's be sure. Please check your flashId and report it here.

CRCinAU commented 4 years ago

@devyte - I'm guessing its the output of ESP.getFlashChipId() you want?

CPU Speed: 160 MHz
Flash Speed: 80 Mhz
Flash Real Size: 4194304 bytes
Flash Chip ID: 1458208
Tech-TX commented 4 years ago

That looks like a 1MB XMC flash, but I don't know where that trailing '8' comes from. It should be a 6-nybble or 3-byte word. I think this is the datasheet for your part: http://www.xmcwh.com/index.php?s=/cms/168.html

CRCinAU commented 4 years ago

I'm pretty sure its an XM25QH32B - http://www.xmcwh.com/index.php?s=/cms/170.html

Tech-TX commented 4 years ago

That leading '14' normally means a 1MB part. The '58' is the specific model or version, and the '20' would be XMC. You're linking a 4MB part, which usually has '16' for the leading part of the ID.

mhightower83 commented 4 years ago

@Tech-TX I also puzzled at that number a bit. I think 1458208 is decimal which would be 0x164020. Which would matchup to Manufacturer ID = 20h, Memory Type =40h, Capacity = 16h

Tech-TX commented 4 years ago

Gotcha. I've never seen anyone show a flash ID in decimal. :-)

Patching the file in mhightower's post should fix you up.

CRCinAU commented 4 years ago

The code is just the following as a String:

                F("<tr><td>CPU Speed:</td><td>") + ESP.getCpuFreqMHz() + F(" MHz</td></tr>\n") +
                F("<tr><td>Flash Speed:</td><td>") + (ESP.getFlashChipSpeed() / 1000000) + F(" Mhz</td></tr>\n") +
                F("<tr><td>Flash Real Size:</td><td>") + ESP.getFlashChipRealSize() + F(" bytes</td></tr>\n") +
                F("<tr><td>Flash Chip ID:</td><td>") + ESP.getFlashChipId() + F("</td></tr>\n") +

I'm guessing the auto-conversion stuff just dumped it as DEC instead of HEX...

devyte commented 4 years ago

So everyone's in the same boat then, awesome. Closing as duplicate of #7267.

pfeerick commented 4 years ago

Just for completeness, since I have been following this issue since yesterday, I serial uploaded a sketch which simply serial prints some ESP.get values (such as FlashID, FlashMode) and then triggers ESP.reset() after 10 seconds of uptime. I tested against a NodeMCU and Digistump Oak, both with BergMicro BG25Q32 flash memory, and a Wemos D1 Mini with XMC XM25QH32B flash memory.

Using core 2.7.1, 80Mhz reset consistently and without issue of all three boards. 160Mhz reset consistently without issue on the Digistump Oak and Nodemcu V3. The Wemos D1 Mini would reset and then get stuck in a bootloop.

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
va5432625
~ld

Commenting out experimental::initFlashQuirks(); // Chip specific flash init. in core_esp8266_main.cpp and doing a clean build made the Wemos D1 Mini @ 160Mhz reset and run consistently like the other two boards.

mhightower83 commented 4 years ago

@pfeerick In your test were you also running with a Flash Frequency of 80 MHz?

pfeerick commented 4 years ago

No, they weren't. Stock 40Mhz. But here's the weird thing... I had re-enabled experimental::initFlashQuirks();, so I tried with 80Mhz Flash, and the Wemos D1 Mini was fine. I was then putting the results together to post, and noticed I'd changed the chip mode from DIO to QIO earlier while checking if that made any difference, so swapped it back, and tried 40Mhz Flash / 160Mhz CPU again and now the Wemos D1 mini is resetting just fine at 160Mhz now. I power cycled it in case something was sticking, and it's still running fine. So now I don't know what's going on! Crash, dang it! :laughing: :open_mouth:

i.e. It has just properly self reset, runs properly, self resets properly again, runs again...

 ets Jan  8 2013,rst cause:2, boot mode:(3,0)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
va5432625
~ld

Init!
CoreVersion: 2_7_1
FlashChipID: 0x164020
FlashChipMode: DIO
FlashChipSpeed: 40 Mhz
CpuFreqMHz: 160 MHz

 ets Jan  8 2013,rst cause:2, boot mode:(3,0)

load 0x4010f000, len 3456, room 16 
tail 0
chksum 0x84
csum 0x84
va5432625
~ld

Init!
CoreVersion: 2_7_1
FlashChipID: 0x164020
FlashChipMode: DIO
FlashChipSpeed: 40 Mhz
CpuFreqMHz: 160 MHz

Edit: btw, if it means anything, I just noticed after the power cycle the boot mode changed from (3,0) to (3,6). I'm not sure what the second digit means though?

Code ``` #include void setup() { Serial.begin(115200); Serial.println(); Serial.println("Init!"); uint32_t FlashChipID = ESP.getFlashChipId(); uint32_t ChipID = ESP.getChipId(); String CoreVersion = ESP.getCoreVersion(); uint8_t BootVersion = ESP.getBootVersion(); uint8_t CpuFreqMHz = ESP.getCpuFreqMHz(); FlashMode_t FlashChipMode = ESP.getFlashChipMode(); uint32_t FlashChipSpeed = ESP.getFlashChipSpeed(); // uint8_t FlashChipVendorId = ESP.getFlashChipVendorId(); while(millis() < 1000) { delay(1); } // Serial.printf("BootVersion: %u\n", BootVersion); Serial.println("CoreVersion: " + CoreVersion); Serial.printf("FlashChipID: 0x%06X\n", FlashChipID); Serial.printf("FlashChipMode: %s\n", (FlashChipMode == FM_QIO ? "QIO" : FlashChipMode == FM_QOUT ? "QOUT" : FlashChipMode == FM_DIO ? "DIO" : FlashChipMode == FM_DOUT ? "DOUT" : "UNKNOWN")); Serial.printf("FlashChipSpeed: %u Mhz\n", FlashChipSpeed / 1000000); // Serial.printf("FlashChipVendorId: %X\n", FlashChipVendorId); // Serial.printf("ChipID: %08u\n", ChipID); Serial.printf("CpuFreqMHz: %u MHz\n", CpuFreqMHz); } void loop() { while(millis() < 5000) { delay(1); } ESP.reset(); } ```
CRCinAU commented 4 years ago

Interestingly enough, if I set board_build.flash_mode = qio in platformio.ini, then it looks like I can no longer flash things.

ie:

Tue May 19 15:51:37 2020 - XsNz6SHApGG0j@z5b069eAAAAMI [10.1.1.254]: Session started. Chip ID: 1078572, MAC Address: 84:F3:EB:10:75:2C, SDK Version: 2.2.2-dev(38a443e)
Tue May 19 15:51:37 2020 - XsNz6SHApGG0j@z5b069eAAAAMI [10.1.1.254]: Known device - proceeding...
Tue May 19 15:51:37 2020 - XsNz6SHApGG0j@z5b069eAAAAMI [10.1.1.254]: File associated with device: ClimateControl-d1_mini.bin
Tue May 19 15:51:37 2020 - XsNz6SHApGG0j@z5b069eAAAAMI [10.1.1.254]: Firmware differs from loaded. Sending File. Header: b1139dc4dc3e664127e70de38d897876, File: 5af9911a85ca428450a6b77a80acc775
Tue May 19 15:51:46 2020 - XsNz6SHApGG0j@z5b069eAAAAMI [10.1.1.254]: Firmware Send complete...

Tue May 19 15:52:12 2020 - XsN0DAvUh7CS62TycDs5GgAAAEU [10.1.1.254]: Session started. Chip ID: 1078572, MAC Address: 84:F3:EB:10:75:2C, SDK Version: 2.2.2-dev(38a443e)
Tue May 19 15:52:12 2020 - XsN0DAvUh7CS62TycDs5GgAAAEU [10.1.1.254]: Known device - proceeding...
Tue May 19 15:52:12 2020 - XsN0DAvUh7CS62TycDs5GgAAAEU [10.1.1.254]: File associated with device: ClimateControl-d1_mini.bin
Tue May 19 15:52:12 2020 - XsN0DAvUh7CS62TycDs5GgAAAEU [10.1.1.254]: Firmware differs from loaded. Sending File. Header: b1139dc4dc3e664127e70de38d897876, File: 5af9911a85ca428450a6b77a80acc775
Tue May 19 15:52:18 2020 - XsN0DAvUh7CS62TycDs5GgAAAEU [10.1.1.254]: Firmware Send complete...

if I recompile with board_build.flash_mode = dio, then the flash works:

Tue May 19 15:52:46 2020 - XsN0LiHApGG0j@z5b069eQAAAMM [10.1.1.254]: Session started. Chip ID: 1078572, MAC Address: 84:F3:EB:10:75:2C, SDK Version: 2.2.2-dev(38a443e)
Tue May 19 15:52:46 2020 - XsN0LiHApGG0j@z5b069eQAAAMM [10.1.1.254]: Known device - proceeding...
Tue May 19 15:52:46 2020 - XsN0LiHApGG0j@z5b069eQAAAMM [10.1.1.254]: File associated with device: ClimateControl-d1_mini.bin
Tue May 19 15:52:46 2020 - XsN0LiHApGG0j@z5b069eQAAAMM [10.1.1.254]: Firmware differs from loaded. Sending File. Header: b1139dc4dc3e664127e70de38d897876, File: b0d66f935827705b1ead4ec9e2661015
Tue May 19 15:52:54 2020 - XsN0LiHApGG0j@z5b069eQAAAMM [10.1.1.254]: Firmware Send complete...

Tue May 19 15:53:21 2020 - XsN0UQvUh7CS62TycDs5GwAAAEg [10.1.1.254]: Session started. Chip ID: 1078572, MAC Address: 84:F3:EB:10:75:2C, SDK Version: 2.2.2-dev(38a443e)
Tue May 19 15:53:21 2020 - XsN0UQvUh7CS62TycDs5GwAAAEg [10.1.1.254]: Known device - proceeding...
Tue May 19 15:53:21 2020 - XsN0UQvUh7CS62TycDs5GwAAAEg [10.1.1.254]: File associated with device: ClimateControl-d1_mini.bin
Tue May 19 15:53:21 2020 - XsN0UQvUh7CS62TycDs5GwAAAEg [10.1.1.254]: Firmware version matches: b0d66f935827705b1ead4ec9e2661015

Is QIO supposed to work on these flash chips? It seems going by the data sheet that its supported?

Oh, and to remove all doubt, this is the output from the logging on my web server when an update is served to the D1 Mini via https... The Header md5 is the one supplied by the D1 Mini in the http header, the File is the calculated md5 of the firmware on disk.

EDIT: I note this post that kind of asks the same question, but never answered: https://www.esp8266.com/viewtopic.php?p=55287#p55287

pfeerick commented 4 years ago

@CRCinAU I honestly don't know what happened there... I left it (Wemos D1 Mini) running some different code while opening an issue about something else, and just tried again, and now the 160Mhz compiled code crashes as expected, with the flash speed at either 40 or 80 Mhz. Otherwise I was going to suggest bumping your flash memory speed up as well as the CPU speed. The only consistent success I've had with the Wemos D1 Mini thus far has been to comment out that experimental::initFlashQuirks() function, which will probably happen as standard in 2.7.2, unless the real culprit is found.

And I don't know why QIO worked that time with it, because it doesn't now either. :confused: Not knowing how the chip is wired, I'm don't even know if it should have worked or doesn't become of some other issue, since QIO mode needs two extra connections between the ESP8266 and the SPI flash chip which aren't always connected, and even if they are, often to the wrong pins, etc, etc..

CRCinAU commented 4 years ago

It seems like I can use esptool.py to flash in QIO mode:

$ esptool.py --baud 460800 write_flash --flash_freq 80m --flash_mode qio 0 BasicWebUpdate-d1_mini.bin
esptool.py v2.7
Found 1 serial ports
Serial port /dev/ttyUSB0
Connecting....
Detecting chip type... ESP8266
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: 84:f3:eb:9f:3a:a3
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
Configuring flash size...
Auto-detected Flash size: 4MB
Flash params set to 0x004f
Compressed 329296 bytes to 240335...
Wrote 329296 bytes (240335 compressed) at 0x00000000 in 5.4 seconds (effective 488.2 kbit/s)...
Hash of data verified.

Leaving...
Hard resetting via RTS pin...
CRCinAU commented 4 years ago

I might be wrong here, but if I flash with esptool.py with the --flash_mode qio option, then that seems to persist no matter what is uploaded via OTA... ESP.getFlashChipMode() will return 0 when flashed in this mode. Any further OTA keeps this value at 0.

If I flash with --flash_mode dio, then ESP.getFlashChipMode() returns 2. After this, it seems any OTA remains at 2 - even if you build with platformio.ini value board_build.flash_mode = qio.

If I'm interpreting this correctly, once you flash once as QIO, it'll always use QIO. If you flash as DIO, it'll always use DIO.

Is this correct?

pfeerick commented 4 years ago

One meal, and one preprocessor #warning statement later... I now know that some caching was fouling my tests before with DIO/QIO ... even though I'd cleaned the project in an attempt to make it rebuild properly, commenting and uncommenting the experimental::initFlashQuirks() wasn't being taken properly... it took a couple of goes before the compiler correctly compiled what was actually in the file. The end result is I once again have QIO happening on the Wemos D1 Mini, with experimental::initFlashQuirks() disabled, at 80Mhz Flash / 160Mhz CPU.

@CRCinAU I can replicate that behaviour also... A previously 'QIO' mode flashed device will not change to 'DIO' mode if flashed OTA. I immediately flashed the same code over serial, and it then switched. Reading some of what is mentioned in the documentation for esptool, it mentions

A header at the beginning of a bootable image contains these values.

these being 'correct mode, frequency and size settings'. And then goes on to mention

These options are only consulted when flashing a bootable image to an ESP8266 at offset 0x0, or an ESP32 at offset 0x1000. These are addresses used by the ROM bootloader to load from flash. When flashing at all other offsets, these arguments are not used.

This leads me to believe (as well as something else mentioned in relation to XMC in the linked issue above) that the bootloader is responsible for setting the flash frequency, and AFAIK OTA doesn't touch the bootloader. I also just saw this in a map of the ESP8266 memory layout:

Start Start Length Function
0x000000 0 4KB Bootloader with flash type/size header
ChocolateFrogsNuts commented 4 years ago

The XMC chips support QIO, but some of the Wemos D1 boards they are on are not wired for QIO, depending on the version and if they are genuine or not. I found no guaranteed way to find out other than trying it (or getting out a multimeter and chip datasheets).

I found the XMC chips to be generally unstable at 40 and 80 MHz on the D1 boards with random crashes etc. Adding SPIFFS to a sketch made it much worse - this is what initFlashQuirks() was supposed to solve by having the XMC chips boost their output drive from the default 75% to 100%.

The XMC chips are supposed to be good to something like 105MHz from memory, but I'm assuming that requires 100% output drive.

Also, as far as I can see, the bootloader does not set the flash speed, which is always 20MHz when the bootloader runs. The flash speed is most likely set early on in the SDK init.

OTA updates DO update the bootloader (just tested it), however you will not see the updated bootloader on the boot that installs the update because it's not been installed yet...

One thought - if ESP.reset() resets the chip in the same way as the reset button - clearing the instruction cache and forcing code to be re-read from flash whereas ESP.restart() merely jumps to the boot address leaving the cache intact, that might explain the difference in behaviour... (remembering that it's possible to get a bad flash read after pressing the reset button, causing a crash just to keep you confused)

ChocolateFrogsNuts commented 4 years ago

Looking at the disassembly of _SPICommand() - the function initFlashQuirks() ultimately uses to do it's magic - it looks like the optimiser is messing with the code and ignoring the PRECACHE_ATTR directive to prevent re-ordering code blocks... which is bad because some of the critical code is moved outside the precached() area. It's also ignoring at least one volatile and optimising it away... if it's ignoring others bad things will happen here. Either of these things will cause a crash. It took quite a bit of work to get the optimiser to not mess this up originally, so I'll have to look into it further.

Has anything changed in the optimisation settings generally?

earlephilhower commented 4 years ago

Nothing has changed in years as far as I know. -Os is the default.

You can use a #pragma GCC optimize ("O0") if you want to disable everything in your section of code. It'll explode code size in many cases, though, so be very judicious in its use.

pfeerick commented 4 years ago

@ChocolateFrogsNuts Thanks for the reminder about the first run after OTA updates, which I did try to account for by never taking the first reset result. I still don't think OTA I'm aware that the first run after OTA does updates, but I didn't think it did the bootloader - and was the reason we needed to do a serial update in order to use the new gzip routines.

Anyway, I just tried again, thinking I would see if ESP.reset() behaved any different to ESP.restart() ... and didn't bother in the end... as uploading with different flash modes OTA most definitely does NOT actually change the flash mode. Even more interesting is I've been adding more to the debug sketch I've been tinkering with, and added the FlashCRC check, and that fails...

  if (!ESP.checkFlashCRC())
  {
    SERIAL.println();
    SERIAL.println(F("Warning! Flash CRC check failed!"));    
  }

i.e. when the Wemod D1 Mini was previously running a 80Mhz CPU / 80Mhz Flash in DIO mode ...

ResetReason: Software/System restart
ChipID: 10266669
BootVersion: 31
CoreVersion: 2_7_1
SdkVersion: 2.2.2-dev(38a443e)
CpuFreqMHz: 80 MHz
FlashChipID: 0x164020
FlashChipVendorId: XMC
FlashChipMode: DIO
FlashChipSpeed: 80 Mhz
FlashChipRealSize (Chip ID): 4194304 bytes
FlashChipSize (seen by SDK): 4194304 bytes
Sketch MD5: a02c6090f695ff00a0d0951b90bc7755
Sketch Size: 321920 bytes

... and is then flashed with 160Mhz CPU / 80Mhz QIO code OTA, even after several restarts and even a hard reset, it still reports ...

BootMode: 1
ResetReason: External System
ChipID: 10266669
BootVersion: 31
CoreVersion: 2_7_1
SdkVersion: 2.2.2-dev(38a443e)
CpuFreqMHz: 160 MHz
FlashChipID: 0x164020
FlashChipVendorId: XMC
FlashChipMode: DIO
FlashChipSpeed: 80 Mhz
FlashChipRealSize (Chip ID): 4194304 bytes
FlashChipSize (seen by SDK): 4194304 bytes
Sketch MD5: 9ec7250db2d66d896af0271b52571e07
Sketch Size: 321920 bytes
Warning! Flash CRC check failed!

... but writing via serial yields the correct flash mode, no CRC error, but a different MD5 signature - even though I deliberately uploaded the same binary that was use for OTA ...

BootMode: 1
ResetReason: Software/System restart
ChipID: 10266669
BootVersion: 31
CoreVersion: 2_7_1
SdkVersion: 2.2.2-dev(38a443e)
CpuFreqMHz: 160 MHz
FlashChipID: 0x164020
FlashChipVendorId: XMC
FlashChipMode: QIO
FlashChipSpeed: 80 Mhz
FlashChipRealSize (Chip ID): 4194304 bytes
FlashChipSize (seen by SDK): 4194304 bytes
Sketch MD5: 14169eac9b3a95ce0d8b0a999fea54aa
Sketch Size: 321920 bytess
Code ``` #include #include #include #include #include #define OTA #ifdef OTA #include const char *ssid = "Anembo (ZTE)"; const char *password = "004783wb"; TelnetSpy SerialAndTelnet; #define SERIAL SerialAndTelnet #else #define SERIAL Serial #endif #ifdef OTA void otaSetup() { WiFi.mode(WIFI_STA); WiFi.begin(ssid, password); while (WiFi.waitForConnectResult() != WL_CONNECTED) { SERIAL.println("Connection Failed! Rebooting..."); delay(5000); ESP.restart(); } ArduinoOTA.setHostname("esp8266-diagnostic"); ArduinoOTA.onStart([]() { String type; if (ArduinoOTA.getCommand() == U_FLASH) { type = "sketch"; } else { // U_FS type = "filesystem"; } // NOTE: if updating FS this would be the place to unmount FS using FS.end() SERIAL.println("Start updating " + type); }); ArduinoOTA.onEnd([]() { SERIAL.println("\nEnd"); }); ArduinoOTA.onProgress([](unsigned int progress, unsigned int total) { SERIAL.printf("Progress: %u%%\r", (progress / (total / 100))); }); ArduinoOTA.onError([](ota_error_t error) { SERIAL.printf("Error[%u]: ", error); if (error == OTA_AUTH_ERROR) { SERIAL.println("Auth Failed"); } else if (error == OTA_BEGIN_ERROR) { SERIAL.println("Begin Failed"); } else if (error == OTA_CONNECT_ERROR) { SERIAL.println("Connect Failed"); } else if (error == OTA_RECEIVE_ERROR) { SERIAL.println("Receive Failed"); } else if (error == OTA_END_ERROR) { SERIAL.println("End Failed"); } }); ArduinoOTA.begin(); SERIAL.println("Ready"); SERIAL.print("IP address: "); SERIAL.println(WiFi.localIP()); } #endif void setup() { SERIAL.begin(115200); SERIAL.println(); SERIAL.println("Init!"); #ifdef OTA SerialAndTelnet.setWelcomeMsg("Welcome to the TelnetSpy example\n\n"); otaSetup(); #endif FlashMode_t FlashChipMode = ESP.getFlashChipMode(); uint32_t FlashChipId = ESP.getFlashChipId(); uint8_t FlashChipVendor = FlashChipId & 0x000000ff; SERIAL.printf("BootMode: %u\n", ESP.getBootMode()); SERIAL.println("ResetReason: " + ESP.getResetReason()); SERIAL.printf("ChipID: %08u\n", ESP.getChipId()); SERIAL.printf("BootVersion: %u\n", ESP.getBootVersion()); SERIAL.println("CoreVersion: " + ESP.getCoreVersion()); SERIAL.printf("SdkVersion: %s\n", ESP.getSdkVersion()); SERIAL.printf("CpuFreqMHz: %u MHz\n", ESP.getCpuFreqMHz()); SERIAL.printf("FlashChipID: 0x%06X\n", FlashChipId); SERIAL.printf("FlashChipVendorId: %s\n", (FlashChipVendor == 0xC8 ? "GigaDevice" : FlashChipVendor == 0xEF ? "Winbond" : FlashChipVendor == 0xE0 ? "BergMicro" : FlashChipVendor == 0x20 ? "XMC" : "Unknown")); SERIAL.printf("FlashChipMode: %s\n", (FlashChipMode == FM_QIO ? "QIO" : FlashChipMode == FM_QOUT ? "QOUT" : FlashChipMode == FM_DIO ? "DIO" : FlashChipMode == FM_DOUT ? "DOUT" : "UNKNOWN")); SERIAL.printf("FlashChipSpeed: %u Mhz\n", ESP.getFlashChipSpeed() / 1000000); SERIAL.printf("FlashChipRealSize (Chip ID): %u bytes\n", ESP.getFlashChipRealSize()); SERIAL.printf("FlashChipSize (seen by SDK): %u bytes\n", ESP.getFlashChipSize()); SERIAL.println("Sketch MD5: " + ESP.getSketchMD5()); SERIAL.printf("Sketch Size: %u bytes", ESP.getSketchSize()); if (!ESP.checkFlashConfig()) { SERIAL.println(); SERIAL.println(F("Warning! Mismatch in Flash Chip size and size seen by SDK!")); } if (!ESP.checkFlashCRC()) { SERIAL.println(); SERIAL.println(F("Warning! Flash CRC check failed!")); } } void loop() { while (millis() < 10000) { delay(1); #ifdef OTA SerialAndTelnet.handle(); ArduinoOTA.handle(); #endif } ESP.restart(); // SDK restart //ESP.reset(); // Hard restart } ```
CRCinAU commented 4 years ago

Ok - so I'm glad you mentioned this - as I thought I was going a bit insane from lack of sleep - as I was building with pio run, then taking the .bin file and flashing it with esptool.py - but I noticed that the SAME binary file copied across to the HTTP auto-update site was having a different MD5 - and I just couldn't figure this out...

I ended up using pio run -t upload instead, then copying the binary to the http server and it was fine - the MD5 matches...

I wonder if esptool.py was patching the binary - or something was going on behind the scenes to make the checksum between the flashed file and the copied file be different...

Driving me nuts.

ChocolateFrogsNuts commented 4 years ago

I just patched the boot loader to print what it's doing with the first 4 bytes of flash (where ESP.getFlashChipMode() reads the mode from). It does write whatever is in the OTA update to that location, however what it's told to write does not change when the mode is changed in the IDE. If you change the flash frequency however, it does change what's written in the OTA update. It looks like a build issue, rather than a bootloader issue.

CRCinAU commented 4 years ago

It looks like a build issue, rather than a bootloader issue.

So do I understand this correctly - if its programmed via USB as QIO, any further OTA will remain in QIO mode - although you can change the frequency via an OTA.

If you program via USB as DIO, it will stay DIO if you flash via USB, but you can't change it to QIO without flashing via USB again with QIO mode set?

This would more or less match up with what I discovered last night. I believe it also means if you mismatch QIO / DIO between flash and build environment, you end up with a mismatched MD5 when checking for OTA...

ChocolateFrogsNuts commented 4 years ago

That seems to be the case - although the .bin file generated seems to have the correct information in it, so I'm not sure what's happening to it in between... I'm beginning to suspect something in the upload process, but I'm not familiar with how that works at the PC end.

ChocolateFrogsNuts commented 4 years ago

Aha! The Updater class messes about with the flash mode (Updater.cpp:349) and does exactly what we are seeing - forcing the flash mode to be whatever is currently being used as it writes the update to it's temporary location in flash. EDIT: if you want to prevent it from happening, send a gzipped OTA update :-)

devyte commented 4 years ago

@earlephilhower I seem to remember you mentioning something about keeping the previous mode with OTA, and that it was on purpose. Do you remember any details here? One thing is certain: changing the mode to something that causes a hardware conflict is dangerous, it can fry two pins of the mode access. I've seen it happen several times. Even if it doesn't fry, it could cause a reboot loop on the remote device if done OTA. My thinking here is: you do not change mode on a remote device. If you need to change mode, you should be in a development environment, i. e. connected over USB/serial.

earlephilhower commented 4 years ago

Updater does change the flash mode bits in the image (if it's not gzip compressed).

Why? I couldn't say as it was done before my time. My guess is that the idea was to be able to distribute a single binary image to a bunch of systems in the field with diverse flash configs (i.e. a product whose initial run used vendor AAA, but later moved to vendor BBB for cost reasons).

It's something that probably should be undone in 3.0, but changing it would be a breaking change.

devyte commented 4 years ago

Ah yes, that was exactly the original reason: to be able to distribute a single binary to devices with potentially different modes. Otherwise it would be necessary to maintain binaries per mode. And you do not want to change the mode of a remote device.

ChocolateFrogsNuts commented 4 years ago

When you think about how the even the Wemos D1 mini boards have several versions that do or don't support various flash modes it makes perfect sense to not allow a mode change OTA. I would actually suggest moving the restriction into eboot so as to support it on gzipped updates. We have a few bytes to spare in there now.

pfeerick commented 4 years ago

I would prefer to see it made an option, rather than completely blocking it off. Maybe something like ArduinoOTA.preserveFlashMode(bool), thus keeping current behaviour unless overridden and allowing for it to be disabled if desired? Removes some of the 'why the heck was this magic bit of code done' factor and doesn't require a wait for an major version change, unless wanting to change the default (which is a sensible default to prevent stupid mistakes). Although that still leaves compressed images as inconsistent with this, so enforcing the flash mode not being changed in eboot would at least give consistent behaviour for both OTA image types.

CRCinAU commented 4 years ago

... or in my case where I did want to change the mode OTA :)

It was more of a pain having to go around to each device and program it via USB to make the change happen. Having to disassemble various cases, mounts etc to do so.

I'd agree that a much more sane approach would be to have a value which protects by default, but also allows it to happen - and if the user wants to, they can change from DIO -> QIO via OTA.

ChocolateFrogsNuts commented 4 years ago

PR #7317 now fixes experimental::initFlashQuirks() - which is what we were originally discussing here :)