espressif / ESP8266_NONOS_SDK

ESP8266 nonOS SDK

Corruption of Flash memory causing reboot loop. #92

Open wally4u opened 6 years ago

wally4u commented 6 years ago

Hi All,

I am currently having an issue with a bunch of boards (returned from the field) that are stuck in a reboot loop. The root cause is that these boards (ESP-WROOM-02) have corrupted flash. We cannot figure out what is causing the corruption, so I wonder whether anyone else has run into this issue.

We did a flash dump, compared the data to a working "factory fresh" board, and discovered that specific sectors in the flash memory are corrupt (see the list below). We cannot find any reason why these sectors would be corrupted: the module is dormant (in deep sleep) 99% of the time in our system, and we do not use the embedded flash (we use the Wi-Fi module as a transparent UART proxy). We do not read from or write to flash. We are using the stock NONOS binaries (factory-installed on the module) with AT 1.3 / SDK 2.0.

We cannot determine what is going on in software, since we do not have access to the NONOS SDK source code. (Has anyone received the source code from Espressif, even under NDA?) We are receiving increasing complaints from the field and need to resolve this ASAP.

Technical info: 32 bytes are written and the rest of the sector is 0xFF (erased). The written data:

70 29 FF 3F 00 00 00 00 00 00 00 00 00 00 00 00 C0 9A 00 00 38 25 FF 3F 00 00 00 00 00 00 00
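To check whether other returned boards carry the same signature, a minimal Python sketch can scan a raw dump for sectors whose first 32 bytes contain data while the remainder is erased. It assumes the ESP8266's 4 KB (0x1000) flash erase sector and a raw binary image of the chip; `matches_short_write` and `find_short_write_sectors` are hypothetical helper names, not part of any SDK.

```python
SECTOR_SIZE = 0x1000  # 4 KB: erase unit of the ESP8266's SPI flash
ERASED = 0xFF         # value of an erased flash byte


def matches_short_write(sector: bytes, prefix_len: int = 32) -> bool:
    """True if the first prefix_len bytes contain some non-erased data
    and every byte after them is erased (0xFF)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    prefix, rest = sector[:prefix_len], sector[prefix_len:]
    prefix_written = any(b != ERASED for b in prefix)
    rest_erased = rest == bytes([ERASED]) * len(rest)
    return prefix_written and rest_erased


def find_short_write_sectors(dump: bytes, prefix_len: int = 32) -> list[int]:
    """Offsets of every whole sector in the dump that matches the pattern."""
    return [
        off
        for off in range(0, len(dump) - SECTOR_SIZE + 1, SECTOR_SIZE)
        if matches_short_write(dump[off:off + SECTOR_SIZE], prefix_len)
    ]
```

Running `find_short_write_sectors` over each returned board's dump would show whether the "32 bytes then erased" pattern always lands on the same sectors.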

- ESP8266_Corrupt_10274879: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10275065: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10275070: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10344706: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10345021: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10345026: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10345030: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10345760: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10346122: 0x1D000 to 0x1E000 and 0x1F000 to 0x20000
- ESP8266_Corrupt_10275108: 0x9000 to 0xC000
- ESP8266_Corrupt_10344795: 0x9000 to 0xA000 and 0xB000 to 0xC000
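The comparison against a factory-fresh board can be reproduced with a short Python sketch that diffs a suspect dump against a reference dump in 4 KB sectors and merges adjacent differing sectors into ranges, printed in the same start/end form as the list above (end exclusive). It assumes both files are raw images of equal size, for example read back with esptool's `read_flash` command; `differing_sectors`, `merge_ranges` and `format_ranges` are hypothetical names.

```python
import sys

SECTOR_SIZE = 0x1000  # 4 KB erase sector on the ESP8266's SPI flash


def differing_sectors(dump: bytes, reference: bytes) -> list[int]:
    """Start offsets of every sector whose contents differ from the reference."""
    if len(dump) != len(reference):
        raise ValueError("dumps must be the same size")
    return [
        off
        for off in range(0, len(dump), SECTOR_SIZE)
        if dump[off:off + SECTOR_SIZE] != reference[off:off + SECTOR_SIZE]
    ]


def merge_ranges(offsets: list[int]) -> list[tuple[int, int]]:
    """Merge sorted, adjacent sector offsets into half-open (start, end) ranges."""
    ranges: list[tuple[int, int]] = []
    for off in offsets:
        if ranges and ranges[-1][1] == off:
            ranges[-1] = (ranges[-1][0], off + SECTOR_SIZE)
        else:
            ranges.append((off, off + SECTOR_SIZE))
    return ranges


def format_ranges(ranges: list[tuple[int, int]]) -> str:
    return " and ".join(f"0x{start:X} to 0x{end:X}" for start, end in ranges)


def main(dump_path: str, reference_path: str) -> None:
    with open(dump_path, "rb") as f:
        dump = f.read()
    with open(reference_path, "rb") as f:
        reference = f.read()
    ranges = merge_ranges(differing_sectors(dump, reference))
    print(format_ranges(ranges) if ranges else "no differing sectors")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Merging adjacent sectors is what turns three consecutive bad sectors into a single entry such as "0x9000 to 0xC000".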

The corruption seems to be mostly at the same locations, with the same data.

I tried emailing Espressif and using the technical support form on their website, but we have not received a reply yet.

wally4u commented 6 years ago

Finally got a response from Espressif via email. Will update here if we find out what the issue is.

wally4u commented 6 years ago

Still being stonewalled by Espressif. They are basically stating that I'm using incorrect firmware, which caused the issue. That is kind of funny, since we did not modify the original factory firmware on the chips. Will update when I know more.

DouglasPearless commented 6 years ago

I have seen something similar when the application on the boards was updated via OTA (the new application is written to a spare part of the flash and then, at boot time, copied over the existing application), but the user power-cycled the unit before the copy process was complete. In that case, the ESP kept rebooting endlessly because the application was corrupted, even though it should have re-copied the application at boot time but did not.

This may or may not be related to your issue :-)

Cheers Douglas

On 7/02/2018, at 8:33 PM, wally4u notifications@github.com wrote:

Still being stonewalled by espressif. Basically stating I'm using incorrect firmware which caused the issue. Which is kind of funny since we did not modify the original factory firmware in the chips. Will update when I know more.


wally4u commented 6 years ago

Hi @DouglasPearless, thanks for the comment. I don't think OTA is what is going wrong (and hope it isn't). We are currently not doing anything with the ESP8266 except sending it to deep sleep. The only commands used, in order:

1. AT
2. AT+GMR
3. AT+CIPSTAMAC?
4. AT+CWQAP
5. AT+GSLP=1200000

AT+GSLP takes the sleep time in milliseconds, so the module sleeps for 20 minutes.
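The host-side code that issues this sequence is not shown in the thread. As a hypothetical sketch, assuming the host drives the module over a line-oriented serial port (for example a pyserial `Serial` object, which provides `write()` and `readline()`), that each command is terminated by CR LF, and that a response ends with a final `OK` or `ERROR` line, the sequence could be sent and checked like this. `LinePort`, `send_at` and `gslp_minutes` are illustrative names only.

```python
from typing import Protocol


class LinePort(Protocol):
    """Anything with write()/readline(), e.g. a pyserial Serial (assumption)."""

    def write(self, data: bytes) -> int: ...

    def readline(self) -> bytes: ...


# Command sequence quoted in the comment above.
COMMANDS = ["AT", "AT+GMR", "AT+CIPSTAMAC?", "AT+CWQAP", "AT+GSLP=1200000"]


def send_at(port: LinePort, command: str, max_lines: int = 50) -> list[str]:
    """Send one command terminated by CR LF and collect non-empty response
    lines until a final OK or ERROR line, a read timeout, or max_lines."""
    port.write(command.encode("ascii") + b"\r\n")
    lines: list[str] = []
    for _ in range(max_lines):
        raw = port.readline()
        if not raw:  # empty read: timeout, nothing more to collect
            break
        line = raw.decode("ascii", errors="replace").strip()
        if not line:
            continue
        lines.append(line)
        if line in ("OK", "ERROR"):
            break
    return lines


def gslp_minutes(command: str) -> float:
    """AT+GSLP=<time> takes the deep-sleep duration in milliseconds."""
    ms = int(command.split("=", 1)[1])
    return ms / 60_000
```

Logging the full response of each step (in particular the `AT+GMR` version string) for every unit would make it easier to correlate field failures with a specific AT/SDK build.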

One of the items in the latest 2.2 release notes is "Fix issues of deep-sleep sleep 0 or sleep for a long time", which is the only real feature we use. @FayeY, could you give some background on what was fixed here? And perhaps what counts as a "long time"?