Closed ThomasFarstrike closed 8 months ago
With the Feb 25, 2024 deadline only 6 days away, and to have this ready in time for Bitcoin Atlantis, I'm thinking of starting work on this issue, unless others are already working on it. If so, please speak up!
I implemented the above.
Initially, I used the typical ESP32 "task" watchdog, but if that one triggers a restart, rtc_get_reset_reason() does not reveal that the watchdog was the cause. So I switched to the more unusual and convoluted "RTC watchdog", which is normally used by the lower-level ESP32 boot functions to detect hung boots, but can be repurposed.
More info: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/wdts.html
I also spent a lot of effort on getting it to work without writing any state (boot counters etc.) to flash memory, because flash has a limited number of write cycles (as low as 10k?). NVS (non-volatile storage) and EEPROM were also out of the question because on the ESP32 these are implemented in dedicated flash regions as well.
I found "noinit DRAM" in the docs, an area of RAM that is preserved across watchdog restarts but not across deep sleep. Then I found RTC_DATA_ATTR memory, which is preserved across deep sleep but not across watchdog restarts. In the end, I used both of these concepts in tandem, moving state from one variable to the other at the right times, to achieve persistence across both kinds of restart.
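To illustrate the idea of keeping one value in both memory regions, here is a minimal sketch. It is not the code that shipped: the `BootCause` enum, the variable names and the choice of when to copy are assumptions made for illustration. On a host build the two attributes are defined as empty so the logic can be exercised without hardware.

```cpp
#include <cstdint>

#if defined(ESP_PLATFORM)
#include "esp_attr.h"   // provides RTC_DATA_ATTR and __NOINIT_ATTR on the ESP32
#else
#define RTC_DATA_ATTR   // host build: plain globals, only the logic is exercised
#define __NOINIT_ATTR
#endif

// Hypothetical classification of why the chip started; a real firmware would
// derive this from the reset reason and wakeup cause.
enum class BootCause { PowerOn, WatchdogReset, DeepSleepWake };

__NOINIT_ATTR uint32_t stateNoinit;   // kept across watchdog restarts, lost in deep sleep
RTC_DATA_ATTR uint32_t stateRtc = 0;  // kept across deep sleep, lost on watchdog restarts

// Call once at boot: pick whichever copy survived the last restart.
uint32_t restoreState(BootCause cause) {
    switch (cause) {
        case BootCause::WatchdogReset:
            break;                      // the noinit copy is still valid
        case BootCause::DeepSleepWake:
            stateNoinit = stateRtc;     // the RTC copy is the valid one
            break;
        case BootCause::PowerOn:
            stateNoinit = 0;            // nothing survived: start fresh
            break;
    }
    return stateNoinit;
}

// Call right before entering deep sleep, so the RTC copy is up to date.
void saveStateBeforeDeepSleep() {
    stateRtc = stateNoinit;
}
```

During normal operation the working copy is `stateNoinit`; the RTC copy is only refreshed just before sleeping.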
More info: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/memory-types.html
Then I also did a lot of wireless network testing with wrong access point names, wrong passwords, wrong encryption types, and after a lot of wireless event callback parsing, was able to convert those protocol-level issues into usable feedback for the user, on the display. This is a bit out of scope for this issue, but it should help the users debug the most common wifi issues more easily.
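As a rough sketch of turning disconnect reasons into on-screen feedback: the enum below is a local copy of a few values that are believed to match ESP-IDF's `wifi_err_reason_t` (check them against `esp_wifi_types.h` for the IDF version in use). The function name and the message texts are made up for illustration and are not taken from the piggy code.

```cpp
#include <string>

// Local mirror of a few disconnect reason codes (assumed to match ESP-IDF).
enum WifiReason : int {
    kReason4WayHandshakeTimeout = 15,
    kReasonBeaconTimeout = 200,
    kReasonNoApFound = 201,
    kReasonAuthFail = 202,
    kReasonAssocFail = 203,
    kReasonHandshakeTimeout = 204,
};

// Hypothetical helper: map a reason code to a short message for the display.
std::string describeWifiFailure(int reason) {
    switch (reason) {
        case kReasonNoApFound:
            return "WiFi network not found. Check the name (SSID).";
        case kReasonAuthFail:
        case kReason4WayHandshakeTimeout:
        case kReasonHandshakeTimeout:
            return "WiFi login failed. Check the password.";
        case kReasonAssocFail:
            return "Access point refused the connection.";
        case kReasonBeaconTimeout:
            return "WiFi signal lost. Move closer to the access point.";
        default:
            // Unknown codes are still shown so they can be reported.
            return "WiFi error (reason " + std::to_string(reason) + ")";
    }
}
```

In an Arduino-style firmware this could be called from the station-disconnected event callback, passing the reason field of the event.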
This is ready and deployed in v2.0.0 in the webinstaller.
Throughout the piggy code, there are several places where infinite retries are attempted, and these can drain the battery if they keep failing.
Examples:
To fix these issues, better error handling of these specific cases would be good, where possible. For example, if the wifi credentials are wrong, the user should be notified on the screen.
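One possible way to replace an infinite retry loop is a bounded one that reports failure to the caller, which can then show an error on the screen. This is only a sketch: `retryBounded` and `RetryResult` are hypothetical names, and a real device would probably add a delay or backoff between attempts.

```cpp
#include <functional>

struct RetryResult {
    bool ok;       // true if the action succeeded
    int attempts;  // how many times the action was actually called
};

// Call `action` until it returns true, but at most `maxAttempts` times.
RetryResult retryBounded(const std::function<bool()>& action, int maxAttempts) {
    int attempts = 0;
    while (attempts < maxAttempts) {
        ++attempts;
        if (action()) {
            return {true, attempts};
        }
        // On a real device: wait here before trying again.
    }
    return {false, attempts};  // give up so the caller can notify the user
}
```

For example, a connection attempt could be wrapped as `retryBounded(tryConnect, 5)`, where `tryConnect` is whatever function currently sits inside the endless loop.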
Additionally, to prevent anything from causing the ESP32 to get stuck forever, the ESP32 watchdog should be activated and programmed. This will ensure the board reboots in case of an exceptionally long action.
To do it properly and prevent infinite watchdog reboots from draining the battery, a watchdog reboot counter should be kept somewhere. This watchdog reboot counter should be incremented in cases of a "watchdog" reset cause. And it should be reset to 0 in case of a regular (non-watchdog triggered) reboot.
If the watchdog reboot counter exceeds some configured value (example: 3) then the device should immediately go into a long sleep/hibernate (example: 6 hours) so that it wakes up at a time when whatever is causing the problem might hopefully be resolved.
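The counter rules described above can be sketched as a small pure function. The names are hypothetical, the threshold and sleep duration are the example values from this issue, and how the counter is persisted and how the reset cause is detected are left to the caller.

```cpp
#include <cstdint>

constexpr uint32_t kMaxWatchdogReboots = 3;              // example threshold
constexpr uint64_t kLongSleepSeconds = 6ULL * 60 * 60;   // example: 6 hours

struct BootDecision {
    uint32_t watchdogReboots;  // updated counter, to be persisted by the caller
    bool hibernate;            // true: go straight into the long sleep
    uint64_t sleepSeconds;     // sleep duration, only meaningful when hibernate is true
};

// `resetByWatchdog` says whether the last reset was caused by the watchdog;
// `previousCount` is the counter value that survived that reset.
BootDecision decideOnBoot(bool resetByWatchdog, uint32_t previousCount) {
    BootDecision d{0, false, 0};
    if (!resetByWatchdog) {
        return d;  // regular reboot: counter is reset to 0
    }
    d.watchdogReboots = previousCount + 1;
    if (d.watchdogReboots > kMaxWatchdogReboots) {
        d.hibernate = true;                 // too many watchdog reboots in a row
        d.sleepSeconds = kLongSleepSeconds;
    }
    return d;
}
```

With these example values, the fourth consecutive watchdog reboot is the one that sends the device into the long sleep.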