PlummersSoftwareLLC / NightDriverStrip

NightDriver client for ESP32
https://plummerssoftwarellc.github.io/NightDriverStrip/
GNU General Public License v3.0

TTGO doesn't come up when flashed using Web Installer #367

Open sdmtr opened 11 months ago

sdmtr commented 11 months ago

Bug report

Problem

Howdy! I've spent the last few hours trying to flash all manner of ESP32 devices with any and all versions of NDL available via the web installer, and each time I'm unable to set the wifi credentials. As an example, if I install the TTGO firmware to a Lilygo TTGO board then once the install is complete, I'm simply dumped back at the initial screen where my only two options are to install NDL or view the console. I did manage to get the wifi setup process to begin once by carefully timing when I clicked on the "connect" button (more about that below), but the process stalled when it reached the part where it searches for available wifi networks.

Steps

  1. Attempt to install NDL
  2. Observe your inability to set wifi credentials

Notes

When I look at the console output, I can see that NDL attempts to retrieve wifi credentials from flash memory, notices that none are set, and therefore reboots. I believe (although I could be totally wrong) that this is where the problem lies, because the Improv module simply doesn't have enough time to connect to the board, retrieve the list of available SSIDs, and receive the credentials from the user, before the device reboots and the connection is dropped. Here's an excerpt of the console logs:

(W) (DrawLoopTaskEntry)(C1) Entering main draw loop!
(I) (setup)(C1) Calling ConnectToWifi()
(I) 
(I) (ConnectToWiFi)(C1) Setting host name to NightDriverStrip...WL_NO_SHIELD
(W) (ConnectToWiFi)(C1) WiFi Credentials not set, cannot connect
(I) (setup)(C1) Unable to connect to WiFi, but must have it, so rebooting...
(I) 
(E) (TerminateHandler)(C1) -------------------------------------------------------------------------------------
(E) (TerminateHandler)(C1) - NightDriverStrip Guru Meditation                              Unhandled Exception -
(E) (TerminateHandler)(C1) -------------------------------------------------------------------------------------
(I) (PrintOutputHeader)(C1) NightDriverStrip
(I) 
(I) (PrintOutputHeader)(C1) ------------------------------------------------------------------------------------------------------------
(I) (PrintOutputHeader)(C1) M5STICKC: 0, USE_M5DISPLAY: 0, USE_OLED: 0, USE_TFTSPI: 0, USE_LCD: 0, USE_AUDIO: 0, ENABLE_REMOTE: 0
(I) (PrintOutputHeader)(C1) Version 37: Wifi SSID: "" - ESP32 Free Memory: 171920, PSRAM:0, PSRAM Free: 0
(I) (PrintOutputHeader)(C1) ESP32 Clock Freq : 240 MHz
(E) (TerminateHandler)(C1) Terminated due to exception: Unable to connect to WiFi, but must have it, so rebooting

As mentioned above, the one time I was able to see an option to set the wifi credentials didn't come at the end of an installation, it came when I timed clicking the "connect" button such that the web installer connected to the board during a moment of time when the Improv serial module was up and responding. As far as I can tell, the installer makes an attempt to connect via Improv at the beginning of the session, and if it succeeds then it'll read the board settings (firmware version and hardware type) and display the option to set wifi credentials. This is why I think it's a timing issue and that the forced reboot is what's causing the problem in the first place.
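For reference, here is a minimal sketch of how an Improv serial RPC packet is framed. It is based on my reading of the public Improv serial spec, not on anything in the NightDriverStrip tree. The frame is a 6-byte "IMPROV" header, then a version byte, a packet type, a length byte, the payload, and a checksum equal to the low 8 bits of the sum of all preceding bytes. Several details are assumptions to double-check against the spec before relying on them:

- the command codes: 0x01 sends Wi-Fi settings, and 0x03 requests device info, which is the firmware/hardware information the installer reads;
- the helper names `buildImprovRpc` and `buildWifiSettingsRpc`, which are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Codes as I understand the Improv serial spec (assumption: verify before use).
constexpr uint8_t kImprovVersion = 0x01;
constexpr uint8_t kTypeRpcCommand = 0x03;
constexpr uint8_t kCmdWifiSettings = 0x01;
constexpr uint8_t kCmdRequestDeviceInfo = 0x03;

// Hypothetical helper. Frames an RPC command as:
//   "IMPROV" | version | type | length | payload | checksum
// where the RPC payload is: command | data length | data bytes.
std::vector<uint8_t> buildImprovRpc(uint8_t command, const std::vector<uint8_t>& data)
{
    // Payload (command + length + data) must fit in the single length byte.
    assert(data.size() <= 253);

    std::vector<uint8_t> payload;
    payload.push_back(command);
    payload.push_back(static_cast<uint8_t>(data.size()));
    payload.insert(payload.end(), data.begin(), data.end());

    std::vector<uint8_t> packet = {'I', 'M', 'P', 'R', 'O', 'V', kImprovVersion, kTypeRpcCommand,
                                   static_cast<uint8_t>(payload.size())};
    packet.insert(packet.end(), payload.begin(), payload.end());

    // Checksum: low 8 bits of the sum of every byte that precedes it.
    uint8_t checksum = 0;
    for (uint8_t b : packet)
        checksum = static_cast<uint8_t>(checksum + b);
    packet.push_back(checksum);
    return packet;
}

// Hypothetical helper. Wi-Fi settings data: SSID length | SSID | password length | password.
std::vector<uint8_t> buildWifiSettingsRpc(const std::string& ssid, const std::string& password)
{
    std::vector<uint8_t> data;
    data.push_back(static_cast<uint8_t>(ssid.size()));
    data.insert(data.end(), ssid.begin(), ssid.end());
    data.push_back(static_cast<uint8_t>(password.size()));
    data.insert(data.end(), password.begin(), password.end());
    return buildImprovRpc(kCmdWifiSettings, data);
}
```

Note that the whole exchange depends on the firmware still running when these bytes arrive. If the board resets first, nothing on the device is left to parse the frame.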

Proposed Solution

Don't force a reboot when wifi isn't available. As per the final line in the log above, it seems that NDL restarts the board as soon as it's unable to connect to wifi, either because credentials aren't set or the network isn't available. If that happens then there's no opportunity for the web installer to connect to the board via Improv and set the credentials, so the board is rendered useless.


(Also, I just want to say how excited I am about this project and how utterly cool it is. I've used WLED for a LOT of stuff over the last few years, but it has a few idiosyncrasies that I don't love, and NDL looks like it's shaping up to be an incredible replacement going forward. I can't wait to get my hands on a Mesmerizer board and really see what it can do. Thank you so much for making this project available to us mere mortals, Dave.)

robertlipe commented 11 months ago

Welcome. What you're seeing isn't a controlled "hey, let's reboot now". That's just a plain ole crash.

(E) (TerminateHandler)(C1) - NightDriverStrip Guru Meditation Unhandled Exception -

Unhandled exceptions are, well, bad.

How confident are you that A) this firmware is appropriate for your board and B) there aren't power supply issues?

For (B), on most SBCs (well, those that don't have 8,000 light bulbs attached to them), the two most power-hungry things are writing to flash and starting up the WiFi. If those happen and your power supply is too wimpy (an old phone charger, wires too small/long, etc.), the board will usually just crash, and what you're describing is about the first point where both of those happen at the same time. So check your power: attach a scope to VCC and trigger on < 4.8V or so.

For (A), it can just be a bit of frustrating hit and miss. https://web.esphome.io/ has a bunch of ESP32 binaries that'll boot a lot of boards, but it can still be frustrating to find what a random $4 Asian board really corresponds to. It's a pity that the page leans harder on older hardware and is pretty scant for, say, ESP32-S3 boards.

I think the 'nightdriver' target is pretty specific to Dave's boards, as it relies on an exact combination of the mic being on these pins, the flash being on that pin, the remote on this pin, and so on. I've not had great experiences booting it on a random board, but I've not really rolled up my sleeves to tackle why.

FWIW, if it halves your testing matrix, you can just whack "USE_NETWORK" in the configuration when building firmware to see if that's a key variable. I have it turned off in my development work just because it costs size and speed and I'm focused on quick testing. The info I need comes to the serial console anyway, so the web interface doesn't help me.

Welcome and good luck!

robertlipe commented 11 months ago

Just so you have a reference for A/B testing, here's a successful boot on a mesmerizer build on official Dave hardware:

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13192
load:0x40080400,len:3028
entry 0x400805e4
E (927) esp_core_dump_flash: No core dump partition found!
E (927) esp_core_dump_flash
Replacing Idle Tasks with TaskManager...
(I) (PrintOutputHeader)(C1) NightDriverStrip
(I)
(I) (PrintOutputHeader)(C1)
(I) (PrintOutputHeader)(C1) M5STICKC: 0, USE_M5DISPLAY: 0, USE_OLED: 0, USE_TFTSPI: 0, USE_LCD: 0, USE_AUDIO: 1, ENABLE_REMOTE: 1
(I) (PrintOutputHeader)(C1) ESP32 PSRAM Init: OK
(I) (PrintOutputHeader)(C1) Version 37: Wifi SSID: "ElderOfTheInternet" - ESP32 Free Memory: 293528, PSRAM:4192059, PSRAM Free: 4187691
(I) (PrintOutputHeader)(C1) ESP32 Clock Freq : 240 MHz
(I) (setup)(C1) Startup!
(I) (setup)(C1) Starting DebugLoopTaskEntry
Launching JSON Writer Thread. Mem: 293492, LargestBlk: 110580, PSRAM Free: 4187691/4192059, (W) (DeviceConfig)(C1) DeviceConfig could not be loaded from JSON, using defaults
(W) (NotifyJSONWriterThread)(C1) >> Notifying JSON Writer Thread
Starting SmartMatrix Mallocs
Heap/32-bit Memory Available: 290068 bytes total, 110580 bytes largest free block
8-bit/DMA Memory Available : 241292 bytes total, 110580 bytes largest free block
Total PSRAM used: 4368 bytes total, 4187691 PSRAM bytes free
SmartMatrix Layers Allocated from Heap:
Heap/32-bit Memory Available: 288592 bytes total, 110580 bytes largest free block

The "esp_core_dump_flash" thing looks scary. I'm pretty sure I know what the issue is and would fix it, but I can't find the source. :-) My PrintOutputHeader is different because I tweaked the Mesmerizer build as I described.

The PSRAM might be a clue. If you're running a build (like Mesmerizer) on a board that assumes less RAM and/or doesn't have external PSRAM, that's probably not good, though I don't know the precise symptoms.

rbergen commented 11 months ago

There may be a "cleaner" way to trigger reboots on ESP32s, but the approach taken in this project is indeed to throw an std::runtime_error, which will then trigger a reboot. The last line of the log snippet posted by @sdmtr actually kind of illustrates this:

(E) (TerminateHandler)(C1) Terminated due to exception: Unable to connect to WiFi, but must have it, so rebooting

This reboot happening makes sense if the build has been configured to require WiFi. That is the case if both ENABLE_WIFI and WAIT_FOR_WIFI are defined as non-zero. In the "regular" project configurations as they stand, this is only the case for the LEDSTRIP project. That is obviously not the same as the TTGO project.
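To make that gating concrete, here is a minimal host-side sketch of the shape. The reboot path only exists in the binary when both flags are non-zero. `ENABLE_WIFI` and `WAIT_FOR_WIFI` are the project's flag names. The default values below, and the function `handleWiFiStartup`, are hypothetical and exist only for this sketch, so the real code in main.cpp may well differ in detail.

```cpp
#include <cassert>
#include <stdexcept>

// Sketch-only defaults; in the project these come from the per-project configuration.
#ifndef ENABLE_WIFI
#define ENABLE_WIFI 1
#endif
#ifndef WAIT_FOR_WIFI
#define WAIT_FOR_WIFI 0
#endif

// Hypothetical stand-in for the step in setup() after a connection attempt.
// Returns true when setup() should carry on; throws only in "must have WiFi" builds.
bool handleWiFiStartup(bool connected)
{
#if ENABLE_WIFI && WAIT_FOR_WIFI
    if (!connected)
        throw std::runtime_error("Unable to connect to WiFi, but must have it, so rebooting");
#endif
    (void)connected; // unused when the reboot path is compiled out
    return true;     // other configurations keep booting and can retry later
}
```

Building with `-DWAIT_FOR_WIFI=1` compiles the throwing branch in. With the defaults above, a missing connection is never fatal.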

So:

  1. If one was flashing the TTGO project, I'm not sure how one would end up in the code path that triggers the reboot when WiFi doesn't come up.
  2. The case made in this issue is relevant for the LEDSTRIP project, and I'll give that some thought.

robertlipe commented 11 months ago

There may be a "cleaner" way to trigger reboots on ESP32s, but the approach

ESP.restart();

is used elsewhere in the project. Is it not available here?

My work tree is unhappy enough I can't currently submit a CL in good faith, but just plopping that into src/main.cpp compiles. Citation? Gladly.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/misc_system_api.html

While it may seem uncomfortable to rely on ESP-specific magic, there are references to the IDF system libraries all over the tree - including in main.cpp.

rbergen commented 11 months ago

ESP.restart();

Yes, it had to be something as straightforward as that, didn't it? 🙂

I think what I mentioned as "current project MO" is a case of semi-consistently applied legacy code. Which I'll instantly grant can be replaced by more modern/now recommended approaches. However, personally I'm not going to implement that change in the context of this issue. If someone else wants to open a PR to do so, I'm very happy to review it.

davepl commented 11 months ago

Is there a scenario in which it doesn’t restart? If the default behavior on an unhandled exception is to restart, then I think what we’re doing now is pretty clean.

We never actually restart - we just have a TerminateHandler to display a cute “Guru Meditation” and as a handy spot to set a breakpoint. We then rethrow that which was thrown. I don’t see an incentive to manually call restart.

Sheer elegance. Convince me otherwise :-)
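Here is a minimal sketch of the pattern Dave describes, assuming a handler installed with std::set_terminate. The handler prints a banner plus the what() of the in-flight exception, then ends in std::abort().

- It uses standard C++ only.
- `guruMeditationHandler` and `describeCurrentException` are hypothetical names, not the project's actual TerminateHandler.
- Unlike the project's handler, it calls abort() directly instead of rethrowing.

As far as I know, abort() on ESP32 lands in the IDF panic handler. With the default panic configuration, that handler prints a backtrace and resets the chip, which is where the "reboot" actually comes from.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>

// Returns the message of the exception currently being handled, if any.
// Meaningful inside a catch block or a terminate handler.
std::string describeCurrentException()
{
    std::exception_ptr current = std::current_exception();
    if (!current)
        return "no active exception";
    try
    {
        std::rethrow_exception(current);
    }
    catch (const std::exception& e)
    {
        return e.what();
    }
    catch (...)
    {
        return "unknown exception";
    }
}

// Hypothetical handler in the spirit of the project's TerminateHandler:
// print a banner and the reason, then abort (a handy breakpoint spot, too).
[[noreturn]] void guruMeditationHandler()
{
    std::fprintf(stderr, "- Guru Meditation: Unhandled Exception -\n");
    std::fprintf(stderr, "Terminated due to exception: %s\n", describeCurrentException().c_str());
    std::abort();
}

// Installed once at startup, e.g. at the top of setup():
//     std::set_terminate(guruMeditationHandler);
```

The trade-off being debated is visible here. Nothing in this path says "restart" explicitly; the restart is a side effect of abort() plus the platform's panic settings.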

robertlipe commented 11 months ago

It's pretty clear that's a sucker's bet. No thanx, I'll pass.

I'll find a more exciting hill to die upon than answering a musing posted by another developer. If not actually rebooting after printing "rebooting..." is elegant, our zens just won't align on this.

Back to OP.

sdmtr commented 11 months ago

What you're seeing isn't a controlled "hey, let's reboot now". That just a plain ole crash.

It's a crash only insofar as it's the result of an unhandled exception, but the exception in question is caused by the lack of wifi credentials, which there's no opportunity to provide. See the last line of the log excerpt:

(E) (TerminateHandler)(C1) Terminated due to exception: Unable to connect to WiFi, but must have it, so rebooting

FWIW, if it halves your testing matrix, you can just whack the "USE_NETWORK' in the configuration when building firmware to see if that's a key variable.

This issue is specific to the web installer; I'm not building from source. If I were, this wouldn't be a problem, since the correct credentials would have been set in secrets.h. This bug report is explicitly about the web installer and proposes a way to fix it so that other people don't run into the same problem when they're trying to get their boards up and running. The vast majority of people who end up using NDL won't be building from source; as with WLED, it'll be people who just want to click a button on a web installer to load a pre-compiled binary onto their ESP32 and start using it immediately.

This reboot happening makes sense if the build has been configured to require WiFi. That is the case if both ENABLE_WIFI and WAIT_FOR_WIFI are defined as non-zero. In the "regular" project configurations as they stand, this is only the case for the LEDSTRIP project. That is obviously not the same as the TTGO project.

I gave the TTGO project just as an example; this happens for every board I've tested with and every version of the firmware available via the web installer that uses wifi. The issue isn't that the reboot is happening per se, since that's intentional behaviour due to the unhandled exception being thrown. The issue is that if the board immediately reboots when wifi credentials aren't available, then logically there will never be an opportunity to actually provide those credentials, making web installations effectively useless for anything other than projects that don't use wifi at all.

Is there a scenario in which it doesn’t restart?

No, all combinations I've tried enter this reboot loop as soon as they notice there are no wifi credentials available, making it effectively impossible to send the credentials, since the Improv service isn't alive long enough to communicate with the web installer. This seems to be intended behaviour: NDL intentionally throws an exception when it can't connect to wifi, and because that exception is unhandled, the correct behaviour is to restart.

If the default behavior on an unhandled exception is to restart, then I think what we’re doing now is pretty clean.

Absolutely, rebooting on an unhandled exception is perfectly fine, the issue here though is that I don't believe this should be an unhandled exception in the first place - after all, how can Improv connect to the board to deliver the wifi credentials it needs if the firmware reboots the moment it discovers it has no credentials?

A better way to handle it might be to simply wait for credentials to be delivered: just sit in an idle loop until Improv receives the correct RPC via serial. Otherwise I can't see how it would ever be possible to use the web installer to successfully load NDL onto a board; the only way to do it would be to build NDL from source with credentials preloaded in secrets.h.
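Here is a minimal sketch of that idle-wait idea, under stated assumptions. The credential source, the clock and the idle step are injected as callbacks so the logic can run off-device.

- On the board, `poll` would presumably wrap whatever parses Improv bytes from the UART.
- `nowMs` would be millis().
- `idle` would be a short delay, so the watchdog and other tasks keep running.

`waitForCredentials` and `WiFiCredentials` are hypothetical names, not project APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

struct WiFiCredentials
{
    std::string ssid;
    std::string password;
};

// Hypothetical: keep polling until credentials arrive or timeoutMs elapses.
// `poll` returns std::nullopt while nothing has been received yet.
std::optional<WiFiCredentials> waitForCredentials(
    const std::function<std::optional<WiFiCredentials>()>& poll,
    const std::function<uint32_t()>& nowMs,
    const std::function<void()>& idle,
    uint32_t timeoutMs)
{
    const uint32_t start = nowMs();
    // Unsigned subtraction still works when a 32-bit millisecond counter wraps.
    while (nowMs() - start < timeoutMs)
    {
        if (auto creds = poll())
            return creds;
        idle();
    }
    return std::nullopt; // caller decides: keep waiting, fall back, or give up
}
```

A timeout is optional here. Passing a very large value gives the "sit there until Improv delivers" behaviour, while a finite one lets the caller fall back to something else.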

I should note that WLED also uses Improv in the same way you're using it, and their solution is to just continue booting up like normal and display whatever the default strip pattern is, while also exposing an access point that the user can connect to in order to use the web interface and continuing to listen for Improv RPCs over the serial port.

rbergen commented 11 months ago

I gave the TTGO project just as an example, this happens for every board I've tested with and every version of the firmware available via the web installer that uses wifi. The issue isn't that the reboot is happening per se, that's intentional behaviour due to the unhandled exception being thrown; the issue is that if the board immediately reboots when wifi credentials aren't available then logically there will never be an opportunity to actually provide those credentials, making web installations effectively useless for anything other than projects that don't use wifi at all.

@sdmtr What I'm saying is two things:

  1. The board rebooting because of a) no credentials being available and then b) instantly giving up on WiFi connectivity - and in fact, the whole boot - is counterproductive if there is still a path to getting the credentials, as there is in the web installer scenario. That's something I think we should indeed (re)consider.
  2. The board should not reboot for all projects that use WiFi. As I said in my previous comment, the one code path that leads to the reboot you provided logging of is only triggered (or actually, compiled in) when WiFi is enabled (ENABLE_WIFI is 1) and WAIT_FOR_WIFI is 1. This is handled by lines 629 to 637 in main.cpp. In the current project configurations, WAIT_FOR_WIFI is only defined to be 1 for the LEDSTRIP project, so that's the only project for which the WiFi-related reboots should happen.

@davepl About WAIT_FOR_WIFI, I basically see two options:

As I'm not sure why the "reboot if WiFi connectivity fails" behavior was originally implemented, I was wondering if you could give some input on this?

sdmtr commented 11 months ago

@rbergen Re your second point, you're absolutely right. My original comment mixes observations and logs from many different tests across different boards rather than focusing on one specific test, which has created some confusion and led to some inaccuracies on my part. Sorry about that.

I just did another handful of quick tests using just the TTGO board, and the ledstrip firmware is indeed the one that reboots when wifi credentials aren't present (as you correctly pointed out). The TTGO firmware, on the other hand, seems to be experiencing a different problem, although I'm not sure what exactly. Here's the full log:

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13192
load:0x40080400,len:3028
entry 0x400805e4
E (655) esp_core_dump_flash: No core dump partition found!
E (655) esp_core_dump_flash: No core dump partition
Replacing Idle Tasks with TaskManager...
(I) (PrintOutputHeader)(C1) NightDriverStrip
(I) 
(I) (PrintOutputHeader)(C1) ------------------------------------------------------------------------------------------------------------
(I) (PrintOutputHeader)(C1) M5STICKC: 0, USE_M5DISPLAY: 0, USE_OLED: 0, USE_TFTSPI: 1, USE_LCD: 0, USE_AUDIO: 1, ENABLE_REMOTE: 1
(I) (PrintOutputHeader)(C1) Version 37: Wifi SSID: "" - ESP32 Free Memory: 256532, PSRAM:0, PSRAM Free: 0
(I) (PrintOutputHeader)(C1) ESP32 Clock Freq : 240 MHz
(I) (setup)(C1) Startup!
(I) (setup)(C1) Starting DebugLoopTaskEntry
>> Launching Debug Thread.  Mem: 256532, LargestBlk: 110580, PSRAM Free: 0/0, >> Launching JSON Writer Thread.  Mem: 253560, LargestBlk: 110580, PSRAM Free: 0/0, (W) (DeviceConfig)(C1) DeviceConfig could not be loaded from JSON, using defaults
(W) (NotifyJSONWriterThread)(C1) >> Notifying JSON Writer Thread
(W) (setup)(C1) Starting ImprovSerial
(W) (ReadWiFiConfig)(C1) Retrieved SSID and Password from NVS: , ********
E (588) gpio: GPIO can only be used as input mode
[   599][E][esp32-hal-gpio.c:130] __pinMode(): GPIO config failed
E (592) gpio: gpio_set_level(226): GPIO output gpio_num error
E (597) gpio: GPIO can only be used as input mode
[   612][E][esp32-hal-gpio.c:130] __pinMode(): GPIO config failed
E (607) gpio: gpio_set_level(226): GPIO output gpio_num error
(W) (setup)(C1) Creating TFT Screen
(W) (setup)(C1) Allocating LEDStripGFX for channel 0
(I) (setup)(C1) Could allocate 26 buffers but limiting it to 20
(I) 
(W) (setup)(C1) Reserving 20 LED buffers for a total of 46720 bytes...
(I) (setup)(C1) Adding LEDs to FastLED...
(I) (setup)(C1) Adding 768 LEDs to FastLED.
(W) (InitEffectsManager)(C1) InitEffectsManager...
(I) (InitEffectsManager)(C1) Creating EffectManager using default effects
>> Launching Drawing Thread.  Mem: 193960, LargestBlk: 110580, PSRAM Free: 0/0, (W) (DrawLoopTaskEntry)(C1) >> DrawLoopTaskEntry
(W) 
(W) (DrawLoopTaskEntry)(C1) Entering main draw loop!
>> Launching Screen Thread.  Mem: 186808, LargestBlk: 110580, PSRAM Free: 0/0, >> Launching Audio Thread.  Mem: 183872, LargestBlk: 110580, PSRAM Free: 0/0, >> Launching Remote Thread.  Mem: 179132, LargestBlk: 110580, PSRAM Free: 0/0, (I) (AudioSamplerTaskEntry)(C0) >>> Sampler Task Started
(W) (begin)(C1) Remote Control Decoding Started
>> Launching Network Thread.  Mem: 174528, LargestBlk: 110580, PSRAM Free: 0/0, [  1720][E][ESPmDNS.cpp:65] begin(): Failed starting MDNS
Error starting mDNS
>> Launching ColorData Thread.  Mem: 169540, LargestBlk: 110580, PSRAM Free: 0/0, (E) (TerminateHandler)(C0) -------------------------------------------------------------------------------------
(E) (TerminateHandler)(C0) - NightDriverStrip Guru Meditation                              Unhandled Exception -
(E) (TerminateHandler)(C0) -------------------------------------------------------------------------------------
(I) (PrintOutputHeader)(C0) NightDriverStrip
(I) 
(I) (PrintOutputHeader)(C0) ------------------------------------------------------------------------------------------------------------
>> Launching Socket Thread.  Mem: 166476, LargestBlk: 110580, PSRAM Free: 0/0, (I) (PrintOutputHeader)(C0) M5STICKC: 0, USE_M5DISPLAY: 0, USE_OLED: 0, USE_TFTSPI: 1, USE_LCD: 0, USE_AUDIO: 1, ENABLE_REMOTE: 1
(I) (PrintOutputHeader)(C0) Version 37: Wifi SSID: "" - ESP32 Free Memory: 162132, PSRAM:0, PSRAM Free: 0
(I) (PrintOutputHeader)(C0) ESP32 Clock Freq : 240 MHz
I) (PrintOutputHeader)(C0) ESP32 Clock Freq : 240 MHz
 Free Mem
abort() was called at PC 0x4018f1b4 on core 0

Backtrace: 0x4008512d:0x3ffda7a0 0x40090169:0x3ffda7c0 0x40095da1:0x3ffda7e0 0x4018f1b4:0x3ffda860 0x4018f206:0x3ffda880 0x4018f59b:0x3ffda8a0 0x400828cd:0x3ffda8c0 0x40082965:0x3ffda8e0

ELF file SHA256: 8b20cee5cd509615

E (2306) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...

For the sake of clarity, this is an authentic Lilygo TTGO ESP32-D0WDQ6 board, and I selected "ESP32" as the device type and "TTGO" as the project in the web installer interface.

rbergen commented 11 months ago

Thanks @sdmtr for clearing this up. It does help focus the analysis of the problems (now plural) we are investigating. Putting the no-WiFi reboot aside for now - I think we now know what we're looking at there - I'd say the logging on the TTGO crash doesn't provide too many insights as to what's going on. The "abort()" mention at the bottom of the log doesn't help much either, as the C++ code in the project doesn't call any function by that name.

I don't own any TTGO boards myself, so I can't compare what you're seeing to anything useful at my end - maybe @davepl can. A question I do have is whether you've tried flashing the board using the PlatformIO route. I know the issue relates to the web installer specifically, but trying to flash the board the other way may well narrow the area that needs to be covered while investigating this.

davepl commented 11 months ago

Thing is, I’m not aware of any instances in which we reboot that we COULD continue. About the only case of “intentional” reboot is when wifi can’t be acquired, and that’s exceptional, so it’s an exception.

As far as I know, our Improv codepath works the same way, doesn’t it?

Let me know which specific scenario you’re thinking of that should improve.

rbergen commented 11 months ago

My question comes from the fact that we only treat WiFi not connecting as an exception worthy of rebooting in the LEDSTRIP project, not any of the others. In all other projects, we continue trying to connect in the main.cpp loop() every so many seconds.
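Here is a minimal sketch of the "keep trying every so many seconds from loop()" behaviour, written as a non-blocking timer. Drawing, the remote and Improv all keep being serviced between attempts. `RetryTimer` is a hypothetical helper, and the 10-second interval in the usage comment is illustrative, not the project's actual value. WiFi.status() and WL_CONNECTED are the Arduino-ESP32 WiFi API. ConnectToWiFi() is named after the function seen in the logs above, and its signature here is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides on each pass through loop() whether another
// connection attempt is due. It never blocks.
class RetryTimer
{
  public:
    explicit RetryTimer(uint32_t intervalMs) : _intervalMs(intervalMs) {}

    // Returns true at most once per interval; the first call always fires.
    bool due(uint32_t nowMs)
    {
        if (_hasFired && nowMs - _lastMs < _intervalMs)
            return false;
        _hasFired = true;
        _lastMs = nowMs;
        return true;
    }

  private:
    uint32_t _intervalMs;
    uint32_t _lastMs = 0;
    bool _hasFired = false;
};

// Usage inside an Arduino-style loop() (ConnectToWiFi() signature assumed):
//     static RetryTimer wifiRetry(10000);
//     if (WiFi.status() != WL_CONNECTED && wifiRetry.due(millis()))
//         ConnectToWiFi();
```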

Rebooting immediately after establishing that no credentials are present, as LEDSTRIP does, keeps the user from providing credentials via Improv. That means that for LEDSTRIP, the correct credentials have to be embedded into the image (i.e. secrets.h) for the image to work.

davepl commented 11 months ago

Oh, ok, that’d be a bug. Please raise an issue specifically for that, and I’ll take it.

The reason LEDSTRIP reboots is that it’s remote-only, so no wifi, it’s dead in the water. But it needs to survive long enough to at least be able to set credentials!
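One way to express "reboot only when it can actually help" is sketched below for a "must have WiFi" build.

- With no stored credentials, it never reboots; it stays up so Improv can deliver them.
- With credentials but no connection, it retries during a grace window before giving up.

`decideWiFiStartup`, `WiFiStartupAction` and the `graceMs` parameter are hypothetical. This is an illustration of the policy under those assumptions, not the fix that was eventually made for LEDSTRIP.

```cpp
#include <cassert>
#include <cstdint>

enum class WiFiStartupAction
{
    Continue,           // connected, or WiFi is optional: carry on (retry from loop())
    WaitForCredentials, // nothing stored yet: stay up so Improv can deliver them
    RetryLater,         // must have WiFi, still inside the grace window
    Reboot              // must have WiFi, credentials exist, grace window exhausted
};

// Hypothetical policy for LEDSTRIP-style builds; graceMs is illustrative.
WiFiStartupAction decideWiFiStartup(bool mustHaveWiFi, bool hasCredentials, bool connected,
                                    uint32_t msSinceBoot, uint32_t graceMs)
{
    if (connected || !mustHaveWiFi)
        return WiFiStartupAction::Continue;
    if (!hasCredentials)
        return WiFiStartupAction::WaitForCredentials; // a reboot can never fix this case
    if (msSinceBoot < graceMs)
        return WiFiStartupAction::RetryLater;
    return WiFiStartupAction::Reboot;
}
```

The key design choice is the second branch. Missing credentials and an unreachable network are different failures, and only the latter might be cured by a restart.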

rbergen commented 11 months ago

I've opened #371 for the LEDSTRIP reboot issue, and renamed this one to focus on TTGO failing to come up.