Aircoookie / WLED

Control WS2812B and many more types of digital RGB LEDs with an ESP8266 or ESP32 over WiFi!
https://kno.wled.ge
MIT License
WLED Crashes and Reboots #3609

Closed Doyle4 closed 6 months ago

Doyle4 commented 6 months ago

What happened?

Changing colours can cause WLED to crash and reboot (Lost connection to device warning).

To Reproduce Bug

It crashed when I was using the Bouncing Balls effect and selected a new colour. It crashed when Aura was selected and I tried to increase brightness with the slider in the effect settings. Meteor Smooth crashed when I changed the colour.

Expected Behavior

No crash or reboot when changing effect colours.

Install Method

Self-Compiled

What version of WLED?

0.14.1-b1

Which microcontroller/board are you seeing the problem on?

ESP8266

Relevant log/trace output

No response

Anything else?

Downgraded to the Gold Release of 0.14.0 and everything is working as expected. The form says I self-compiled, which I didn't, unless that also means downloading the file from here and installing it. It wasn't edited in any way, nor did I install it via the web installer.

blazoncek commented 6 months ago

Sorry, cannot reproduce. Please use debug build and post crash dump.

softhack007 commented 6 months ago

🤔 the common part of this problem description is performing UI actions (changing color, change global brightness) on 8266. Depending on how the UI sliders were used ("dragged" or "tapping") this could create a load of WS messages.

We have a known issue on 8266 when many UI events are received in short time, and memory is low. Maybe these problems are related: #3443, #3458, #3382, #3492
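One common way to reduce this kind of burst is to coalesce updates so that only one state broadcast goes out per time window, however many slider events arrive. The following is a minimal sketch of that idea, not WLED's actual code: `UpdateThrottle`, `markDirty` and `shouldSend` are hypothetical names, and the time value is assumed to come from something like a millisecond tick counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of WLED: coalesces bursts of UI updates so
// that at most one state broadcast is sent per interval.
class UpdateThrottle {
public:
    explicit UpdateThrottle(uint32_t minIntervalMs) : interval_(minIntervalMs) {
        assert(minIntervalMs > 0);  // a zero interval would disable throttling
    }

    // Record that the state changed (for example, one slider "drag" event).
    void markDirty() { dirty_ = true; }

    // Call from the main loop with the current time in milliseconds.
    // Returns true when a broadcast should be sent now.
    bool shouldSend(uint32_t nowMs) {
        if (!dirty_) return false;
        // Unsigned subtraction stays correct when the tick counter wraps.
        if (sentOnce_ && static_cast<uint32_t>(nowMs - lastSendMs_) < interval_) {
            return false;  // too soon; keep the change pending
        }
        dirty_ = false;
        sentOnce_ = true;
        lastSendMs_ = nowMs;
        return true;
    }

private:
    uint32_t interval_;
    uint32_t lastSendMs_ = 0;
    bool sentOnce_ = false;
    bool dirty_ = false;
};
```

With this shape, many events inside one window cost a single buffer allocation instead of one per event, which matters most when free heap is already low.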

Doyle4 commented 6 months ago

I just find it strange that downgrading fixed it, and that updating to the latest beta release brings the same issue back. I will try the debug build, see if it happens, and post any logs.

ihavenonick commented 6 months ago

I can confirm the problem. If I want to create a preset for candle multi, for example, and then change the colors or brightness, the D1 mini (ESP8266) loses the connection. If that still works, you can't start the preset with the quick load button.

This problem only occurs in 0.14.1-b1, it is not a problem with 0.14.0.

blazoncek commented 6 months ago

Please check available heap prior to the crash and report it. It would also be helpful if you'd post your configuration (cfg.json) and presets (presets.json).

ihavenonick commented 6 months ago

wled_cfg_Schlafzimmer Wand.json wled_presets_Schlafzimmer Wand.json

blazoncek commented 6 months ago

I was able to reproduce it, though only sporadically. It looks like temporary RAM depletion.

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (3):
epc1=0x40102ac5 epc2=0x00000000 epc3=0x40103428 excvaddr=0x4002b9b1 depc=0x00000000

LoadStoreError: Processor internal physical address or data error during load or store
  epc1=0x40102ac5 in umm_malloc_core at umm_malloc.cpp:?
  epc3=0x40103428 in _notifyPWM at core_esp8266_waveform_pwm.cpp:?

>>>stack>>>

ctx: cont
sp: 3ffffd30 end: 3fffffd0 offset: 0150
3ffffe80:  31353101 00003933 3ffffe76 3ffe8758  
3ffffe90:  3fff3000 402552f0 00000020 40102d4c  
3ffffea0:  3fff3000 3fff228c 00000010 40266c93  
3ffffeb0:  3ffe9d39 3fff2e60 3fff353c 402552f0  
3ffffec0:  4026357c 3ffe9d37 3fff353c 3ffe8758  
3ffffed0:  00000763 00000005 3fff353c 40264b74  
3ffffee0:  3fff3000 3fff2e60 3fff353c 4024d215  
3ffffef0:  3fff41ec 00000763 fe46a8c0 3fff26ec  
3fffff00:  3fff045c 3fff2c6c 3fff045c 4023efe1  
3fffff10:  00003b58 00000000 153f7ced 00000763  
3fffff20:  3fff0300 402704d8 8a46a8c0 3fff3750  
3fffff30:  3fff3000 00000001 3fff2ffc 402379c8  
3fffff40:  3fffdad0 00000000 3fff3724 40237a0e  
3fffff50:  3fff0448 00000000 3fff2f64 3fff3750  
3fffff60:  3fffdad0 00000000 3fff3724 40247f74  
3fffff70:  402704d8 8a46a8c0 3fff353c 40264b74  
3fffff80:  00000000 0000000e 0009b538 3fff0300  
3fffff90:  00000000 0011001f 4026e0b0 3fff3750  
3fffffa0:  3fffdad0 00000000 3fff3724 3fff3750  
3fffffb0:  3fffdad0 00000000 3fff3724 402670a0  
3fffffc0:  feefeffe feefeffe 3fffdab0 40101f01  
<<<stack<<<

0x402552f0 in AsyncWebSocket::makeBuffer(unsigned int) at ??:?
0x40102d4c in malloc at ??:?
0x40266c93 in operator new(unsigned int) at ??:?
0x402552f0 in AsyncWebSocket::makeBuffer(unsigned int) at ??:?
0x4026357c in HardwareSerial::write(unsigned char const*, unsigned int) at ??:?
0x40264b74 in Print::println(unsigned int, int) at ??:?
0x4024d215 in sendDataWs(AsyncWebSocketClient*) at ??:?
0x4023efe1 in NetworkClass::isConnected() at ??:?
0x402704d8 in StreamNull::~StreamNull() at ??:?
0x402379c8 in updateInterfaces(unsigned char) at ??:?
0x40237a0e in handleTransitions() at ??:?
0x40247f74 in WLED::loop() at ??:?
0x402704d8 in StreamNull::~StreamNull() at ??:?
0x40264b74 in Print::println(unsigned int, int) at ??:?
0x4026e0b0 in std::_Function_handler<void (ota_error_t), WLED::setup()::{lambda(ota_error_t)#2}>::_M_manager(std::_Any_data&, std::_Function_handler<void (ota_error_t), WLED::setup()::{lambda(ota_error_t)#2}> const&, std::_Manager_operation) at wled.cpp:?
0x402670a0 in loop_wrapper() at core_esp8266_main.cpp:?
0x40101f01 in cont_wrapper at ??:?

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

If you do not need websockets and/or MQTT, please compile the ESP8266 version without websockets and MQTT to free some RAM.

Doyle4 commented 6 months ago

Forgot to mention I'm also using a D1 Mini, the same board someone else mentioned and reproduced this on. I will compile the latest beta without websockets etc.

Thanks to all looking into this.

jcPOLO commented 6 months ago

It happened to me with an ESP32 too. I will try to give more information in a few days.

fribse commented 6 months ago

I'm seeing this as mentioned in #3613, this is on a brand new d1 mini esp32, mounted in a DigUno with a 5V 6A PSU. I rebuilt the config, so that's brand new, but my presets were backed up and restored from the previous esp8266. I've attached my presets here: wled_presets_Christmas tree.json

blazoncek commented 6 months ago

If you want a speedy resolution, get a debug build and post a crash dump similar to the one above.

fribse commented 6 months ago

Hi @blazoncek, I don't have any urgency, I'm just trying to help the little I can. I don't have time to do proper debugging on this as well, too many projects already. I just tried a factory reset, then added Glitter for the 150 LEDs, and as soon as I slowed it way down it crashed. I look forward to the b2 of the firmware...

willmmiles commented 6 months ago

I ran into an issue just like this yesterday and tracked it back to the Segment backup copy in deserializeSegment incorrectly freeing the original Segment's FX data, resulting in a use-after-free that corrupted the heap, i.e. exactly the issue @blazoncek just posted a fix for with 5ebc345. I was going to send a PR today but it looks like it's already been taken care of. I can confirm that the patch fixed it for me.
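The general hazard described here (a temporary copy whose destruction frees a buffer the original still uses) can be illustrated with a small standalone sketch. This is not the code from 5ebc345 and not WLED's real Segment class: `DemoSegment` and `backupLeavesOriginalIntact` are hypothetical names, and a deep copy is only one possible way to avoid the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical stand-in for a segment that owns a heap buffer of effect data.
// Names are illustrative and do not match WLED's actual Segment class.
struct DemoSegment {
    uint8_t* data = nullptr;
    size_t   len  = 0;

    explicit DemoSegment(size_t n)
        : data(n ? new uint8_t[n]() : nullptr), len(n) {}

    // Deep copy: the backup gets its own buffer. A member-wise (shallow) copy
    // would share the pointer, and destroying the backup would then free
    // memory the original still uses.
    DemoSegment(const DemoSegment& other)
        : data(other.len ? new uint8_t[other.len] : nullptr), len(other.len) {
        if (len) std::memcpy(data, other.data, len);
    }

    // Copy-and-swap assignment: the by-value parameter is already a deep copy.
    DemoSegment& operator=(DemoSegment other) {
        std::swap(data, other.data);
        std::swap(len, other.len);
        return *this;
    }

    ~DemoSegment() { delete[] data; }
};

// Returns true if the original's data survives after a temporary backup
// has been created, modified and destroyed.
bool backupLeavesOriginalIntact() {
    DemoSegment original(4);
    original.data[0] = 42;
    {
        DemoSegment backup(original);
        assert(backup.data != original.data);  // separate buffers
        backup.data[0] = 7;
    }  // backup destroyed here; only its own buffer is freed
    return original.data[0] == 42;
}
```

If the copy constructor were left to the compiler, both objects would hold the same `data` pointer and the original would be left dangling once the backup went out of scope, which is the kind of heap corruption that can surface later as a crash in an unrelated allocation.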

blazoncek commented 6 months ago

@willmmiles you seem capable. care to help?

willmmiles commented 6 months ago

Sure, what do you need? As far as I can tell, the patch you've written does the trick for fixing the heap crashes following UI config updates.

zigomatichub commented 6 months ago

On a D1 mini with 50 WS2801 LEDs: power on, start the Candle Multi effect via a pre-configured preset, then change the color to another color. It then crashes. That may help to reproduce.

blazoncek commented 6 months ago

> Sure, what do you need?

Nothing in particular but we'd need people that understand the code. There are plenty of TODOs in the code. Contact me on Discord if you have time to spare for WLED.

Ucsus commented 6 months ago

Same problem.

blazoncek commented 6 months ago

@zigomatichub @Ucsus please read above. Fix has been committed.

orichienal commented 6 months ago

Hi, I have the same problem with an ESP8266 D1 mini and Candle Multi with red and orange. Where can I find a debug bin to check?

thx

Doyle4 commented 6 months ago

> hi got same problem with esp8266 d1 mini and Candle Multi with red and orange. Where can i find a debug bin to check
>
> thx

This has been fixed in the latest master source, so a debug build is not needed. Debug firmware is the same firmware, just with logging enabled so the issue can be found more easily. You need to download the latest source and compile the firmware yourself. If you are not able to build your own, downgrade to the 0.14.0 release until 0.14.1 is released. :)

fribse commented 6 months ago

> Where can i find a debug bin to check

Get the B2, it looks like it's fixed

orichienal commented 6 months ago

Got the B2 installed and let it run the whole night with Candle Multi, and it's still "burning". Great job, and thanks for the work.

orichienal commented 6 months ago

I think the joy was premature and too great, but unfortunately I still have the problem, now it takes longer until it occurs, but after a certain time it restarts and then lights up in standard orange. Can anyone else confirm this?

willmmiles commented 6 months ago

> I think the joy was premature and too great, but unfortunately I still have the problem, now it takes longer until it occurs, but after a certain time it restarts and then lights up in standard orange. Can anyone else confirm this?

I'm also observing occasional reboots on my ESP8266 setup. I think it's a different issue, though. I've been trying to pin it down for about a week -- it's definitely not related to the FX or transition logic, nor is it a heap exhaustion issue (I've enabled the allocator instrumentation to be sure) -- though applying heap pressure does seem to make it more likely to occur, which leads me to believe we might be looking for another use-after-free somewhere in the network layers.

blazoncek commented 6 months ago

Thanks @willmmiles for troubleshooting.

ATM I am afraid all of the bells and whistles we added to 0.14 may be a bit too much (paired with newer ESP core needed for 0.14) for poor ESP8266. If possible switch to ESP32 or use WLED 0.13.3 for the time being.

willmmiles commented 6 months ago

I'm beginning to suspect it's an internal bug in the newer ESP core. Still investigating; these "hard wdt" crashes don't have much to say on the console, can take a long time to reproduce, and I'm still learning how to elicit more useful debugging information.

blazoncek commented 6 months ago

@willmmiles We can see heap corruption (#3641) on some ESP8266 boards, which happens somewhere in the TCP code. I'd be glad if your expertise can help.