Closed: jwaeles closed this issue 4 months ago
Please use debug build and exception decoder to help out with crash diagnosis.
> Please use debug build and exception decoder to help out with crash diagnosis.
Is there a debug build binary, or do I have to build it myself? I looked at the exception decoder, and it seems to be an Arduino plugin? But I also looked at building wled, and now it's only supported on platformIO? So how can I decode a PlatformIO binary with the Arduino IDE... I'm confused, or I need some pointers.
You'll need to compile yourself to get a meaningful output from exception decoder.
Use something similar in your platformio.ini file:

    [env:debug]
    extends = env:esp32dev
    monitor_filters = esp32_exception_decoder
    build_flags = ${common.build_flags_esp32}
      -D WLED_DEBUG
      ... any other build flags
Then just use PIO's monitor tool.
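If the crash was captured outside PIO's monitor (for example, copied from a plain serial log), the backtrace can still be decoded offline. This is roughly what the exception decoder does: the program counter of each `PC:SP` pair is looked up in the ELF file of the build that crashed. Below is a minimal Python sketch. It assumes the toolchain's `xtensa-esp32-elf-addr2line` is on the PATH and that the ELF sits at PlatformIO's usual `.pio/build/debug/firmware.elf`; both are assumptions about your setup, and the helper names are made up for illustration.

```python
import re
import subprocess

# Matches the PC:SP pairs printed after "Backtrace:" in an ESP32 panic dump.
PAIR_RE = re.compile(r"(0x[0-9a-fA-F]{8}):(0x[0-9a-fA-F]{8})")


def backtrace_pcs(line):
    """Return the program-counter addresses from a 'Backtrace:' line."""
    if not line.startswith("Backtrace:"):
        return []
    return [pc for pc, _sp in PAIR_RE.findall(line)]


def addr2line_command(pcs, elf=".pio/build/debug/firmware.elf",
                      tool="xtensa-esp32-elf-addr2line"):
    """Build the addr2line call; 'elf' and 'tool' are assumed defaults."""
    # -p pretty-print, -f function names, -i inlined frames,
    # -a show addresses, -C demangle C++ names
    return [tool, "-pfiaC", "-e", elf, *pcs]


if __name__ == "__main__":
    sample = ("Backtrace: 0x4015736f:0x3ffb5da0 0x4014e912:0x3ffb5dd0 "
              "0x4011e33a:0x3ffb5df0")
    cmd = addr2line_command(backtrace_pcs(sample))
    print(" ".join(cmd))
    # Uncomment to actually run it against your own build:
    # print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```

The ELF has to come from the same build that produced the crash. Addresses from a downloaded release binary will not map meaningfully onto a locally compiled ELF, which is why compiling yourself is needed.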
> also looked at building wled, and now it's only supported on platformIO?
Yes, you need VSCode+platformio for building + installing wled from source code. The KB has some guidance for getting started:
Thank you, it's built with debug enabled and running, we should soon find out whether it produces any useful output. I do see some extra logging, so that's a good start...
I had to disable the debug output, as I suspect it delayed execution a little and prevented my bug from reproducing. With WLED_DEBUG enabled it ran 24 hours and lost wifi a bunch of times (another issue?), but I didn't see a crash.
That is, until I removed WLED_DEBUG.
Then, only 2 hours in, I got this nice stacktrace:
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40157372 PS : 0x00060930 A0 : 0x8014e915 A1 : 0x3ffb5da0
A2 : 0x3ffdae70 A3 : 0x00450008 A4 : 0x0000004e A5 : 0x00000000
A6 : 0x3ffb52a8 A7 : 0x00000000 A8 : 0x00702903 A9 : 0x00702929
A10 : 0x00000000 A11 : 0x0000030a A12 : 0x0070261f A13 : 0x00000b38
A14 : 0x00060920 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00450018 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
ELF file SHA256: 0000000000000000
Backtrace: 0x4015736f:0x3ffb5da0 0x4014e912:0x3ffb5dd0 0x4011e33a:0x3ffb5df0 0x4014974d:0x3ffb5e10 0x4008b89e:0x3ffb5e40
#0 0x4015736f:0x3ffb5da0 in tcp_output at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp_out.c:1025
#1 0x4014e912:0x3ffb5dd0 in tcp_recved at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:1765
#2 0x4011e33a:0x3ffb5df0 in _tcp_recved_api(tcpip_api_call_data*) at .pio\libdeps\debug\AsyncTCP\src/AsyncTCP.cpp:1153
#3 0x4014974d:0x3ffb5e10 in tcpip_thread at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:483
#4 0x4008b89e:0x3ffb5e40 in vPortTaskWrapper at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
Rebooting...
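The register dump itself gives a hint about what went wrong. EXCCAUSE 0x1c (28) is the Xtensa code for LoadProhibited, and EXCVADDR is the address whose read faulted. The Python sketch below parses the dump and reports which general register holds a value just below EXCVADDR, which usually points at the base pointer that was dereferenced. The helper names and the 0x100 search window are arbitrary choices for illustration, not part of any tool.

```python
import re

# Register names are upper case (PC, A0..A15, EXCVADDR, ...); values are hex.
REG_RE = re.compile(r"([A-Z]+\d*)\s*:\s*(0x[0-9a-fA-F]+)")

# Only the two causes relevant here; the Xtensa ISA defines more.
EXCCAUSES = {28: "LoadProhibited", 29: "StoreProhibited"}

DUMP = """\
PC : 0x40157372 PS : 0x00060930 A0 : 0x8014e915 A1 : 0x3ffb5da0
A2 : 0x3ffdae70 A3 : 0x00450008 A4 : 0x0000004e A5 : 0x00000000
A6 : 0x3ffb52a8 A7 : 0x00000000 A8 : 0x00702903 A9 : 0x00702929
A10 : 0x00000000 A11 : 0x0000030a A12 : 0x0070261f A13 : 0x00000b38
A14 : 0x00060920 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00450018 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
"""


def parse_registers(dump):
    """Map register name -> integer value."""
    return {name: int(value, 16) for name, value in REG_RE.findall(dump)}


def registers_near_fault(regs, max_offset=0x100):
    """A0..A15 whose value lies at most max_offset below EXCVADDR."""
    fault = regs["EXCVADDR"]
    hits = {}
    for name, value in regs.items():
        if re.fullmatch(r"A\d+", name) and 0 <= fault - value <= max_offset:
            hits[name] = fault - value  # offset from register to fault address
    return hits


if __name__ == "__main__":
    regs = parse_registers(DUMP)
    cause = regs["EXCCAUSE"]
    print("cause:", cause, EXCCAUSES.get(cause, "other"))
    for name, offset in registers_near_fault(regs).items():
        print("%s + 0x%x == EXCVADDR" % (name, offset))
```

For the dump above this reports A3 with offset 0x10: the faulting read at 0x00450018 is 16 bytes past the value in A3 (0x00450008). That value is far outside the 0x3ff… range the other data pointers in the dump fall in, so a read through a stale or corrupted pointer seems more likely than a plain null pointer.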
Not in WLED code. Check your MQTT broker. There was an issue with old Windows implementation of Mosquitto broker in the past.
Here is another one, still from the same build
CORRUPT HEAP: Bad head at 0x3ffde190. Expected 0xabba1234 got 0x3ffde608
abort() was called at PC 0x4008eb39 on core 0
ELF file SHA256: 0000000000000000
Backtrace: 0x40089af8:0x3ffb5d10 0x40089e55:0x3ffb5d30 0x4008eb39:0x3ffb5d50 0x4008543a:0x3ffb5d70 0x40085805:0x3ffb5d90 0x4000bec7:0x3ffb5db0 0x4016def2:0x3ffb5dd0 0x4016df29:0x3ffb5df0 0x4014974d:0x3ffb5e10 0x4008b89e:0x3ffb5e40
#0 0x40089af8:0x3ffb5d10 in invoke_abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
#1 0x40089e55:0x3ffb5d30 in abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
#2 0x4008eb39:0x3ffb5d50 in multi_heap_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:321
#3 0x4008543a:0x3ffb5d70 in heap_caps_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:232
#4 0x40085805:0x3ffb5d90 in _free_r at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c:42
#5 0x4000bec7:0x3ffb5db0 in ?? ??:0
#6 0x4016def2:0x3ffb5dd0 in _udp_pcb_deinit at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/mdns/mdns_networking.c:202
#7 0x4016df29:0x3ffb5df0 in _mdns_pcb_deinit_api at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/mdns/mdns_networking.c:267
#8 0x4014974d:0x3ffb5e10 in tcpip_thread at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:483
#9 0x4008b89e:0x3ffb5e40 in vPortTaskWrapper at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
Rebooting...
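The "Bad head ... Expected 0xabba1234" line comes from heap poisoning: the allocator places a known canary word in front of each block and checks it on free. The toy Python model below shows that mechanism only. The block layout is simplified and the class is hypothetical; it is not ESP-IDF's actual implementation. An overflow is also just one way to trip the check, since a double free or use-after-free can damage the canary too.

```python
import struct

HEAD_CANARY = 0xABBA1234        # the value the log says was expected
HEADER = struct.Struct("<II")   # canary, payload size (simplified layout)


class ToyHeap:
    """Bump allocator with a canary word in front of every block."""

    def __init__(self, size=256):
        self.mem = bytearray(size)
        self.top = 0

    def alloc(self, n):
        start = self.top
        HEADER.pack_into(self.mem, start, HEAD_CANARY, n)
        self.top = start + HEADER.size + n
        return start + HEADER.size  # address of the payload

    def write(self, addr, data):
        # No bounds check, just like a raw memcpy in C.
        self.mem[addr:addr + len(data)] = data

    def free(self, addr):
        head = addr - HEADER.size
        canary, _size = HEADER.unpack_from(self.mem, head)
        if canary != HEAD_CANARY:
            raise RuntimeError(
                "CORRUPT HEAP: Bad head at 0x%x. Expected 0x%08x got 0x%08x"
                % (head, HEAD_CANARY, canary))


if __name__ == "__main__":
    heap = ToyHeap()
    a = heap.alloc(8)
    b = heap.alloc(8)
    # Write 12 bytes into an 8-byte block: the last 4 land on b's canary.
    heap.write(a, b"\x08\xe6\xfd\x3f" * 3)
    heap.free(a)  # a's own header is intact, so this passes
    try:
        heap.free(b)
    except RuntimeError as err:
        print(err)
```

Note that the damage is only detected when the neighbouring block is freed, which can be long after the bad write happened. That is why the backtrace of such a crash (here in mdns cleanup) does not necessarily name the code that corrupted the heap.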
What confuses me is that WLED_0.13.3_ESP32 doesn't crash and doesn't disconnect from the wifi. I have 8 different WLED 0.13.3 running on ESP32 and they all have an uptime of 14 days (last power outage). All my 3-4 instances of WLED 0.14 are rebooting, drop out of the network, glitch on the output after a couple of days. They all use the same WiFi access point and the same broker (mosquitto 2.0.15 on linux (docker))
I've just updated mosquitto to the latest version available on dockerhub, 2.0.18, we'll see if WLED still crashes
Very likely related to #3641
Since I updated mosquitto, I haven't seen any stack trace. However, I can't test for long, because after a few hours I always lose WiFi connectivity with 0.14*; I will check later whether someone has reported a bug on that. Once WiFi is lost, obviously there is no chance for network packets to be malformed or misread, since none are reaching the IP stack.
However, the corrupted heap crash was occurring much earlier than when the WiFi dropped, so the stability issues are probably indeed related to the broker, and/or to mdns (which i saw mentioned in one of the stack traces)
If you haven't yet, please use 0.14.1-b3. It may be relevant.
EDIT: WiFi issues are not WLED related but rather your network set-up/hardware.
> EDIT: WiFi issues are not WLED related but rather your network set-up/hardware.
I get your point of view, but like i said earlier, i have 8x wled instances on 0.13.x running for very long without any wifi connectivity issues. My 4x ESP32's with 0.14.1-b2 all drop consistently from the wifi. The ESP32's are sourced from 3 different vendors (some are standard esp32 dev board, 3 are quinled dig-uno/quad, 3 are my own PCB with ESP32 assembled by JLCPCB). Only those with 0.14 lose connectivity and can't recover until I powercycle them.
I know that on my network I have a very short DHCP lease time (15 minutes). I had forgotten about it after some older network manipulation I was doing, but all the ESP32 & ESP8266 running WLED 0.13.x + shelly + amazon echo + sonos + ... that are on this network have been happily staying connected since forever... except for all occurrences of WLED 0.14, which consistently lose wifi after a few hours. I don't want to change the DHCP lease time until I get to the bottom of this issue.
Access points are Ubiquiti Unifi 6 something, can't remember the exact model, but pretty much top of the line for 2 years ago, and indeed my wifi coverage is pretty solid since i installed those. Router/DHCP server is Netgate, also pretty much top of the line. Both access points and router were recently rebooted and seem snappy and happy.
I know you get a lot of users with weird setups coming to nag here, but please don't dismiss so fast, because from my analysis, all clues point to the version of WLED running on the ESP32.
Hi, the two crashes both happen deep inside the TCP and UDP core, without any WLED source code in the trace.
The second crash (with multi_heap_free()) could be a consequence of low memory and heap fragmentation. WLED 0.14.x needs more RAM than 0.13.x - due to added features.
To preserve memory, it usually helps to disable some "bells and whistles" - like
-DWLED_DISABLE_WEBSOCKETS -DWLED_DISABLE_ADALIGHT -DWLED_DISABLE_MQTT -DWLED_DISABLE_ESPNOW
👉 Did you try with the latest beta 0.14.1-b3? We have fixed some use-after-free problems recently, so the latest beta might behave better.
As a last resort, you could wipe your device completely with esptool erase_flash, then re-install from the development environment. This sometimes improves wifi connectivity - don't know why but it sometimes helps. Make sure to back up config & presets before esptool erase_flash.
Finally, some wifi problems go away when using a newer espressif framework - buildenv esp32dev_V4_dio80. The "V4" environment is still experimental for classic esp32, due to limited testing. It will also increase firmware size by 300kB, so it might not always fit into 4MB flash.
I did not dismiss you out of the blue. I have 30+ WLED instances, from ESP01 to ESP32 (including variants C3 and S2) on various controllers including QuinLED, Shields and other pre-assembled devices. And except one ESP01, 30cm from AP (UAP-AC-M), all have zero WiFi issues and never lose connectivity. My network consists of, like yours, Ubiquiti UniFi and EdgeRouter.
So I will insist on WiFi or other network traffic issues which WLED cannot solve. For clarification: network parts have not been modified since 0.12. The only addition was a signal strength fix for newer ESP32 models like C3,S2 & S3 which is a compile time option.
FYI, having "Fast roaming" or BSS Transition enabled is known to cause issues with non-compliant hardware. WLED does not support those protocols.
> This sometimes improves wifi connectivity - don't know why but it sometimes helps. Make sure to backup config & presets before esptool erase_flash.
A newer bootloader may be needed as it initialises hardware prior to firmware. If your devices have old bootloader (pre 0.13) then they may need bootloader update.
As an update, I have used https://wled-install.github.io/ and flashed the version "Standard version 0.14.1 V4 (ESP IDF 4.4.3 based, experimental, should resolve reboot issues)" and so far it seems stable. I was losing connectivity or seeing reboots much much faster, and so far it's running 24h and still online, responsive and snappy.
6 days uptime going strong, i think this is it
Hey! This issue has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. Thank you for using WLED! ✨
What happened?
Hello,
All my 5 ESP32's running WLED_0.14.1-b2_ESP32.bin keep rebooting randomly, sometimes after only a few hours. They're all connected to my MQTT broker, with moderate traffic.
I have another ESP32 with WLED_0.14.1-b2_ESP32_audioreactive.bin, on that one MQTT isn't enabled, and since the upgrade from 0.14 to 0.14.1-b2, it's stable so far (3 days uptime).
I have managed to capture a stacktrace, but i don't know how to decode it.
This stacktrace was generated from WLED_0.14.1-b2_ESP32.bin, at least the binary from install.wled.me
To Reproduce Bug
Expected Behavior
No crash
Install Method
Binary from WLED.me
What version of WLED?
WLED 0.14.1-b2 (build 2312290)
Which microcontroller/board are you seeing the problem on?
ESP32
Relevant log/trace output
Anything else?
Thank you for your help!