forkineye / ESPixelStick

Firmware for the ESPixelStick
http://forkineye.com/

Version 4.0-beta3 randomly reboots #378

Closed darranwil closed 10 months ago

darranwil commented 2 years ago

4.0-beta3

wemos d1 mini pro with external antenna

precompiled

Windows 10 V21H1

Firefox 92.01

Access Point

Random reboots. Wired to a MAX485. Tested for a solid 48 hours on the previous version 3.2 with a multicast sACN stream, no reboots. Version 4.0-beta3 randomly reboots with no data being sent to it. Sometimes after 20 mins, sometimes after 5 hours...

forkineye commented 2 years ago

There's an issue we're trying to track down related to the webserver library that we're using. Did this occur during web interaction or while just running / processing data?

darranwil commented 2 years ago

It rebooted several times, seemingly at random, while I had the page open to monitor uptime over several hours. Sometimes it lasted 20 mins, sometimes 45, sometimes several hours. I ran it without the browser open for almost 8 hours and it only rebooted twice. I never sent any sACN or MQTT during the uptime test.

MartinMueller2003 commented 2 years ago

Try with no browser connected to the device and then after a day check the uptime. The problem is that the WebServer used to present the status page has issues. Check the device every once in a while. I had mine up for 6 days and had it playing data in response to an FPP show player. But my web page was closed the entire time.

darranwil commented 2 years ago

Try with no browser connected to the device and then after a day check the uptime. The problem is that the WebServer used to present the status page has issues. Check the device every once in a while. I had mine up for 6 days and had it playing data in response to an FPP show player. But my web page was closed the entire time.

As stated in my previous post, I did that. I've flashed several chips (wemos D1 Minis), browser closed, and same result. Sometimes 20 mins, sometimes 2-3, sometimes 4-6 hours before reboot.
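
For anyone repeating this kind of uptime test, here is a minimal sketch of turning a series of uptime readings into a reboot count. The `countReboots` helper is hypothetical and not part of ESPixelStick. It assumes the readings are in seconds and were noted in order, however they were collected (for example, read off the status page by hand).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the ESPixelStick firmware.
// Takes uptime readings (seconds) in the order they were sampled and
// infers a reboot whenever a reading is lower than the one before it.
std::size_t countReboots(const std::vector<std::uint32_t>& uptimeSeconds) {
    std::size_t reboots = 0;
    for (std::size_t i = 1; i < uptimeSeconds.size(); ++i) {
        if (uptimeSeconds[i] < uptimeSeconds[i - 1]) {
            ++reboots;  // uptime went backwards, so the device restarted
        }
    }
    return reboots;
}
```

This undercounts if the device reboots more than once between two readings, or if the gap between readings is long enough for the new uptime to pass the previous value, so frequent sampling gives a more reliable number.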

aggie81 commented 2 years ago

Just FYI since you seem aware of this. Just got some ESPixelSticks V3 to play with. They run fine on FW 3.2. Trying out V4 Beta 3 and am having the same WebServer problems. Very slow or no response sometimes, the WebSocket randomly loses connection, and there are random reboots. Attached is some serial output I captured. Looks like watchdog timeouts, but I have also seen it throw an exception and spit out a HEX dump. But I haven't captured that yet.

SerialLogOutput.txt

aggie81 commented 2 years ago

Spoke too soon. Just caught an exception. This occurred as I changed the input from E131 to DDP and clicked on save config. This log is the same as above plus the exception.

SerialLogOutput2.txt

darranwil commented 2 years ago

Downloaded and flashed after "Switch ESPAsyncWebServer to yubox fork". Still random reboots with browser closed.

forkineye commented 2 years ago

From the stack traces, there seems to be something occurring inside LWIP. Can you give this version a try? It's compiled with LWIP2 configured to "High Bandwidth, No Features" - https://github.com/forkineye/ESPixelStick/actions/runs/1441917656. https://arduino-esp8266.readthedocs.io/en/latest/ideoptions.html#lwip-variant

MartinMueller2003 commented 2 years ago

FYI: Using High Bandwidth with no features has been very stable for me. Can we close this?

darranwil commented 2 years ago

My apologies to everyone for dropping out for so long. Covid long hauler, twice now, and this time it has put me down for weeks at a time. I just installed the latest Beta4 and will let it run 24/7 in Artnet/DMX output mode connected to a fixture.

MartinMueller2003 commented 2 years ago

Welcome back and I hope you don't get it again. Looking forward to closing this issue.

MartinMueller2003 commented 2 years ago

I have had the latest version running on an ESP32 board for the past 5 days WITH the browser open to the status page. It is running in an FPP remote player mode. It has been up and running for 6 days and has played 710 songs. NO REBOOTS. No funny issues just keeps on going. I do have an intentional issue (one of the fseq files for a song is not on the SD card) and this has not caused any issues.

forkineye commented 2 years ago

@MartinMueller2003 I'm leaving this open for now, as I believe there are still some issues related to LWIP and the ESPAsyncWebserver / ESPAsyncTCP implementation underneath. LWIP 2.1.3 is currently being integrated into the ESP8266 core and I'd like to do more testing with it first. thanks, -shelby

darranwil commented 2 years ago

Still randomly rebooting for me with browser closed and no Artnet data being sent to it. Seems to be about every 8-12 hours. The only thing I've noticed is the heap size seems to go up and down but never runs out.

aggie81 commented 2 years ago

Just got back to trying ESPixelStick v4.0-beta4 on ESPixelStick Ver 3 HW. It seems to be running better than beta3, but beta4 still randomly reboots, usually when accessing the Web UI and changing parameters on the Device Setup page. Once set up (DDP mode), I have been able to send sequence data directly from xLights to a 150 pixel string for at least 4 hours now with no reboots, with the web UI open at the Home page to monitor uptime. As long as I am not changing parameters in the Web UI it seems to be fairly stable so far just receiving DDP data.

MartinMueller2003 commented 2 years ago

ESP8266 has far less ram available than the ESP32 implementation. This causes the Web UI to occasionally starve the rest of the system and that causes crashes.

aggie81 commented 2 years ago

Can there be a workaround for V4 and ESP8266 HW? Ver 3.2 and the latest WLED 0.12.0 both seem to run solid on the ESP8266 in my testing.

forkineye commented 2 years ago

We've been trying to get to the core of the issue. It started happening during the refactor from 3.2 to 4.0 and is what has been keeping 4.0 from becoming a "stable" release. It is related to the webserver library and the underlying asynctcp code that is being used.

onewithhammer commented 2 years ago

Shelby / Martin - Have you tried removing the Arduino JSON Library and all references to it? When I was developing this project (https://github.com/onewithhammer/ESP8266-MyWidget-Demo) I was having similar results with exceptions and was pulling my hair out until I removed this library and all references.

See my notes from the project: I originally tried to send / receive JSON messages using the popular Arduino JSON Library ArduinoJson but I couldn't make it stable. I kept getting exceptions happening in various places while stress testing (calling GET heap repeatedly), so I eventually removed the ArduinoJson library and references. I converted all Web Services messages to send/receive text messages. I also converted files to save as text files (cfg.txt) instead of JSON.

This may not be the issue but I can tell you I struggled to get my project stable until I removed this library.

MartinMueller2003 commented 2 years ago

Reading and displaying config information does not use any ArduinoJson functions. It reads the files directly from the SD card into a buffer and sends that information. ArduinoJson is used to build and send status for the home page and to process configuration updates. In other words, very few interactions are done in JSON. I most often see the crash when moving from one page to another, and the issue gets worse as RAM is used up. Analysis of the crashes shows most of them are in the TCP processing stage where the system is trying to allocate buffers. I have taken great care to make sure ArduinoJson has released all resources prior to interacting with the web server.
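
One general way to guard against this kind of allocation failure is to check heap headroom before building a response. The sketch below is only an illustration of that idea, not what ESPixelStick does. `HeapSnapshot`, `canServeResponse`, and the 8 KiB reserve are hypothetical. On the ESP8266 Arduino core the two values could be read with `ESP.getFreeHeap()` and `ESP.getMaxFreeBlockSize()`, but here they are passed in so the logic can be compiled and tested on a PC.

```cpp
#include <cstdint>

// Hypothetical snapshot of the allocator state. On an ESP8266 these values
// could come from ESP.getFreeHeap() and ESP.getMaxFreeBlockSize().
struct HeapSnapshot {
    std::uint32_t freeBytes;         // total free heap
    std::uint32_t largestFreeBlock;  // largest contiguous free block
};

// Illustrative reserve kept back for TCP buffers; the right value would
// have to be found by measurement on real hardware.
constexpr std::uint32_t kTcpReserveBytes = 8 * 1024;

// Returns true only if a response of responseBytes can be allocated as one
// contiguous block while still leaving the reserve untouched.
bool canServeResponse(const HeapSnapshot& heap, std::uint32_t responseBytes) {
    if (heap.freeBytes < kTcpReserveBytes) {
        return false;  // already below the reserve
    }
    if (heap.freeBytes - kTcpReserveBytes < responseBytes) {
        return false;  // serving would eat into the reserve
    }
    // Total free heap can look healthy while being fragmented, so also
    // require a single block large enough for the response buffer.
    return heap.largestFreeBlock >= responseBytes;
}
```

A handler could call this first and answer with a short "busy, retry" reply when it returns false, trading an occasional refused page load for not running the TCP stack out of buffers.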

onewithhammer commented 2 years ago

Any idea of when a stable release will be available that addresses this issue?

MartinMueller2003 commented 2 years ago

The ESP32 has more ram and I do not see these crashes in my system. The ESP8266 Ram is on the edge and sometimes gets in trouble.

MartinMueller2003 commented 1 year ago

Is anyone seeing this on my latest builds on the ESP32 platform? I know this is a ram issue with the ESP8266 but I have not seen it in a long time on my ESP32 versions.

akennerly commented 1 year ago

@MartinMueller2003 I'd be happy to test on the ESP32. Is there a drop in replacement ESP32 for v3 PixelStick? e.g. ESP32 D1 Mini

MartinMueller2003 commented 1 year ago

Yes the ESP32 D1 Mini is a drop in replacement. Just grab the repo and build for the mini or grab the images I keep on google drive.

akennerly commented 1 year ago

@MartinMueller2003 I installed the CI build ESPixelStick v4.0-ci3138152858 (Sep 27 2022 - 18:51:29)

Since my ESP32 doesn't appear to have PSRAM, I used the "D1 Mini Mhetesp32minikit" build in ESPixelStick Flash Tool. I chose that build after searching through other Issues/Discussions. If there is a better choice or if I should compile myself I will. I was able to at least get the ESP to complete a boot but the attached log will show that there is a periodically logged error because of a missing wired ethernet port.

SerialOutput-10192022.txt

I haven't done any other testing yet.

The serial log output is just this periodic error: "esp_eth: esp_eth_ioctl(348): ethernet driver handle can't be null"

I'll configure this ESP32 as an FPP Remote to test if I don't need to switch builds or compile the firmware to more closely match my ESP32.

This is the ESP32 that was purchased: https://www.amazon.com/dp/B09C5RDZ8G

MartinMueller2003 commented 1 year ago

The correct image would be for the d1_mini32 board

akennerly commented 1 year ago

@MartinMueller2003 That build results in a reset loop.

SerialOutput-10202022.txt

MartinMueller2003 commented 1 year ago

Hmm. I will take a look later today

akennerly commented 1 year ago

@MartinMueller2003 Any luck in seeing where the issue might be?

I realize you have your own life and this is unpaid volunteer development. I'd just like to get at least one ESPPS stable for the holidays.

Thanks

Mat-Moo commented 1 year ago

Just going to say I can't get 4 (from daily build) to run on a Mini d1, just keeps rebooting. Are we saying that we should only use ESP32 variants now?

MartinMueller2003 commented 1 year ago

I used an ESP8266 D1 Mini in my show this year with no issues. I do know that the ESP32 has more ram and that makes web UI activities more stable.

Mat-Moo commented 1 year ago

Mostly ui activities I've been doing, and switching to advanced mode seems to trigger it

MartinMueller2003 commented 1 year ago

Interesting since advanced mode only uses data already in the browser. The real trigger is switching the page. That causes a request to the ESP and that causes ram issues. We know that using the UI causes the ESP to be unstable. We know it is due to ram issues.

MartinMueller2003 commented 1 year ago

Just did a major rework of the UI to get the ram usage down. With the new implementation, I have not seen a single reboot that was not expected.

MartinMueller2003 commented 11 months ago

I would like to close this as fixed. Does anyone still have these issues?

akennerly commented 11 months ago

I just uploaded Firmware-4.0-ci6236846991 to an 8266 based ESPSv3. I'll let you know if I run into the reboot problem.

MartinMueller2003 commented 11 months ago

On the admin page, please make sure the build date is newer than Aug 21 2023

akennerly commented 11 months ago

"Build Date: Sep 19 2023 - 14:22:16"

akennerly commented 11 months ago

I haven't had any reboots with the fixed firmware.

MartinMueller2003 commented 10 months ago

I have not seen any issues since we stopped using the WebSockets. I am closing this issue.