Qrome / marquee-scroller

Marquee Scroller Clock News Weather and More
https://www.thingiverse.com/thing:2867294
MIT License

Strange behaviour 2.18wide - Clock stuck/frozen #222

Open njordan77 opened 2 years ago

njordan77 commented 2 years ago

I have a very strange, annoying issue with v2.18 on several installations in different locations (each on a different internet connection). The clock gets stuck on a wrong time, so it is frozen.

Wemos / NodeMCUs in use.

Does anybody else face this issue? I do not remember this ever happening before on my installations, which worked for more than a year. Thanks, Norbert

tadder commented 2 years ago

I have a similar problem. Date is stuck at “Thursday 1st January”. Previously working fine.
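A possible reading of this symptom, not confirmed anywhere in this thread: 1 January 1970 was a Thursday, so "Thursday 1st January" would match a clock whose time value is still at (or near) Unix time 0, i.e. one that never received a valid time. The sketch below is plain desktop C++, not the clock firmware, and `describeUtcDate` is a hypothetical helper name used only for illustration.

```cpp
#include <ctime>
#include <string>

// Formats a Unix timestamp as "Weekday D Month YYYY" in UTC.
// Hypothetical helper for illustration; not part of marquee-scroller.
std::string describeUtcDate(std::time_t t) {
  static const char* kDays[] = {"Sunday", "Monday", "Tuesday", "Wednesday",
                                "Thursday", "Friday", "Saturday"};
  static const char* kMonths[] = {"January", "February", "March", "April",
                                  "May", "June", "July", "August",
                                  "September", "October", "November", "December"};
  std::tm* utc = std::gmtime(&t);  // broken-down UTC time, or nullptr on failure
  if (utc == nullptr) return "";
  return std::string(kDays[utc->tm_wday]) + " " + std::to_string(utc->tm_mday) +
         " " + kMonths[utc->tm_mon] + " " + std::to_string(utc->tm_year + 1900);
}
```

Calling `describeUtcDate(0)` yields "Thursday 1 January 1970", so a display showing that weekday and date is worth checking for a failed time sync before anything else.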

njordan77 commented 2 years ago

Maybe it is worth mentioning that it works fine for a few hours or even a day. But once it freezes, it never returns to normal operation on its own. After a reboot it works again.

Qrome commented 2 years ago

@tadder -- make sure you use 2.18. @njordan77 -- you're the first to have this issue. Make sure you have not maxed out your API connection limits.

njordan77 commented 2 years ago

@Qrome thanks for the feedback. The strange thing is that it happens on two installations (not immediately, but within one day, and from then on it is frozen). I'm only using the weather API, nothing else: no Bitcoin, no stocks, no news, if that is what is meant by maxing out API limits.

Qrome commented 2 years ago

Do you not use the TimeDB API key? @njordan77 -- when you say frozen, can you get into the web interface, or does that not respond either? Run it with the serial monitor and see what the error is. I use TimeDB, Weather, OctoPrint, and Pi-Hole on several clocks -- no issues. I am not seeing reports from the 1000's of others using it either.

tadder commented 2 years ago

Thanks for your reply, you are right, I was on an earlier version. Previously I had the display rotated through 180deg and the length set for 8 modules. I can see that direct editing of the ‘settings.h’ file is no longer necessary and indeed not possible. However, there does not appear to be a way to change these two parameters using the web interface. I did try loading ‘marquee.ino.d1_mini_wide_2.18.bin’ (after re-flashing) but still no luck. I suspect I am missing something simple, but if you could point me in the right direction I would be grateful.

Qrome commented 2 years ago

The fields that you can't edit in the web interface still take their values from settings.h -- edit that file for the number of LEDs and the orientation.

njordan77 commented 2 years ago

@Qrome. Sure, I forgot to mention the TimeZone API, but that is in addition to the Weather API. I did see that nobody else has mentioned this before. Right now it is hard to do a serial monitor trace, as the clock is on the roof of my carport. As soon as the snow on the roof allows, I will take it down and do the monitoring to get more specific information.

matthias1403 commented 2 years ago

Hi, I had the same experience (Wemos D1 Mini): after several hours or a day the clock gets stuck. The serial output shows that the program is trapped in the while loop (TimeDB.cpp:63). Sometimes I see a lot of garbage data received from the server, which is not plausible. Locally I added an exit for the case where the loop does not end. That improves things, but it is still not clean. Overall it looks to me like memory corruption (stray memory writes), as I have also seen serial print fragments where they should not be.

Currently my device is also not connected to the serial port, but I could connect it if that would help.

njordan77 commented 2 years ago

As nobody has this issue (except the two of us), my speculation is that the OTA update from 2.17 to 2.18 broke something and left the flash in a bad state. I will completely flash with the BLANK file and then reflash 2.18. I hope that this will resolve the issues I'm having.

tadder commented 2 years ago

It looks like you are right. I have re-flashed on a blank file system and all looks good.

matthias1403 commented 2 years ago

I have also cleaned the flash through the Arduino IDE (Erase flash --> all flash contents) and burned 2.18 (unchanged), after which I had to reconfigure everything. But after a day the clock was stuck again, also showing 6 active pixels in the first column, which indicates that it is trapped in the loop where it waits for data. I have seen 2 occurrences up to now. Any suggestions for cleaning the flash in another way, to be 100% sure that nothing is wrong? And nobody else sees this issue?

Qrome commented 2 years ago

What is the version of the ESP8266 Core you are using to compile it with?

matthias1403 commented 2 years ago

Currently I am using the bin from GitHub, but I also tested with Core 2.7.4, as 3.x did not compile cleanly (something obsolete in the httpclient call).

Qrome commented 2 years ago

Try using the version listed in the Readme.md file.

esp8266 Core platform version 2.5.2

I know this one works correctly.

matthias1403 commented 2 years ago

I have now used a build compiled with 2.5.2. It got stuck after a day. I attached the serial log (see attachment: serial.log). If nobody else has this issue, maybe this is a hardware problem with my unit. There is often garbage in the HTTP responses, and the fragment "essful!" should not be there either.

njordan77 commented 2 years ago

I blanked my ESP and did a BIN upload, but again only the clock is shown (frozen). Normally it shows temperature and clock. Even after a day there is no update, nothing. The strange thing is that the same hardware worked great on 2.17 (for more than a year without any crash). After the general issue that was solved by 2.18, it now shows this behavior on 2 separate installations. I'm out of ideas about what could be specific to my hardware/setup.

matthias1403 commented 2 years ago

I soldered a new Wemos mini to the display and burned 2.18 onto this untouched device. The behavior is the same again: it stops running after ~1d. It is definitely caused by the wrong data in the HTTP responses, which sometimes leads to an infinite loop. But what could create this garbage data?

nored commented 2 years ago

I have the same issue... however changing TimeDB.cpp in the following way fixes it:

  boolean record = false;
  const unsigned long MAXTIME = 10000; // timeout (milliseconds) for the whole read
  unsigned long startTime = millis();

  while (client.connected() || client.available()) { // connected or data available
    int b = client.read(); // next byte from the client buffer, or -1 if none is buffered
    if (b >= 0) { // skip "no data" reads so filler bytes never end up in result
      char c = (char)b;
      if (c == '{') {
        record = true; // start of the JSON object
      }
      if (record) {
        result = result + c;
      }
      if (c == '}') {
        record = false; // end of the JSON object
      }
    }
    if ((millis() - startTime) >= MAXTIME) {
      client.stop(); // stop client
      Serial.println("Fetching time data took too long..."); // error message on timeout
      Serial.println();
      return 20;
    }
  }

There are two possibilities I am addressing here: either you get some "⸮" characters that render the time result unusable, or you get an endless stream of "⸮" and the loop never stops reading, so that the clock freezes.
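The same idea can be tried out on a desktop machine. The following is only a sketch under stated assumptions: `FakeClient` and `readJsonObject` are hypothetical names that exist neither in marquee-scroller nor in the ESP8266 core, and only the `connected()` / `available()` / `read()` shape (with `read()` returning -1 when nothing is buffered) mirrors the Arduino Client interface. It uses a poll budget where the firmware would use a `millis()`-based timeout, and it assumes a flat (non-nested) JSON object.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a network client so the loop can run on a desktop.
struct FakeClient {
  std::string data;          // bytes the "server" will deliver
  std::size_t pos = 0;       // next byte to hand out
  bool neverCloses = false;  // simulate a server that keeps the socket open

  bool connected() const { return neverCloses || pos < data.size(); }
  int available() const { return static_cast<int>(data.size() - pos); }
  int read() {
    return pos < data.size() ? static_cast<unsigned char>(data[pos++]) : -1;
  }
};

// Collects the first {...} object from the stream into out.
// Gives up after maxPolls loop iterations, so a stalled connection
// cannot trap the caller forever.
template <typename ClientT>
bool readJsonObject(ClientT& client, std::string& out, unsigned long maxPolls) {
  bool record = false;
  for (unsigned long polls = 0; polls < maxPolls; ++polls) {
    if (!client.connected() && client.available() == 0) break;  // stream ended
    int b = client.read();
    if (b < 0) continue;  // nothing buffered yet; still counts against the budget
    char c = static_cast<char>(b);
    if (c == '{') record = true;
    if (record) out += c;
    if (c == '}' && record) return true;  // complete object received
  }
  return false;  // budget exhausted or stream ended without a complete object
}
```

With a `FakeClient` whose `neverCloses` flag is set and whose data stops mid-object, `readJsonObject` returns false once the budget is used up instead of spinning forever, which is the behavior the timeout in the snippet above is meant to provide.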

I have not tried this solution for long now but at the moment it seems to fix the issue.


Quick Edit here: After two days of testing it is working just fine with it.

BeNeDeLuX commented 2 years ago

I also have the same problems with my two displays here since 2.18, so you are not alone. Same behavior: the clock freezes during an update. It happens around once a week.

// Edit: attached a screenshot of the frozen clock (clock_stuck)

brainrecall commented 2 years ago

I've been having the same issue with a standard-width clock for at least a month now. I came to the same conclusion as @nored. I solved it in a different manner, and found three other places where a while loop could potentially never exit. I made those changes, along with some other things like PlatformIO support, on my branch for now (I have not built the bins yet): https://github.com/brainrecall/marquee-scroller

njordan77 commented 2 years ago

Wow, thanks. I'm more than happy to hear that others are having the same issues.

@Qrome is there a chance to get this into the official code and to provide binaries? I'm sorry that I am not much of a coding genius. Thanks, Norbert

liuxianhao666 commented 2 years ago

I have always had the same problem, ever since version 2.15. I have now bought new hardware and flashed 2.18, and the problem still exists.

nored commented 2 years ago

My solution stopped working yesterday, so I switched to ntpclient. I will test it and let you know if it works any better, and I will also deliver a patch if it does.
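For background on what an NTP-based time source has to do, here is a minimal desktop C++ sketch of decoding an NTP reply. It is not the patch mentioned above and does not use any particular NTP client library: `ntpToUnix` and the constant names are hypothetical. What it relies on is the NTP packet layout itself: a 48-byte packet whose transmit timestamp seconds sit in bytes 40 to 43 (big-endian), counted from 1900-01-01, which is 2208988800 seconds before the Unix epoch. The sketch ignores the NTP era rollover in 2036.

```cpp
#include <cstddef>
#include <cstdint>

// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
constexpr std::uint32_t kNtpToUnixOffset = 2208988800U;
constexpr std::size_t kNtpPacketSize = 48;

// Extracts the transmit timestamp (seconds field) from an NTP reply and
// converts it to Unix time. Returns false for a short packet or for a
// timestamp before 1970, which includes an all-zero (unset) timestamp.
bool ntpToUnix(const std::uint8_t* packet, std::size_t len,
               std::uint32_t& unixSeconds) {
  if (packet == nullptr || len < kNtpPacketSize) return false;
  const std::uint32_t ntpSeconds =
      (static_cast<std::uint32_t>(packet[40]) << 24) |
      (static_cast<std::uint32_t>(packet[41]) << 16) |
      (static_cast<std::uint32_t>(packet[42]) << 8) |
      static_cast<std::uint32_t>(packet[43]);
  if (ntpSeconds < kNtpToUnixOffset) return false;  // reject implausible replies
  unixSeconds = ntpSeconds - kNtpToUnixOffset;
  return true;
}
```

Rejecting implausible replies instead of storing them means a bad packet leaves the previous time untouched, so the caller can simply retry later.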

phenomeus commented 2 years ago

It’s happening for me too. Time is visible and the left first column is stuck three third.

Running weather, time, Bitcoin, OctoPi and Pi-hole. Because my clock is on battery, I took it nearer to an access point and left it there. Suddenly it revived itself and continued working.

brainrecall commented 2 years ago

@nored I think the time source is only half the problem. The weather code also has a while loop that may never exit, and I have seen it die there as well. Take a look at my branch, it should clean up both of those cases.
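One way to guard several wait loops with the same logic is a small, reusable deadline check. The sketch below is not taken from the branch mentioned above: `Deadline` is a hypothetical name, and the code is plain C++ that takes the current tick count as a parameter so it can be tested on a desktop. On the ESP8266 the tick count would come from `millis()`, a 32-bit value that wraps around after roughly 49.7 days, which is why the comparison is done on the unsigned difference rather than on absolute times.

```cpp
#include <cstdint>

// Wraparound-safe timeout check for polling loops.
// Hypothetical helper for illustration; not part of marquee-scroller.
struct Deadline {
  std::uint32_t start;      // tick count (ms) when the wait began
  std::uint32_t timeoutMs;  // how long the wait may last

  // True once at least timeoutMs have elapsed since start. The unsigned
  // subtraction stays correct even if the counter wrapped in between.
  bool expired(std::uint32_t now) const {
    return static_cast<std::uint32_t>(now - start) >= timeoutMs;
  }
};
```

A loop such as `while (client.connected() && !deadline.expired(millis())) { ... }` then has a guaranteed exit, whichever API call it happens to be waiting on.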