OpenGarage / OpenGarage-Firmware

OpenGarage: open-source WiFi-enabled garage door opener
https://opengarage.io
GNU General Public License v3.0
280 stars 102 forks source link

No internet causes api hangs #91

Open jknaack opened 1 year ago

jknaack commented 1 year ago

Fix #90

To get rid of the calls to delay() and use the loop itself, it seemed easiest to use a very simple state machine + a transition time (to replace the delay call) between states, where the transition time is defined by the loop millis() call.

Basically, the first state is the CONFIGURE state, which, once you exit, you can't go back to. That always sends you to the UPDATE_TIME state, which sets up the timeout at 30 seconds, then sets the state to CALL_NTP, which actually calls the ntp server. This state can succeed, in which case it updates all the time stuff, and updates the state to UPDATE_TIME with a transition time of 1800 seconds (per the previous implementation). If it fails before the timeout (30 seconds, per the previous implementation), it moves the state back to CALL_NTP, with a 2 second transition time. If it fails after the timeout, it moves the state to UPDATE_TIME with a transition time of 60 seconds (per the previous implementation), which resets the timeout to 30 seconds, and the process repeats.

One behavior I did change is that there was a 1 second delay between calling configTime and time(nullptr). I noticed that in the old code, this was required or the first call to time(nullptr) would always fail. After moving to the state transition model, this no longer happens. I suspect some background work needs to happen between configTime and time(nullptr), which can occur now because we aren't blocking with delay and we are existing the loop. I set the transition time to 0, but made it configurable in the defines. My preference would be to delete the transition time delay completely since I don't think it's necessary anymore (and if it happens to fail occasionally on the first call, the retry should cover it), but I left it for now.

Another potential improvement would be to use a backoff strategy for the retry instead of a fixed 2 second retry (up to the 30 minute retry for MQTT) . That might be simpler and more effective than the retry/timeout complexity now that there are no longer challenges with avoiding the delay(), but I wanted to submit a version that was functionally the same, so here it is.