Smegheid / water_heater

Running a solar water heater system with a busted panel sensor from a Raspberry Pi

Add watchdog #1

Open Smegheid opened 2 years ago

Smegheid commented 2 years ago

Since the water heater control system has safety implications if it doesn't regulate, it's probably a good idea to have a watchdog as a backup.

The simplest thing I can think of is a script that runs as a one-shot affair that then gets invoked periodically from cron. That way we're not dependent on the watchdog process running continually.

The watchdog should probably check that:

In all cases, if a problem is found the watchdog should probably kill off the control process (if running) as we no longer have confidence in it, explicitly stop the pump and then restart the controller.

Thoughts on each check are above as sub-bullets.
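As a rough sketch of the recovery sequence described above (kill the controller if it's running, explicitly stop the pump, restart the controller), something like the following might do. All three command lines are placeholders: this thread only mentions `water_pump state`, so `water_pump off`, the process name and the systemd unit name are assumptions to be swapped for whatever the real setup uses.

```python
import subprocess

# Hypothetical commands; none of these exact invocations come from this thread.
KILL_CONTROLLER = ["pkill", "-f", "water_heater_control"]
STOP_PUMP = ["water_pump", "off"]
START_CONTROLLER = ["systemctl", "start", "water_heater_control"]


def recover(run=subprocess.run):
    """Kill the controller, force the pump off, then restart the controller."""
    # pkill exits non-zero when nothing matched; that is fine ("if running").
    run(KILL_CONTROLLER, check=False)
    # The pump must end up off; raise on failure so cron can report it.
    run(STOP_PUMP, check=True)
    run(START_CONTROLLER, check=True)
```

The `run` parameter is only there so the ordering can be exercised without touching real hardware; called with no arguments it uses `subprocess.run`.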

Smegheid commented 2 years ago

One thought before getting started: water_pump state will read back the state of the pump control. However, for the purposes of the watchdog, what are the odds of a race condition where the control process has toggled the pump but has not yet updated the status file to reflect that?

The window is probably fairly small; the control process probably turns off the relay and then updates the status file within a few tens of milliseconds, and the watchdog will only run once a minute from cron. However, you know what they say: million-to-one chances crop up nine times out of ten.

Not sure if I'm happy about that, and not currently sure what to do about it.

Smegheid commented 2 years ago

Thinking out loud before I'm done for the day and to remind myself once I get started again later: what if the watchdog were to wait on an update to the status file?

The control loop only updates the status once it's done making decisions for that pass. It then goes to sleep for a decent length of time before going again. If the watchdog were to wait for the status file to be updated, then that would accomplish several things:

This looks like it's fairly easy to accomplish. inotifywait can take a --timeout option, in seconds, after which it exits if the file hasn't changed in that window. The control process repeats every 10 sec, so if we set that timeout to a couple of times that, a sane control process must update the status in that window.
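To make the idea concrete, here is a minimal sketch of "wait for the status file to change, or give up after a timeout". Note that it polls the file's modification time rather than calling inotifywait, purely so the example is self-contained; the function name, the 20 second default (twice the 10 second control period) and the polling interval are all assumptions.

```python
import os
import time


def wait_for_update(path, timeout_s=20.0, poll_s=0.1):
    """Return True if path's mtime changes within timeout_s, else False."""
    before = os.stat(path).st_mtime_ns
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        time.sleep(poll_s)
        try:
            if os.stat(path).st_mtime_ns != before:
                return True  # controller finished a pass and wrote status
        except FileNotFoundError:
            # File briefly missing (e.g. being replaced); keep waiting.
            continue
    return False  # no update in the window: controller looks hung
```

A False return would be the watchdog's cue that the controller is stuck; a True return means the status was just written, so reading the pump state straight afterwards should land well clear of the next control pass.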

Yeah, I think I like the sound of that. It solves several problems at once.

Other thoughts: