Stampede of `run` MQTT messages causes a bad MQTT state

Pioreactor / pioreactor

Hardware and software for accessible, extensible, and scalable bioreactors. Built on Raspberry Pi.

https://pioreactor.com

MIT License

Stampede of `run` MQTT messages causes a bad MQTT state #517

Closed CamDavidsonPilon closed 3 months ago

CamDavidsonPilon commented 3 months ago

Using a link-local connection that was oddly slow, the UI wasn't firing `run` MQTT messages. I tried refreshing the page, but it looked like nothing was getting through (I had `pio mqtt` open in the console, and that didn't show anything). Then a sudden influx of messages meant that three stirring jobs started in quick succession. The first one started stirring, and the latter two were ended (no duplicate jobs). However, these later jobs changed the state of stirring in MQTT to `disconnected`, so the UI wasn't showing that stirring was running, and I had to SSH in and kill it.

This kinda explains some users' experiences with UI state != pioreactor state.
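A toy model of the failure mode described above, not the actual Pioreactor code. It assumes a dict stands in for the retained state topic on the broker, that a running job publishes a `ready` state (an assumed name), and that a duplicate job publishes `disconnected` as it exits. `start_job`, `retained_state`, and `running_jobs` are hypothetical names.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins: the broker's retained state per job, and the
# set of jobs actually running on the unit.
retained_state: Dict[str, str] = {}
running_jobs: Set[str] = set()


def start_job(name: str) -> bool:
    """Try to start a job; return True if it actually started."""
    if name in running_jobs:
        # The duplicate is ended right away, but its exit path still
        # publishes `disconnected`, overwriting the state of the job
        # that is really running.
        retained_state[name] = "disconnected"
        return False
    running_jobs.add(name)
    retained_state[name] = "ready"  # assumed name for the running state
    return True


# Three `run` messages arrive in quick succession.
results: List[bool] = [start_job("stirring") for _ in range(3)]

# Only the first job started, yet the published state says otherwise.
print(results)
print("running:", "stirring" in running_jobs)
print("state in MQTT:", retained_state["stirring"])
```

The job is still running, but anything reading the state from MQTT (like the UI) sees `disconnected`.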

CamDavidsonPilon commented 3 months ago

1. Added deduplicating in monitor's callback
2. Added error notifications in the UI to the user if it can't connect to the backend
3. UI backend configuration uses the leader's unit_config.ini, which contains faster lookups for broker_address and leader_address (localhost, instead of its IPv4 address or hostname)

CamDavidsonPilon commented 3 months ago

Solution 1 doesn't work: the callback is single-threaded, so requests are already handled sequentially (no duplicate processing). I'd still like monitor to "reject" identical requests if they come in too fast.

CamDavidsonPilon commented 3 months ago

I added a simple data structure that checks whether a command was recently sent to monitor and, if so, ignores it. The threshold is 0.5s.
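A minimal sketch of what such a structure could look like, not the code that was actually merged. `RecentCommandFilter`, `should_ignore`, and the `(action, job)` key are hypothetical; only the 0.5s threshold comes from the comment above. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Dict, Hashable


class RecentCommandFilter:
    """Remembers when each command was last accepted and flags repeats
    that arrive within `threshold_s` seconds."""

    def __init__(
        self,
        threshold_s: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold_s = threshold_s
        self._clock = clock
        self._last_accepted: Dict[Hashable, float] = {}

    def should_ignore(self, key: Hashable) -> bool:
        now = self._clock()
        # Drop stale entries so the dict does not grow without bound.
        stale = [
            k for k, ts in self._last_accepted.items()
            if now - ts >= self.threshold_s
        ]
        for k in stale:
            del self._last_accepted[k]

        if key in self._last_accepted:
            # Same command seen less than `threshold_s` ago: reject it.
            # The timestamp is not refreshed, so the window is measured
            # from the last *accepted* command.
            return True

        self._last_accepted[key] = now
        return False


# Usage with a fake clock (a one-element list we can advance by hand).
fake_now = [0.0]
recent = RecentCommandFilter(threshold_s=0.5, clock=lambda: fake_now[0])

first = recent.should_ignore(("run", "stirring"))   # accepted
fake_now[0] = 0.1
second = recent.should_ignore(("run", "stirring"))  # too soon, ignored
fake_now[0] = 0.7
third = recent.should_ignore(("run", "stirring"))   # window passed, accepted
print(first, second, third)
```

Because the callback is single-threaded, this sketch uses no locking; a multi-threaded caller would need to guard the dict.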

I also thought about the case of using experiment_profiles to do something silly, like duplicating start actions with the same hours_elapsed, which could cause this problem.