ccutrer / balboa_worldwide_app

Ruby library for communication with Balboa Water Group's WiFi module or RS-485

Controls go "Unavailable" (Home Assistant) #65

Closed fastfourier666 closed 2 years ago

fastfourier666 commented 2 years ago

Hi C,

Sorry, I seem to be the only one banging on about stuff in your issues recently! I am still using this and it's been great for managing power usage on our tub.

I am using this on an Ubuntu 20.04 proxmox VM. It's a fresh install apart from this library (installed with instructions in the README) and I'm using ruby 2.7.0p0. It talks to HA/Mosquitto running on another VM. The tub controller is MQBP20UX V2.1.1, connected with USB -> RS485 converter via USB passthrough from the host machine. All works fine so far.

At around 8:49 PM every night the controls in HA become "Unavailable" (greyed out). The clock and timezone are set correctly. For the last few nights it's happened at 20:49:23, 20:49:23 again, 20:49:35, 20:49:47 and 20:49:41 this evening.

I poked around for a few days and found:

Any idea what this could be, or the best way to go about debugging it? I know next to nothing about Ruby unfortunately. Right now my "solution" is to monitor one of the bwa entities in Home Assistant and fire an ssh command at the balboa VM to restart the service. It works, but I'm more curious than anything about what could be causing this...
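For anyone wanting to try the same stopgap, the restart decision could be sketched roughly as below. This is only an illustration of the idea, not what is actually running here: the grace period, the host name `balboa-vm` and the service name `bwa.service` are all placeholders, and the sketch only builds the ssh command rather than running it.

```ruby
# Hypothetical grace period before restarting, so brief blips are ignored.
GRACE_SECONDS = 60

# Decide whether the service should be restarted.
# state:             entity state as reported by Home Assistant (e.g. 'unavailable')
# unavailable_since: Time when the entity first went unavailable, or nil
# now:               current Time
def restart_needed?(state, unavailable_since, now, grace = GRACE_SECONDS)
  return false unless state == 'unavailable'
  return false if unavailable_since.nil?

  (now - unavailable_since) >= grace
end

# Build the ssh command as an argument list (placeholder host and service names).
def restart_command(host, service)
  ['ssh', host, 'sudo', 'systemctl', 'restart', service]
end

# Example: entity has been unavailable for two minutes.
went_away = Time.at(0)
if restart_needed?('unavailable', went_away, Time.at(120))
  command = restart_command('balboa-vm', 'bwa.service')
  puts command.join(' ')
end
```

Keeping the decision separate from the ssh call makes the grace period easy to adjust without touching the part that reaches the other VM.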

ccutrer commented 2 years ago

Hmm, that's a bit unusual. The only way $state can go to lost is by the MQTT server seeing the connection close, and applying the Last Will and Testament message. But at that point obviously no more messages can be published. The client will automatically reconnect, but when it does so it should be automatically re-publishing ready. Which MQTT server are you using?
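To make the sequence above concrete, here is a toy in-memory model of the Last Will and Testament behaviour. It is a sketch only: it is neither this library's code nor a real MQTT broker, and the `ToyBroker` class and the topic name are invented for illustration.

```ruby
# Toy model of how a broker applies a client's Last Will and Testament.
class ToyBroker
  attr_reader :retained

  def initialize
    @retained = {} # topic => last retained payload
    @wills = {}    # client_id => [topic, payload]
  end

  # A client registers its will at connect time.
  def connect(client_id, will_topic:, will_payload:)
    @wills[client_id] = [will_topic, will_payload]
  end

  def publish(topic, payload)
    @retained[topic] = payload
  end

  # The connection dropped without a clean disconnect:
  # the broker publishes the will on the client's behalf.
  def connection_lost(client_id)
    topic, payload = @wills.delete(client_id)
    publish(topic, payload) if topic
  end

  # A clean disconnect discards the will without publishing it.
  def disconnect(client_id)
    @wills.delete(client_id)
  end
end

STATE_TOPIC = 'homie/bwa/$state' # hypothetical topic name

broker = ToyBroker.new
broker.connect('bwa', will_topic: STATE_TOPIC, will_payload: 'lost')
broker.publish(STATE_TOPIC, 'ready')

# The broker sees the connection close and applies the will.
broker.connection_lost('bwa')
after_drop = broker.retained[STATE_TOPIC]

# The client reconnects and is expected to re-publish ready.
broker.connect('bwa', will_topic: STATE_TOPIC, will_payload: 'lost')
broker.publish(STATE_TOPIC, 'ready')
after_reconnect = broker.retained[STATE_TOPIC]

puts "after drop: #{after_drop}, after reconnect: #{after_reconnect}"
```

In this model the state can only stay at lost if the reconnect never happens or the re-publish of ready is skipped, which is why the behaviour reported here looks unusual.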

fastfourier666 commented 2 years ago

I'm using Mosquitto 6.1.2 which is installed as a Home Assistant add-on.

Could the three-second gap between the MQTT updates be related to the RestartSec=3s in the service file?

fastfourier666 commented 2 years ago

Well, I think this was probably something wonky with the virtual machine.

I had this problem about three days in a row, and each time it took me anywhere between a few minutes and a few hours to realise it had happened and restart the service. BUT it still happened at 8:49, which made me think it was something within the library.

A couple of days ago I was trying to force it to fail, so I disabled NTP and manually set the machine clock. bwa kept on running. I reinstated the NTP client, set the clock back and it's been running perfectly ever since. Something weird is going on so I think I'll just start fresh with a new VM.

Anyway, sorry for the noise and thanks again for this software!