bitfocus / companion-module-generic-mqtt

MIT License

Module does not handle "MQTT server is not reachable" correctly #30

Open fkusei opened 3 months ago

fkusei commented 3 months ago

Context: We run unattended-upgrades on all our systems, so every system reboots when an update requires it. All connected systems are expected to handle "the service is temporarily unavailable" or "the VPN connection to the service fails" without issues.

What happened: Last night, our MQTT server rebooted. companion-module-mqtt did not handle this the way we expect it to.

Specifically, while the MQTT server was unavailable, companion-module-mqtt tried to reconnect roughly 200 times per second. The logged message was "Error: connect ECONNREFUSED 10.34.3.221:1883". Once the MQTT server was available again, companion-module-mqtt did not reconnect successfully and instead logged "Error: read ECONNRESET" indefinitely. This led to the conntrack tables on the VPN servers filling up, which also impacted all other services behind the VPN connection.

What I would expect the module to do: companion-module-mqtt should use exponential backoff when trying to connect to the MQTT server. If the MQTT server remains unavailable, it should attempt a new connection no more than once per second.
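
To illustrate the behaviour I have in mind, here is a minimal sketch of a capped exponential backoff. This is not the module's actual code: `backoffDelayMs`, `connectWithBackoff`, the `connect` callback, the 1 second base delay and the 30 second cap are all assumptions chosen for illustration. The 1 second base keeps the retry rate at or below one attempt per second.

```typescript
// Hypothetical helper, not part of companion-module-generic-mqtt.
// Returns the wait (in ms) before reconnect attempt number `attempt` (0-based).
// The delay doubles with every failed attempt and is capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  // Clamp the exponent so the multiplication cannot grow without bound.
  const exponent = Math.min(Math.max(attempt, 0), 30);
  return Math.min(maxMs, baseMs * 2 ** exponent);
}

// Hypothetical reconnect loop. `connect` stands for whatever function opens
// the MQTT connection; it should resolve on success and reject on failure
// (for example with ECONNREFUSED). Resolves with the number of failed
// attempts that preceded the successful one.
async function connectWithBackoff(
  connect: () => Promise<void>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return attempt;
    } catch {
      // Wait before the next attempt instead of retrying immediately.
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With these assumed defaults the waits would be 1 s, 2 s, 4 s, 8 s, 16 s and then 30 s for every further attempt, so an unreachable server sees at most one new connection per second instead of roughly 200.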

Additional information:

Full logs of the incident have been attached: companion-logs.txt. There are no logs in the four hours before the MQTT server rebooted. Restarting Companion fixed the issue. Please note that the logs are in reverse order (newest first).