In HA setups, when stopping icinga2 (but keeping icingadb running) or killing icingadb with SIGKILL, it's possible to end up with the following state in the database:
So there is a left-over row with responsible='y' but with an expired heartbeat.
If the icingadb process is still running, it should actively retract by writing responsible='n' to its own instance row (not sure why this doesn't happen already).
On takeover, responsible should be explicitly reset to 'n' for other rows with an expired heartbeat.
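To illustrate the two proposed behaviors, here is a minimal sketch using Python's sqlite3 module. It is not icingadb's actual code: the simplified `instance` table, the `take_over` and `retract` helpers, and the `HEARTBEAT_TIMEOUT_MS` value are all assumptions made for the example.

```python
import sqlite3

# Hypothetical timeout for the example; not taken from icingadb.
HEARTBEAT_TIMEOUT_MS = 60_000


def take_over(conn, own_id, now_ms):
    """Mark own_id as responsible and clear stale 'y' flags on other rows."""
    with conn:  # single transaction: commits on success, rolls back on error
        # Reset other instances whose heartbeat has expired.
        conn.execute(
            "UPDATE instance SET responsible = 'n' "
            "WHERE id <> ? AND responsible = 'y' AND heartbeat < ?",
            (own_id, now_ms - HEARTBEAT_TIMEOUT_MS),
        )
        # Claim responsibility and refresh our own heartbeat.
        conn.execute(
            "UPDATE instance SET responsible = 'y', heartbeat = ? WHERE id = ?",
            (now_ms, own_id),
        )


def retract(conn, own_id):
    """Actively give up responsibility for our own row."""
    with conn:
        conn.execute(
            "UPDATE instance SET responsible = 'n' WHERE id = ?", (own_id,)
        )


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real instance table (assumed schema).
    conn.execute(
        "CREATE TABLE instance ("
        "id TEXT PRIMARY KEY, heartbeat INTEGER, responsible TEXT)"
    )
    now = 1_000_000
    conn.executemany(
        "INSERT INTO instance VALUES (?, ?, ?)",
        [("a", now - 120_000, "y"),  # left-over row: 'y' but expired heartbeat
         ("b", now - 1_000, "n")],   # healthy instance about to take over
    )
    take_over(conn, "b", now)
    after_takeover = conn.execute(
        "SELECT id, responsible FROM instance ORDER BY id"
    ).fetchall()
    retract(conn, "b")
    after_retract = conn.execute(
        "SELECT id, responsible FROM instance ORDER BY id"
    ).fetchall()
    return after_takeover, after_retract
```

Doing the reset and the claim in one transaction means no other reader sees two rows with responsible='y' in between.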
At the moment, this bug can result in a situation where icingadb-web shows the "Icinga is not running" warning even though everything is fine, as it seems to only consider responsible but not heartbeat when selecting the row for displaying status information. So this could also be fixed in icingadb-web by using WHERE responsible = 'y' ORDER BY heartbeat DESC LIMIT 1, but I think it's cleaner to fix it here and only have one row with responsible='y' in the first place.
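For reference, a small sketch of what the icingadb-web side workaround would select, again using Python's sqlite3 with an assumed, simplified `instance` table. The `current_instance` helper and the sample rows are hypothetical; only the WHERE/ORDER BY/LIMIT clause comes from the text above.

```python
import sqlite3


def current_instance(conn):
    """Pick the responsible row with the most recent heartbeat."""
    return conn.execute(
        "SELECT id, heartbeat FROM instance "
        "WHERE responsible = 'y' ORDER BY heartbeat DESC LIMIT 1"
    ).fetchone()


def demo_select():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real instance table (assumed schema).
    conn.execute(
        "CREATE TABLE instance ("
        "id TEXT PRIMARY KEY, heartbeat INTEGER, responsible TEXT)"
    )
    conn.executemany(
        "INSERT INTO instance VALUES (?, ?, ?)",
        [("stale", 100, "y"),    # left-over row with an expired heartbeat
         ("active", 900, "y"),   # instance that is actually responsible
         ("standby", 950, "n")], # newest heartbeat, but not responsible
    )
    return current_instance(conn)
```

With both a stale and an active row flagged 'y', the ordering makes the query pick the active one, which is why it would hide the symptom without removing the left-over row.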