dalibo / check_patroni

A nagios plugin for patroni.
PostgreSQL License

Fix the cluster_has_leader service for standby clusters #63

Closed blogh closed 10 months ago

blogh commented 11 months ago

Before this patch, we checked that the standby leader's state was "running" for all versions of Patroni.

With this patch, for:

The tests were modified to account for this.

closes #58

mbanck commented 11 months ago

Maybe it makes sense to return WARNING state if the standby leader is in archive recovery? Usually, this should be a transient state that only happens for a short time at startup or during switchovers. If it happens continuously, there is something wrong (either archive recovery is slower than archiving or it does not work at all). And as we cannot tell the lag, I think it would be better to just consider streaming state as OK.
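A minimal sketch of the mapping suggested here, for illustration only. It is not the check_patroni implementation: the `NagiosState` enum and the `standby_leader_status` function are hypothetical names, and treating every other state as CRITICAL is an assumption made so that the example is complete.

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Standard Nagios plugin exit codes (hypothetical helper enum)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def standby_leader_status(state: str) -> NagiosState:
    """Map a standby leader's reported state to a Nagios status.

    Hypothetical function: "streaming" is considered healthy,
    "in archive recovery" is flagged because it should be transient.
    """
    if state == "streaming":
        return NagiosState.OK
    if state == "in archive recovery":
        return NagiosState.WARNING
    # Assumption: any other state is treated as a failure.
    return NagiosState.CRITICAL


if __name__ == "__main__":
    for s in ("streaming", "in archive recovery", "stopped"):
        print(s, "->", standby_leader_status(s).name)
```

With this mapping, a standby leader that stays in archive recovery keeps raising a WARNING until it starts streaming again.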

blogh commented 11 months ago

Yes, that's something I was unsure of when I wrote this ...

I thought about adding an option to explicitly accept "in archive recovery" as a valid state (here and for cluster_has_replica).

In this case at least we could have a standby cluster that relies on log shipping to be somewhat up to date. I need to think about this and discuss it with my colleagues.
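As a rough sketch of that option idea: the function and parameter names below (`accepted_states`, `is_valid_state`, `allow_archive_recovery`) are hypothetical and do not correspond to an existing check_patroni flag. The point is only that the set of accepted states could be widened on request.

```python
def accepted_states(allow_archive_recovery: bool = False) -> set:
    """Return the states considered valid (hypothetical helper)."""
    states = {"streaming"}
    if allow_archive_recovery:
        # Opt-in: clusters that rely on log shipping may stay in this state.
        states.add("in archive recovery")
    return states


def is_valid_state(state: str, allow_archive_recovery: bool = False) -> bool:
    """Check a node state against the accepted set (hypothetical helper)."""
    return state in accepted_states(allow_archive_recovery)


if __name__ == "__main__":
    print(is_valid_state("in archive recovery"))
    print(is_valid_state("in archive recovery", allow_archive_recovery=True))
```

Keeping the default strict means a standby cluster fed by log shipping has to opt in explicitly, while other setups still get alerted.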

Sorry about the snail's pace btw, I have other missions that keep me busy.

blogh commented 10 months ago

Done that way.