Closed blogh closed 11 months ago
cf #50
I still need to fix the tests and try it on older supported Python versions.
If I read the changes correctly, this also adds the timeline to the perfdata? That might warrant a release notes item as well then.
You are right, I changed it. I'll probably continue next week. I am booked for a client this week.
Hi @mbanck,
Do you want to review it?
I think this is still wrong.
From PostgreSQL's perspective, a healthy standby could be "streaming" or "in archive recovery" (we don't use slots and use log shipping to catch up). And if we look at is_healthiest_node or is_failover_possible, Patroni doesn't care about the state of the node either (maybe I missed it?).
It checks things like:
- maximum_lag_on_failover (we do it if --max-lag is used)
- whether the nofailover tag is present (we don't check for that)

So I think we should do something like:
    if version < 3.0.4:
        if state = "running" and TL = leader TL:
            test for lag if needed
            the node is healthy
    if version >= 3.0.4:
        if state in ["streaming", "in archive recovery"] and TL = leader TL:
            test for lag if needed
            the node is healthy
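As a rough Python sketch of that logic (the function and parameter names here are made up for illustration and are not the code in this PR; it also assumes a plain dotted version string such as "3.0.4" and treats an unknown lag as unhealthy when a maximum lag is requested):

```python
from typing import Optional, Tuple

# First Patroni version assumed to report "streaming" / "in archive recovery"
# for replicas instead of "running" (taken from the discussion above).
NEW_STATES_VERSION = (3, 0, 4)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.0.4' into (3, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def replica_is_healthy(
    patroni_version: str,
    state: str,
    timeline: int,
    leader_timeline: int,
    lag: Optional[int] = None,
    max_lag: Optional[int] = None,
) -> bool:
    """Hypothetical helper: apply the version-dependent health rule."""
    if parse_version(patroni_version) < NEW_STATES_VERSION:
        healthy_states = {"running"}
    else:
        healthy_states = {"streaming", "in archive recovery"}

    # The replica must be in an accepted state and on the leader's timeline.
    if state not in healthy_states or timeline != leader_timeline:
        return False

    # Only test the lag when a threshold was requested (e.g. via --max-lag).
    # Assumption: a missing lag value counts as unhealthy in that case.
    if max_lag is not None and (lag is None or lag > max_lag):
        return False

    return True
```

For example, `replica_is_healthy("3.0.4", "in archive recovery", 5, 5)` would be accepted, while the same node reporting "running" on 3.0.4 would not.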
I don't know what to do about nodes with a nofailover tag. Maybe exclude them if we use a new --exclude-nofailover-tag option?
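If we went that way, the exclusion could be a separate filtering step before the health check. A minimal sketch, assuming each member is a dict with an optional "tags" mapping (the helper name and the data shape are assumptions for illustration, not existing code):

```python
from typing import Any, Dict, List


def drop_nofailover(members: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only members without a truthy nofailover tag."""
    return [
        member
        for member in members
        # Members without any tags are kept.
        if not (member.get("tags") or {}).get("nofailover", False)
    ]
```

Keeping this apart from the state and timeline test means the option only changes which nodes are counted, not how a node is judged healthy.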
I guess "in archive recovery" means the standby is currently catching up; whether that is healthy or not could then be checked via lag. So I think the above is fine.
I am also not sure what to do about nofailover tags, but in my opinion, this is orthogonal to whether a node is healthy or not.
@dlax could you have another look, please?
For Patroni version >= 3.0.4:
For prior versions: