Open nobuto-m opened 1 month ago
It is the same pySyncObj Raft library as described in https://github.com/canonical/postgresql-operator/issues/571#issuecomment-2301699211
Duplicate of https://github.com/canonical/postgresql-operator/issues/418, we are trying to fix this in https://warthogs.atlassian.net/browse/DPE-3684
Steps to reproduce
juju deploy postgresql --base ubuntu@22.04 --channel 14/stable -n 3
Expected behavior
The cluster keeps working since there are two living nodes out of 3 (the quorum should be satisfied).
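For reference, the quorum expectation above can be written as simple majority arithmetic. This is a minimal sketch assuming the standard Raft majority rule (more than half of the configured members). The helper names `majority` and `has_quorum` are hypothetical, used only for illustration, and are not part of Patroni or pySyncObj.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(alive: int, cluster_size: int) -> bool:
    """True when the living members can still form a majority."""
    return alive >= majority(cluster_size)


# Two living nodes out of a 3-node cluster: quorum should be satisfied.
print(has_quorum(alive=2, cluster_size=3))
# One living node out of a 3-node cluster: quorum is lost.
print(has_quorum(alive=1, cluster_size=3))
```

Note that the result depends on the configured member count, not only on how many nodes are alive.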
Actual behavior
The cluster is not operational.
The charm reports postgresql/0 as the primary, but that is not true: no postgresql process is running on postgresql/0 any longer.
^^^ no postgresql process.
initial status
after taking down postgresql-2 (non leader)
-> expected status
cleaning up postgresql/2 from the model
remove-machine --force was used instead of remove-unit since the machine/unit agent is no longer responding after the hardware failure.
adding another machine as the 3rd node (postgresql/3) in the cluster
-> expected status
taking down postgresql/3
The cluster should still work at this point since there are two living nodes out of the 3-node cluster. However, no Patroni operation is possible any longer.
[/var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml]
The raft config in patroni.yaml looks okay though.
Versions
Operating system: jammy
Juju CLI: 3.5.3-genericlinux-amd64
Juju agent: 3.5.3
Charm revision: 14/stable 429
LXD: N/A
Log output
Juju debug log:
postgresql_replacing_failed_nodes_debug.log
Additional context