Open · nadav213000 opened this issue 3 years ago
Experiencing the same issue, any help would be appreciated!
Can you confirm that the repmgrd daemon is running on all nodes? The logs from at least one node should clearly show if and when disconnections occurred, and whether they actually disrupted the Postgres connections repmgrd makes to each node.
What does this command show?
repmgr service status
If it is running on all nodes, how are you checking the logs? The repmgr daemon is very chatty even under normal operating circumstances.
@bonesmoses same here, with data corruption (different states on different servers). I hope this information will help.
Attached: repmgr diagnostic output from all three nodes (node-0, node-1, node-2) and repmgr.conf.
It looks like an unexpected name-lookup failure during repmgr startup may be the root cause of the cluster getting stuck in a split-brain state. If that is true, then all we need is to add DNS lookup retries here and there...
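The "retry DNS lookups" idea above could look roughly like the following minimal Python sketch. It is not repmgr code (repmgr itself is written in C), and `resolve_with_retries`, its parameters, and the backoff values are hypothetical choices for illustration only. The resolver and sleep functions are injectable so the retry logic can be exercised without touching real DNS.

```python
import socket
import time
from typing import Callable


def resolve_with_retries(
    host: str,
    port: int,
    attempts: int = 5,
    delay: float = 1.0,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Hypothetical helper: resolve `host`, retrying on lookup failures.

    Every socket.gaierror is treated as possibly transient (e.g. DNS not yet
    reachable while a node is booting); only the final failure is raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)
                delay *= 2  # exponential backoff between attempts
    raise last_error
```

With the defaults, a lookup that keeps failing is retried five times over roughly 15 seconds of waiting (1 + 2 + 4 + 8) before the error is surfaced, instead of failing on the first attempt during startup.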
I'm having the same issue. In my case I was able to narrow it down to certain VM network behaviors. For example, with OpenStack as my VM provider, this state occurs when a VM's private network is disconnected. That private network failure also leaves the VM unable to access its own block storage. It seems that whatever method repmgr uses for the cluster status check (the one that shows the primary as unreachable) is not the same method that triggers failover: if losing storage is enough to report the primary as unreachable (and it is!), I would expect it to trigger failover as well.
Hey,
I have a 3-node cluster deployed on VMs. There are some network issues that make the primary unavailable to the other nodes in the cluster.
When we run the repmgr cluster show command, both other servers show the primary as unreachable. But repmgr doesn't trigger failover. Furthermore, repmgr shows no logs of monitoring the primary (which is also the upstream node). Repmgr works well in other situations, such as a Postgres service crash or a server crash.
We have configured repmgr as follows:
This seems to me like the relevant configuration for this problem.
Do you have any idea why repmgr doesn't trigger failover in a situation like this, and why it doesn't write any logs either?