EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)
https://repmgr.org/

Is this a bug in standby promotion? #821

Open jiangjcsnapon opened 1 year ago

jiangjcsnapon commented 1 year ago

We have a configuration of two nodes and one witness registered in repmgr 5.4dev.

I am testing a split-brain scenario by breaking the connection between the primary and the standby, while the witness can still see the primary.

The standby promoted itself.

[2023-07-28 13:57:23] [INFO] checking state of node "PrimaryServer" (ID: 1), 6 of 6 attempts
[2023-07-28 13:57:25] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=PrimaryServer port=5432 fallback_application_name=repmgr"
[2023-07-28 13:57:25] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-07-28 13:57:25] [WARNING] unable to reconnect to node "PrimaryServer" (ID: 1) after 6 attempts
[2023-07-28 13:57:25] [INFO] 1 active sibling nodes registered
[2023-07-28 13:57:25] [INFO] 3 total nodes registered
[2023-07-28 13:57:25] [INFO] primary node "PrimaryServer" (ID: 1) and this node have the same location ("default")
[2023-07-28 13:57:25] [INFO] local node's last receive lsn: 0/5E623340
[2023-07-28 13:57:25] [INFO] checking state of sibling node "WitnessServer" (ID: 3)
[2023-07-28 13:57:25] [INFO] node "WitnessServer" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
[2023-07-28 13:57:25] [NOTICE] witness node "WitnessServer" (ID: 3) last saw primary node 1 second(s) ago, considering primary still visible
[2023-07-28 13:57:25] [INFO] 1 nodes can see the primary
[2023-07-28 13:57:25] [DETAIL] following nodes can see the primary:

[2023-07-28 13:57:25] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2023-07-28 13:57:25] [NOTICE] promotion candidate is "StandbyServer" (ID: 2)
[2023-07-28 13:57:25] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2023-07-28 13:57:25] [INFO] promote_command is: "/usr/pgsql-14/bin/repmgr standby promote -f /etc/repmgr/14/repmgr.conf --log-to-file"
[2023-07-28 13:57:25] [NOTICE] redirecting logging output to "/var/log/repmgr/repmgrd.log"

[2023-07-28 13:57:27] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified
[2023-07-28 13:57:27] [DETAIL] these nodes will remain attached to the current primary: WitnessServer (node ID: 3, witness server)
[2023-07-28 13:57:27] [NOTICE] promoting standby to primary
[2023-07-28 13:57:27] [DETAIL] promoting server "StandbyServer" (ID: 2) using pg_promote()
[2023-07-28 13:57:27] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2023-07-28 13:57:28] [NOTICE] STANDBY PROMOTE successful
[2023-07-28 13:57:28] [DETAIL] server "StandbyServer" (ID: 2) was successfully promoted to primary
[2023-07-28 13:57:28] [INFO] checking state of node 2, 1 of 6 attempts
[2023-07-28 13:57:28] [NOTICE] node 2 has recovered, reconnecting
[2023-07-28 13:57:28] [INFO] connection to node 2 succeeded
[2023-07-28 13:57:28] [INFO] original connection is still available
[2023-07-28 13:57:28] [INFO] 1 followers to notify
[2023-07-28 13:57:28] [NOTICE] notifying node "WitnessServer" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2023-07-28 13:57:28] [INFO] switching to primary monitoring mode
[2023-07-28 13:57:28] [NOTICE] monitoring cluster primary "StandbyServer" (ID: 2)
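Two lines in this log appear to disagree. The witness reports seeing the primary 1 second ago, yet the summary line says no node has seen the primary within the last 4 seconds. The small Python check below pulls those two numbers out of the quoted lines and compares them. The helper names and regular expressions are illustrative only. They match the message text as it appears in this log and are not part of repmgr.

```python
import re

# Two lines copied from the repmgrd log above.
WITNESS_LINE = ('[2023-07-28 13:57:25] [NOTICE] witness node "WitnessServer" (ID: 3) '
                'last saw primary node 1 second(s) ago, considering primary still visible')
SUMMARY_LINE = ('[2023-07-28 13:57:25] [INFO] visible nodes: 2; total nodes: 2; '
                'no nodes have seen the primary within the last 4 seconds')


def last_seen_secs(line: str) -> int:
    """Return how many seconds ago the witness says it last saw the primary."""
    m = re.search(r"last saw primary node (\d+) second\(s\) ago", line)
    if m is None:
        raise ValueError("no 'last saw primary' value in line")
    return int(m.group(1))


def window_secs(line: str) -> int:
    """Return the visibility window (in seconds) quoted in the summary line."""
    m = re.search(r"within the last (\d+) seconds", line)
    if m is None:
        raise ValueError("no visibility window in line")
    return int(m.group(1))


seen = last_seen_secs(WITNESS_LINE)
window = window_secs(SUMMARY_LINE)
print(f"witness saw primary {seen}s ago; window is {window}s; inside window: {seen <= window}")
```

With the values from this log, the witness sighting (1 s) falls inside the 4 s window, so the summary line's "no nodes have seen the primary" is at odds with the witness report logged just before it.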

jiangjcsnapon commented 1 year ago

The standby cannot see the primary, but it can see the witness, and the witness can see the primary. The standby should not promote itself: if the witness can see the primary, the primary is probably still up.
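The behaviour expected here can be stated as a simple guard on promotion. The Python sketch below is hypothetical. `SiblingReport`, `nodes_seeing_primary` and `should_promote` are illustrative names, not repmgr code. It assumes the standby already knows, for each reachable sibling, how many seconds ago that sibling last saw the primary, as the log above shows for the witness. The 4-second window is taken from the log line rather than from any repmgr setting.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SiblingReport:
    """What one reachable sibling (standby or witness) reports about the primary."""
    name: str
    last_saw_primary_secs: Optional[int]  # None means it has not seen the primary


def nodes_seeing_primary(siblings: Iterable[SiblingReport], window_secs: int) -> List[str]:
    """Names of siblings that saw the primary within the visibility window."""
    return [
        s.name
        for s in siblings
        if s.last_saw_primary_secs is not None and s.last_saw_primary_secs <= window_secs
    ]


def should_promote(siblings: Iterable[SiblingReport], window_secs: int) -> bool:
    """Proposed guard: promote only if no reachable sibling, the witness
    included, has seen the primary within the window. If one has, the primary
    is probably still up and the standby is more likely on the wrong side of
    a network partition."""
    return not nodes_seeing_primary(siblings, window_secs)


# The situation in this report: the witness saw the primary 1 second ago
# and the window logged by repmgrd was 4 seconds.
report = [SiblingReport("WitnessServer", 1)]
print(should_promote(report, window_secs=4))  # False: do not promote
```

Under this rule the standby in the log above would have cancelled its promotion instead of becoming a second primary.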