Open nuowei2543 opened 6 months ago
Hi, there is no built-in automatic rejoin. If you simply start the old master again, you create a split-brain scenario. It is, however, no problem to rejoin the old master automatically via a script after the new one has been promoted.
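As a rough sketch of what such a script could look like: the snippet below only builds and prints a `repmgr node rejoin` command for the old master, so it can be reviewed before it is run. The config path and the conninfo of the new primary (node2, 10.0.14.101) are taken from the `cluster show` output in this issue; the function name `build_rejoin_cmd` and the two environment variables are hypothetical names used for illustration. It assumes PostgreSQL on the old master is stopped and that `pg_rewind` is usable there (`wal_log_hints = on` or data checksums enabled).

```shell
#!/usr/bin/env bash
# Sketch only: build the command that rejoins a former primary as a standby.
# Assumed values (adjust for your setup): config path and new-primary conninfo
# as they appear in the "cluster show" output of this issue.
REPMGR_CONF="${REPMGR_CONF:-/disk1/postgresql/repmgr/repmgr.conf}"
NEW_PRIMARY_CONNINFO="${NEW_PRIMARY_CONNINFO:-host=10.0.14.101 port=5432 user=repmgr dbname=repmgr connect_timeout=2}"

# Print the rejoin command. An optional first argument (e.g. --dry-run)
# is appended so the prerequisites can be checked without changing anything.
build_rejoin_cmd() {
    local extra="${1:-}"
    printf "repmgr -f %s node rejoin -d '%s' --force-rewind --verbose%s\n" \
        "$REPMGR_CONF" "$NEW_PRIMARY_CONNINFO" "${extra:+ $extra}"
}

# Show the check-only variant first, then the real command.
build_rejoin_cmd --dry-run
build_rejoin_cmd
```

On the old master (node1 here) you would keep PostgreSQL stopped (`supervisorctl stop postgresql`), run the printed `--dry-run` command as the `postgres` user, and run the second command only if the dry run reports no problems. Afterwards `repmgr cluster show` should list node1 as a standby of node2.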
Hello, while simulating a host failover, I stopped the PostgreSQL instance on the master host, and the standby node was successfully promoted to become the new master. However, when I restarted the original master node, it did not automatically rejoin the cluster as a standby node.
Versions: Ubuntu 20.4, PostgreSQL 16.2, repmgrd 5.4.1
1. Initial cluster state:
postgres@ser-compute-01:/disk1/postgresql/repmgr$ repmgr -f /disk1/postgresql/repmgr/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------
 1  | node1 | primary | running   |          | default  | 100      | 3        | host=10.0.14.100 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby | running   | node1    | default  | 100      | 3        | host=10.0.14.101 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | witness | running   | node1    | default  | 0        | n/a      | host=10.0.14.109 port=5432 user=repmgr dbname=repmgr connect_timeout=2

2. On node1, run: supervisorctl stop postgresql

3. Cluster state after stopping node1:
postgres@ser-compute-02:~$ repmgr -f /disk1/postgresql/repmgr/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=10.0.14.100 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | running   |          | default  | 100      | 2        | host=10.0.14.101 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | witness | running   | node2    | default  | 0        | n/a      | host=10.0.14.109 port=5432 user=repmgr dbname=repmgr connect_timeout=2

4. On node1, run: supervisorctl start postgresql

5. Cluster state after restarting node1:
postgres@ser-compute-02:/disk1/postgresql/repmgr$ repmgr -f /disk1/postgresql/repmgr/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------
 1  | node1 | primary | ! running |          | default  | 100      | 1        | host=10.0.14.100 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | running   |          | default  | 100      | 2        | host=10.0.14.101 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | witness | running   | node2    | default  | 0        | n/a      | host=10.0.14.109 port=5432 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
So, I don't know why node1 is still the primary.