I encountered an issue where failover did not occur after the primary database experienced a network disruption. In a three-node repmgr setup, when the network to the primary PostgreSQL node is blocked, the standby repmgr nodes fail to detect the primary node’s failure. On all three servers, the status shows “unreachable,” but no actual failover occurs.
The standby nodes record nothing in their logs, but the output of repmgr cluster show reveals the problem:
 ID | Name               | Role    | Status    | Upstream         | Location | Priority | Timeline | Connection string
----+--------------------+---------+-----------+------------------+----------+----------+----------+-------------------------------------------------------------------------
 1  | pg_49536432_stage  | standby |   running | ? pg_19850_stage | default  | 100      | 8        | host=xxxxxx port=6432 user=repmgr dbname=repmgr connect_timeout=2
 2  | pg_211296432_stage | standby |   running | ? pg_19850_stage | default  | 100      | 8        | host=xxxxxxx port=6432 user=repmgr dbname=repmgr connect_timeout=2
 3  | pg_19850_stage     | primary | ? running | ?                | default  | 100      |          | host=xxxxxxxx port=6432 user=repmgr dbname=repmgr connect_timeout=2
The repmgr.conf configuration details are as follows:
failover='automatic'
priority=100
connection_check_type=query
connection_check_query = 'SELECT 1'
reconnect_attempts=6
reconnect_interval=5
monitor_interval_secs=2
primary_notification_timeout=20
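For reference, here is a rough sketch of how long I would expect the standbys to keep retrying before giving up on the primary with these settings. It assumes the retry window is simply reconnect_attempts multiplied by reconnect_interval; the time each individual attempt takes (up to connect_timeout) is not included, and the helper names parse_conf and reconnect_window_secs are made up for this illustration, they are not part of repmgr.

```python
# Settings copied from the repmgr.conf shown above.
REPMGR_CONF = """
failover='automatic'
priority=100
connection_check_type=query
connection_check_query = 'SELECT 1'
reconnect_attempts=6
reconnect_interval=5
monitor_interval_secs=2
primary_notification_timeout=20
"""


def parse_conf(text):
    """Parse simple key=value lines into a dict, stripping single quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'")
    return settings


def reconnect_window_secs(settings):
    """Approximate retry window (assumption: attempts * interval)."""
    return int(settings["reconnect_attempts"]) * int(settings["reconnect_interval"])


settings = parse_conf(REPMGR_CONF)
print("approximate retry window (s):", reconnect_window_secs(settings))
```

By this estimate a failover decision should be due roughly 30 seconds after the primary becomes unreachable, which is far shorter than the time I waited.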