EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)
https://repmgr.org/

"node check" incorrectly reports node is not attached to upstream while in state "catchup" #738

Open loop-evgeny opened 2 years ago

loop-evgeny commented 2 years ago

repmgr 5.3.0, PostgreSQL 14.1, Ubuntu 18.04

I have one standby node attached to a primary. The standby fell behind during heavy writes on the primary, and I rebooted the standby. After the reboot, the standby took some time to reappear in pg_stat_replication, but it eventually did, with state="catchup". While the standby is in this state, repmgr node check incorrectly reports that it is not attached to the upstream node at all:

# sudo -u postgres repmgr -f "/etc/postgresql/14/behavior/repmgr.conf" node check
WARNING: node "(MY_STANDBY_NODE)" attached in state "catchup"
Node "(MY_STANDBY_NODE)":
    Server role: OK (node is standby)
    Replication lag: CRITICAL (2095 seconds, critical threshold: 600))
    WAL archiving: OK (0 pending archive ready files)
    Upstream connection: CRITICAL (node "(MY_STANDBY_NODE)" (ID: 2) is not attached to expected upstream node "(MY_PRIMARY_NODE)" (ID: 1))
...

In contrast, cluster show reports the situation correctly, warning only about the "catchup" state:

# sudo -u postgres repmgr -f "/etc/postgresql/14/behavior/repmgr.conf" cluster show
WARNING: node "(MY_STANDBY_NODE)" attached in state "catchup"
...
WARNING: following issues were detected
  - node "(MY_STANDBY_NODE)" (ID: 2) attached to its upstream node "(MY_PRIMARY_NODE)" (ID: 1) in state "catchup"

I rebooted the standby twice and this happened both times. (The warning eventually disappears when pg_stat_replication.state changes from "catchup" to "streaming".)
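To cross-check what the primary itself reports, the state column of pg_stat_replication can be read directly. The following is a minimal sketch, not repmgr's own logic: classify_replication_state is a hypothetical helper, and the application_name value in the commented query is an assumption that has to match the standby's node name. It only illustrates the expected distinction, namely that a row in state "catchup" still means the standby is attached.

```shell
#!/bin/sh
# Hypothetical helper (not part of repmgr): map a pg_stat_replication.state
# value to an attachment verdict. An empty value means no row was returned.
classify_replication_state() {
    case "$1" in
        streaming) echo "attached" ;;
        catchup)   echo "attached (catching up)" ;;
        "")        echo "not attached" ;;
        *)         echo "other state: $1" ;;
    esac
}

# Example use on the primary. The application_name below is an assumption;
# replace it with the standby's actual node name:
#   state=$(sudo -u postgres psql -At -c \
#     "SELECT state FROM pg_stat_replication WHERE application_name = 'MY_STANDBY_NODE'")
#   classify_replication_state "$state"
```

With this mapping, only a missing row is treated as "not attached", which matches what cluster show reports above.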

spihiker commented 2 days ago

I am seeing the same issue.