Open piotrekfus91 opened 3 years ago
Hi, I have the same issue here (no Docker involved, just plain VMs).
Did you work out a solution? Thanks.
I didn't. After half a year with no answer, we plan to replace repmgr with something else.
We had the same problem with WAL on PostgreSQL 13 and repmgr 5.3. It happens when the timeline differs between the nodes:
node1$ repmgr -v -f /etc/postgresql/13/main/repmgr.conf cluster show
NOTICE: using provided configuration file "/etc/postgresql/13/main/repmgr.conf"
INFO: connecting to database
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------
1 | node1 | standby | running | node2 | default | 100 | 15 | ...
2 | node2 | primary | * running | | default | 100 | 16 | ...
You can restart the standby node:
node1$ sudo systemctl restart postgresql
and the timeline will then be equal on both nodes:
node1$ repmgr -v -f /etc/postgresql/13/main/repmgr.conf cluster show
NOTICE: using provided configuration file "/etc/postgresql/13/main/repmgr.conf"
INFO: connecting to database
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------
1 | node1 | standby | running | node2 | default | 100 | 16 | ...
2 | node2 | primary | * running | | default | 100 | 16 | ...
The next switchover operation should then succeed:
node1$ repmgr -v -f /etc/postgresql/13/main/repmgr.conf standby switchover
NOTICE: using provided configuration file "/etc/postgresql/13/main/repmgr.conf"
NOTICE: executing switchover on node "node1" (ID: 1)
INFO: searching for primary node
INFO: checking if node 2 is primary
INFO: current primary node is 2
INFO: SSH connection to host "node2" succeeded
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node1" (ID: 1) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "node2" (ID: 2)
NOTICE: issuing CHECKPOINT on node "node2" (ID: 2)
DETAIL: executing server command "sudo /usr/bin/systemctl stop postgresql"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 1/A8000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node1" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: standby promoted to primary after 1 second(s)
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node1" (ID: 1) was successfully promoted to primary
INFO: node "node2" (ID: 2) is pingable
INFO: node "node2" (ID: 2) has attached to its upstream node
NOTICE: node "node1" (ID: 1) promoted to primary, node "node2" (ID: 2) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node1" is now primary and node "node2" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
Result:
node1$ repmgr -v -f /etc/postgresql/13/main/repmgr.conf cluster show
NOTICE: using provided configuration file "/etc/postgresql/13/main/repmgr.conf"
INFO: connecting to database
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------
1 | node1 | primary | * running | | default | 100 | 17 | ...
2 | node2 | standby | running | node1 | default | 100 | 16 | ...
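The timeline comparison described above can be scripted, so that a mismatch is noticed before a switchover is attempted. This is only a rough sketch: `timelines_match` is a hypothetical helper name, and it assumes the `cluster show` table layout shown in this thread, with Timeline as the 8th `|`-separated column.

```shell
# Reads `repmgr cluster show` output on stdin.
# Returns 0 if every node row reports the same timeline, 1 otherwise
# (also 1 if no node rows were found at all).
# Assumption: Timeline is the 8th "|"-separated column, as in the tables above.
timelines_match() {
    awk -F'|' '
        # Node rows start with a numeric ID; header, separator and log lines do not.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ && NF >= 8 {
            tl = $8
            gsub(/[ \t]/, "", tl)          # strip padding around the value
            if (seen == 0) first = tl
            else if (tl != first) mismatch = 1
            seen++
        }
        END { exit ((seen == 0 || mismatch) ? 1 : 0) }
    '
}
```

It could be used as `repmgr -f /etc/postgresql/13/main/repmgr.conf cluster show | timelines_match || echo "timelines differ, restart the standby first"`. For the first table above (15 vs 16) the function returns a non-zero status.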
@alien11689
I had the same problem, which could be solved by restarting the standby server or waiting a few minutes.
I hit the same issue.
The main cause here is the fictive archive_command; if you disable archiving, things work as expected.
To fix it, just set archive_command = '{ sleep 5; true; }'. A smaller timeout might work as well.
I am not sure whether this is a repmgr issue or a race inside PostgreSQL, though.
@vyegorov Thank you very much for your answer, that is the solution: archive_command = '{ sleep 5; true; }'
Thank you! Your reply solved the same issue for me as well.
Hi, I am trying to do a switchover using repmgr. It stops the primary node correctly, but after that it hangs during rewind:
I tried with --force-rewind=/usr/lib/postgresql/13/bin/pg_rewind, but the result is the same.
I also created a symlink with sudo ln -s /usr/lib/postgresql/13/bin/pg_rewind /usr/bin/pg_rewind, but still to no avail.
repmgr 5.2.0, PostgreSQL 13, Ubuntu 20.04 (on Docker)
postgresql.override.conf:
repmgr.conf:
Any hints on how to solve this problem?