aviralsingh21 opened 2 months ago
I had a similar problem. When I reboot my primary, e.g. to update the Linux kernel, the secondary is promoted to primary. To "fix" this, I pause the repmgr service before the reboot and unpause it afterwards.
Pause service (execute on ONE node of the cluster):
/path/to/binary/repmgr --config-file=/path/to/config/repmgr.conf service pause
Unpause/continue service (execute on ONE node of the cluster):
/path/to/binary/repmgr --config-file=/path/to/config/repmgr.conf service unpause
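The pause/unpause steps above can be wrapped so that the unpause is not forgotten. This is a minimal sketch, assuming repmgr 5.x (where a single `repmgr service pause` affects the whole cluster, as described above) and the placeholder paths from those commands; `with_failover_paused` and `repmgr_service` are hypothetical helper names. Run it on a node that stays up during the maintenance, such as the witness.

```shell
#!/usr/bin/env bash
# Sketch: pause repmgrd failover around a maintenance command, then unpause.
# REPMGR_BIN/REPMGR_CONF default to the placeholder paths used above.
REPMGR_BIN="${REPMGR_BIN:-/path/to/binary/repmgr}"
REPMGR_CONF="${REPMGR_CONF:-/path/to/config/repmgr.conf}"

# Run "repmgr service <action>" with this node's configuration file.
repmgr_service() {
  "$REPMGR_BIN" --config-file="$REPMGR_CONF" service "$1"
}

# Pause failover, run the given command, then unpause even if the command
# failed. Returns the command's exit status, or 1 if pause/unpause failed.
with_failover_paused() {
  repmgr_service pause || return 1
  local status=0
  "$@" || status=$?
  if ! repmgr_service unpause; then
    echo "WARNING: unpause failed; failover is still paused" >&2
    return 1
  fi
  return "$status"
}
```

Usage (hypothetical file and script names): `source ./failover-pause.sh && with_failover_paused ./reboot-and-wait.sh node-1`, where `reboot-and-wait.sh` stands for any command that reboots the node and only returns once it is back up. If the wrapped command returns as soon as the reboot starts, failover is unpaused too early. `repmgr service status` can be used afterwards to confirm that no node is still paused.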
I have a Docker Swarm HA architecture with 3 PostgreSQL nodes, 1 Pgpool-II service and various other services. PostgreSQL is set up as an HA cluster using the Replication Manager (repmgr) tool: 1 primary node + 1 standby node + 1 witness node.
Docker Image Used: bitnami/postgresql-repmgr:16.3.0
Issue: The standby resyncs with the primary node every time the Docker services are restarted.
My plan was to perform a graceful shutdown of the PostgreSQL database and then stop the container. When I shut down the database on the primary node (node-1), the container exited as soon as the shutdown completed, and the database came back in a new container (with a new container ID) in the standby role and started to re-sync with the new primary (node-2). I assumed this was normal. Since the container restarted every time I tried to shut down the database, I thought it would be better to stop the repmgr daemon first so that I could stop the database permanently. But this didn't help.
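One ordering that is often suggested for a planned shutdown of a primary/standby pair is to stop the standby before the primary and to start the primary before the standby, so that the standby is never running while its primary is gone. The sketch below assumes each PostgreSQL node runs as its own Swarm service with one replica; the service names, the host name and the roughly 5-minute wait are hypothetical, the witness is not handled, and it has not been verified to prevent the re-clone described in this issue. Note also that Docker gives a container 10 seconds by default to stop before killing it, so for a 60 GB database a longer `--stop-grace-period` on the service may be worth checking.

```shell
#!/usr/bin/env bash
# Sketch: stop and start a primary/standby pair of Swarm services in a fixed
# order. Assumes one service per PostgreSQL node with 1 replica each; the
# service and host names are passed in and are hypothetical.

# Stop the standby first: once it is down, stopping the primary cannot
# trigger a promotion. Then stop the primary.
stop_pair() {
  local standby_svc="$1" primary_svc="$2"
  docker service scale "${standby_svc}=0" || return 1
  docker service scale "${primary_svc}=0" || return 1
}

# Start the primary first and wait (60 tries, 5 s apart) until it accepts
# connections before starting the standby.
start_pair() {
  local primary_svc="$1" standby_svc="$2" primary_host="$3"
  local tries=0
  docker service scale "${primary_svc}=1" || return 1
  until pg_isready -q -h "$primary_host"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 60 ]; then
      echo "primary at $primary_host is not ready; standby not started" >&2
      return 1
    fi
    sleep 5
  done
  docker service scale "${standby_svc}=1"
}
```

Usage (hypothetical names, with node-2 as the current primary): `stop_pair pg-node-1 pg-node-2`, and later `start_pair pg-node-2 pg-node-1 pg-node-2`. The third argument must be a host name that `pg_isready` can reach from wherever the script runs.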
I didn't find a way to gracefully and permanently shut down the database before stopping the PostgreSQL Docker service. While looking for a solution, I discovered another issue: whenever I restart the PostgreSQL Docker service, the standby node (node-1) re-syncs (performs a full clone) with the primary node (node-2) every single time.
PostgreSQL Logs from Standby Node:
I also compared these logs with those of the standby node in another, identical environment that does not have this issue. The logs are the same as above, except that the 'Rejoining Node...' message does not appear there.
Additional information: I have already reviewed other relevant issues, such as #52213 and #34986. I configured pg_rewind and enabled wal_log_hints, but the situation is still the same. I also tested with the bitnami/postgresql-repmgr:12.4.0 Docker image, with the same result. I also deleted the volume, deployed the PostgreSQL service with a fresh volume and restored the database again. This time I stopped the Docker service directly instead of stopping the database first, but I am still facing the same issue. Database size used for testing: around 60 GB.
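Since pg_rewind requires either `wal_log_hints = on` or data checksums, and `wal_log_hints` only takes effect after a server restart, it may be worth confirming that the running cluster really has one of them; this only rules out one possible cause of the re-clone. One way is `psql -Atc 'SHOW wal_log_hints'`; another is to read the control file with `pg_controldata`. The function below is a sketch of the latter; `rewind_ready` is a hypothetical name, and it assumes the English field labels that `pg_controldata` prints (they can be localized).

```shell
#!/usr/bin/env bash
# Sketch: decide from pg_controldata output (read on stdin) whether the
# pg_rewind prerequisite is met: wal_log_hints=on or data checksums enabled.
# Assumes English labels such as "wal_log_hints setting:".
rewind_ready() {
  local key value hints="" checksum=""
  while IFS=':' read -r key value; do
    value="${value#"${value%%[![:space:]]*}"}"   # trim leading whitespace
    case "$key" in
      "wal_log_hints setting") hints="$value" ;;
      "Data page checksum version") checksum="$value" ;;
    esac
  done
  # Success if hint bits are WAL-logged or a non-zero checksum version is set.
  [ "$hints" = "on" ] || { [ -n "$checksum" ] && [ "$checksum" != "0" ]; }
}
```

Usage (placeholder path): `pg_controldata /path/to/data | rewind_ready && echo "pg_rewind prerequisite met"`.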
How can I tackle this? Could anyone please help me with this situation?