I'm playing with AlloyDB Omni, which is standard PostgreSQL wrapped in a container and packed with some GCP (Google) steroids. Everything is working well: I was able to build a simple setup with a primary and a single standby. I was also able to use repmgr to test the switchover and switchback operations, and that works fine too.
The problem starts when I try to use repmgr with automatic failover:
Symptoms:
I'm able to start the repmgrd service on both nodes:
on prim:
repmgr -f /var/alloydb/config/repmgr.conf daemon start --verbose
NOTICE: using provided configuration file "/var/alloydb/config/repmgr.conf"
INFO: connecting to local node
NOTICE: executing: "sudo /usr/bin/systemctl start repmgrd"
NOTICE: repmgrd was successfully started
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Starting LSB: Start/stop repmgrd...
Jun 24 04:24:39 omnidbv-repli-03 repmgrd[10531]: Starting PostgreSQL replication management and monitoring daemon: repmgrd.
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Started LSB: Start/stop repmgrd.
on stby:
repmgr -f /var/alloydb/config/repmgr.conf daemon start --verbose
NOTICE: using provided configuration file "/var/alloydb/config/repmgr.conf"
INFO: connecting to local node
NOTICE: executing: "sudo /usr/bin/systemctl start repmgrd"
NOTICE: repmgrd was successfully started
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Starting LSB: Start/stop repmgrd...
Jun 24 04:24:39 omnidbv-repli-03 repmgrd[10531]: Starting PostgreSQL replication management and monitoring daemon: repmgrd.
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Started LSB: Start/stop repmgrd.
The repmgr extension is installed on both nodes:
repmgr=# SELECT * FROM pg_extension;
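  oid  |        extname         | extowner | extnamespace | extrelocatable | extversion |      extconfig      | extcondition
-------+------------------------+----------+--------------+----------------+------------+---------------------+--------------
 14204 | plpgsql                |       10 |           11 | f              | 1.0        |                     |
 99377 | google_columnar_engine |       10 |         2200 | t              | 1.0        |                     |
 99567 | google_db_advisor      |       10 |         2200 | t              | 1.0        |                     |
 99661 | hypopg                 |       10 |         2200 | t              | 1.3.2      |                     |
 50059 | repmgr                 |    47598 |        50058 | f              | 5.4        | {50060,50076,50083} | {"","",""}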
Both repmgr service status and repmgr daemon status show the repmgrd PIDs, but they report repmgrd as 'not running':
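 ID | Name          | Role    | Status    | Upstream      | repmgrd     | PID   | Paused? | Upstream last seen
----+---------------+---------+-----------+---------------+-------------+-------+---------+--------------------
 1  | omnidbv-03-n1 | primary | * running |               | not running | 52598 | no      | n/a
 2  | omnidbv-03-n2 | standby |   running | omnidbv-03-n1 | not running | 10536 | no      | 0 second(s) ago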
Any clue why this might be happening? What checks does repmgr perform to determine the daemon status (besides the repmgrd_is_running function)? I'd appreciate any help with debugging.
BTW, why does the log file report "set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid" rather than the configured REPMGRD_PIDFILE=/var/run/repmgrd.pid?
Versions:
repmgr --version
repmgr 5.4.1
postgres --version
postgres (PostgreSQL) 15.5
Configuration:
A) repmgrd defaults (/etc/default/repmgrd):
REPMGRD_ENABLED=yes
REPMGRD_CONF="/var/alloydb/config/repmgr.conf"
REPMGRD_OPTS="--daemonize=false"
REPMGRD_USER=postgres
REPMGRD_BIN=/usr/bin/repmgrd
REPMGRD_PIDFILE=/var/run/repmgrd.pid
B) repmgr configuration (/var/alloydb/config/repmgr.conf):
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /var/alloydb/config/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /var/alloydb/config/repmgr.conf --log-to-file --upstream-node-id=%n'
repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_start_command='sudo /usr/bin/systemctl stop repmgrd'
monitoring_history=yes
log_level=INFO
log_file='/var/log/postgres/repmgrd.log'
Service status output on prim:
● repmgrd.service - LSB: Start/stop repmgrd
     Loaded: loaded (/etc/init.d/repmgrd; generated)
     Active: active (running) since Mon 2024-06-24 04:24:39 EDT; 16min ago
       Docs: man:systemd-sysv-generator(8)
    Process: 10531 ExecStart=/etc/init.d/repmgrd start (code=exited, status=0/SUCCESS)
      Tasks: 1 (limit: 19151)
     Memory: 1.3M
        CPU: 532ms
     CGroup: /system.slice/repmgrd.service
             └─10536 /usr/lib/postgresql/15/bin/repmgrd --config-file /var/alloydb/config/repmgr.conf --daemonize=false
Service status output on stby:
● repmgrd.service - LSB: Start/stop repmgrd
     Loaded: loaded (/etc/init.d/repmgrd; generated)
     Active: active (running) since Mon 2024-06-24 04:24:39 EDT; 17min ago
       Docs: man:systemd-sysv-generator(8)
    Process: 10531 ExecStart=/etc/init.d/repmgrd start (code=exited, status=0/SUCCESS)
      Tasks: 1 (limit: 19151)
     Memory: 1.3M
        CPU: 567ms
     CGroup: /system.slice/repmgrd.service
             └─10536 /usr/lib/postgresql/15/bin/repmgrd --config-file /var/alloydb/config/repmgr.conf --daemonize=false
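For anyone trying to reproduce this, here is a minimal sketch of a manual cross-check between the PID that repmgr reports and what the OS and a pid file say. The function name check_repmgrd_pid and its output wording are made up for illustration and are not part of repmgr. It is only an assumption that this resembles what repmgr checks internally. Because PostgreSQL runs inside the AlloyDB Omni container, it may be worth running the check both on the host and inside the container and comparing the results.

```shell
#!/bin/sh
# Sketch only: compare a PID shown by "repmgr daemon status" with the
# process table and with a pid file. check_repmgrd_pid is a hypothetical
# helper name, not something shipped with repmgr.

check_repmgrd_pid() {
    pid="$1"        # PID as shown in the "PID" column
    pidfile="$2"    # pid file to compare against (optional)

    # kill -0 sends no signal, it only tests whether the process exists.
    # The /proc fallback (Linux only) covers processes owned by another
    # user, where kill -0 fails for lack of permission.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo "process $pid: present"
    else
        echo "process $pid: absent"
    fi

    # Nothing more to do when no pid file was given.
    [ -n "$pidfile" ] || return 0

    # Does the pid file exist, and does its first line hold the same PID?
    if [ ! -r "$pidfile" ]; then
        echo "pidfile $pidfile: missing or unreadable"
    elif [ "$(head -n 1 "$pidfile" | tr -d '[:space:]')" = "$pid" ]; then
        echo "pidfile $pidfile: matches"
    else
        echo "pidfile $pidfile: holds a different PID"
    fi
}
```

With the values from this report, the calls would be, for example, check_repmgrd_pid 10536 /var/run/repmgrd.pid and check_repmgrd_pid 10536 /tmp/repmgrd.pid, since both pid file paths show up above.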