repmgr daemon status showing repmgrd as 'not running'

Hi,

I'm experimenting with AlloyDB Omni, which is standard PostgreSQL wrapped in a container and packed with some GCP (Google) steroids. Everything works well: I was able to build a simple setup with a primary and a single standby, and I used repmgr to test switchover and switchback, which also work fine. The problem starts when I try to use repmgr with automatic failover.

Versions:

repmgr --version
repmgr 5.4.1

postgres --version
postgres (PostgreSQL) 15.5

Configuration:

A) repmgrd content (/etc/default/repmgrd):
REPMGRD_ENABLED=yes
REPMGRD_CONF="/var/alloydb/config/repmgr.conf"
REPMGRD_OPTS="--daemonize=false"
REPMGRD_USER=postgres
REPMGRD_BIN=/usr/bin/repmgrd
REPMGRD_PIDFILE=/var/run/repmgrd.pid

B) repmgr configuration (/var/alloydb/config/repmgr.conf):
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /var/alloydb/config/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /var/alloydb/config/repmgr.conf --log-to-file --upstream-node-id=%n'
repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_start_command='sudo /usr/bin/systemctl stop repmgrd'
monitoring_history=yes
log_level=INFO
log_file='/var/log/postgres/repmgrd.log'

Symptoms: I'm able to start the repmgrd service on both nodes:

on prim:
repmgr -f /var/alloydb/config/repmgr.conf daemon start --verbose
NOTICE: using provided configuration file "/var/alloydb/config/repmgr.conf"
INFO: connecting to local node
NOTICE: executing: "sudo /usr/bin/systemctl start repmgrd"
NOTICE: repmgrd was successfully started

prim output:
● repmgrd.service - LSB: Start/stop repmgrd
     Loaded: loaded (/etc/init.d/repmgrd; generated)
     Active: active (running) since Mon 2024-06-24 04:24:39 EDT; 16min ago
       Docs: man:systemd-sysv-generator(8)
    Process: 10531 ExecStart=/etc/init.d/repmgrd start (code=exited, status=0/SUCCESS)
      Tasks: 1 (limit: 19151)
     Memory: 1.3M
        CPU: 532ms
     CGroup: /system.slice/repmgrd.service
             └─10536 /usr/lib/postgresql/15/bin/repmgrd --config-file /var/alloydb/config/repmgr.conf --daemonize=false

Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Starting LSB: Start/stop repmgrd...
Jun 24 04:24:39 omnidbv-repli-03 repmgrd[10531]: Starting PostgreSQL replication management and monitoring daemon: repmgrd.
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Started LSB: Start/stop repmgrd.

on stby:
repmgr -f /var/alloydb/config/repmgr.conf daemon start --verbose
NOTICE: using provided configuration file "/var/alloydb/config/repmgr.conf"
INFO: connecting to local node
NOTICE: executing: "sudo /usr/bin/systemctl start repmgrd"
NOTICE: repmgrd was successfully started

stby output:
● repmgrd.service - LSB: Start/stop repmgrd
     Loaded: loaded (/etc/init.d/repmgrd; generated)
     Active: active (running) since Mon 2024-06-24 04:24:39 EDT; 17min ago
       Docs: man:systemd-sysv-generator(8)
    Process: 10531 ExecStart=/etc/init.d/repmgrd start (code=exited, status=0/SUCCESS)
      Tasks: 1 (limit: 19151)
     Memory: 1.3M
        CPU: 567ms
     CGroup: /system.slice/repmgrd.service
             └─10536 /usr/lib/postgresql/15/bin/repmgrd --config-file /var/alloydb/config/repmgr.conf --daemonize=false

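The repmgr extension is installed on both nodes:

repmgr=# SELECT * FROM pg_extension;
  oid  |        extname         | extowner | extnamespace | extrelocatable | extversion |      extconfig      | extcondition
-------+------------------------+----------+--------------+----------------+------------+---------------------+--------------
 14204 | plpgsql                |       10 |           11 | f              | 1.0        |                     |
 99377 | google_columnar_engine |       10 |         2200 | t              | 1.0        |                     |
 99567 | google_db_advisor      |       10 |         2200 | t              | 1.0        |                     |
 99661 | hypopg                 |       10 |         2200 | t              | 1.3.2      |                     |
 50059 | repmgr                 |    47598 |        50058 | f              | 5.4        | {50060,50076,50083} | {"","",""}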

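Both "repmgr service status" and "repmgr daemon status" show the repmgrd PIDs, but they report repmgrd as 'not running':

 ID | Name          | Role    | Status    | Upstream      | repmgrd     | PID   | Paused? | Upstream last seen
----+---------------+---------+-----------+---------------+-------------+-------+---------+--------------------
 1  | omnidbv-03-n1 | primary | * running |               | not running | 52598 | no      | n/a
 2  | omnidbv-03-n2 | standby |   running | omnidbv-03-n1 | not running | 10536 | no      | 0 second(s) ago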

Any clue why this might be happening? What checks does repmgr perform to determine the daemon status (besides the repmgrd_is_running function)? I'd appreciate any help in debugging.

BTW, why does the log file report "set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid" rather than the configured REPMGRD_PIDFILE=/var/run/repmgrd.pid?
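
To help narrow down the pidfile question, here is a minimal sketch of a generic pidfile liveness check. This is an assumption about the kind of check that may be involved, not repmgr's actual code. The helper names (read_pid, pid_is_alive, check_pidfiles) are hypothetical, and the two paths are simply the ones mentioned in this report.

```python
import os
from pathlib import Path


def read_pid(pidfile):
    """Return the integer PID stored in pidfile, or None if it is missing or malformed."""
    try:
        return int(Path(pidfile).read_text().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Probe a PID with signal 0: no signal is delivered, only existence is checked."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process in this PID namespace
    except PermissionError:
        return True   # process exists but is owned by another user
    return True


def check_pidfiles(paths):
    """Map each pidfile path to a (pid, alive) tuple."""
    return {p: (read_pid(p), pid_is_alive(read_pid(p))) for p in paths}


if __name__ == "__main__":
    # Paths taken from this report: the configured one and the one seen in the log.
    for path, (pid, alive) in check_pidfiles(
        ["/var/run/repmgrd.pid", "/tmp/repmgrd.pid"]
    ).items():
        print(f"{path}: pid={pid} alive={alive}")
```

Running it both on the host and inside the database container (as the postgres user) would show whether both environments see the same pidfile and the same live PID.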

EnterpriseDB / repmgr

repmgr daemon status showing repmgrd as 'not running' #854