Attempting to add an old node back into the cluster fails, despite no obvious errors during the join process. Both nodes are idle sandbox machines with plenty of CPU and memory, doing nothing.
This is on RHEL 8 with the postgresql.org repos, PostgreSQL 16, and repmgr-16, installed with yum. SELinux is enabled, and keeping it enabled is mandatory.
Steps, in the order executed:
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: /usr/pgsql-16/bin/pg_ctl start -D /var/lib/pgsql/16/data
DEBUG: get_node_record():
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 2
DEBUG: get_node_record(): no record found for node 2
HINT: after starting the server, you need to register this standby with "repmgr standby register"
I am logging all queries on bar, and no INSERT into repmgr.nodes is issued during cloning.
I assume the final get_node_record() result ("no record found for node 2") is expected, because "standby register" hasn't been run yet.
But maybe some part of the cloning process failed because foo isn't registered in the repmgr.nodes table? If that's the case, nothing in the output indicates that the clone failed.
/usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf node service --action start
/usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf standby register -d 'host=bar port=15432 dbname=repmgr user=superduper passfile=/var/lib/pgsql/.pgpass sslmode=prefer sslcert=/etc/ssl/certs/host.crt sslkey=/var/lib/pgsql/key.pem sslrootcert=/etc/ssl/certs/ca-bundle.crt' -v -L DEBUG --upstream-node-id=1
ERROR: this node does not appear to be attached to upstream node "bar" (ID: 1)
I can force the command and it registers, but the node is still not attached.
Somewhere on the internet, someone reported that "standby follow" worked for them. It didn't for me, even though it also exited successfully.
WARNING: node "foo" not found in "pg_stat_replication"
DEBUG: sleeping 30 of max 30 seconds waiting for standby to attach to primary
NOTICE: STANDBY FOLLOW successful
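The warning above says repmgr found no row for foo in pg_stat_replication on bar. A minimal sketch for checking both ends by hand follows. It assumes psql is available, that foo listens on the same port as bar (15432), and that the standby is expected to connect with application_name set to its node name, which is how I understand repmgr matches it. The function names and the RUN_CHECKS switch are made up for illustration and are not part of repmgr.

```shell
#!/usr/bin/env bash
# Sketch only: primary_query, standby_query and RUN_CHECKS are hypothetical names.

# Query for the primary (bar): is a walsender serving a standby
# that connected with this application_name?
primary_query() {
  printf "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication WHERE application_name = '%s';" "$1"
}

# Query for the standby (foo): is a walreceiver running, and where does it connect?
standby_query() {
  printf '%s' "SELECT status, sender_host, sender_port, conninfo FROM pg_stat_wal_receiver;"
}

# Only touch the databases when explicitly asked to.
# Assumption: foo uses port 15432 like bar; adjust the conninfo strings as needed.
if [ "${RUN_CHECKS:-0}" = "1" ]; then
  psql "host=bar port=15432 dbname=repmgr user=superduper" -X -c "$(primary_query foo)"
  psql "host=foo port=15432 dbname=repmgr user=superduper" -X -c "$(standby_query)"
  psql "host=foo port=15432 dbname=repmgr user=superduper" -X -c "SHOW primary_conninfo;"
fi
```

Run with RUN_CHECKS=1 to execute the queries. If the first query returns nothing and the second shows no walreceiver, the standby is probably not connecting at all. If bar shows a row under a different application_name, a name mismatch in primary_conninfo is a plausible cause.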
/usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf node check
Upstream connection: CRITICAL (node "foo" (ID: 2) is not attached to expected upstream node "bar" (ID: 1))
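Since "standby follow" reported success here even though the node never attached, a script probably should not rely on that exit status alone. A minimal sketch that inspects the "node check" output instead is below. It assumes the line format shown above and that a healthy node prints "Upstream connection: OK"; the function name upstream_attached is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `repmgr node check` output on stdin.
# Returns 0 if the "Upstream connection" line reports OK,
# 1 if it reports anything else, 2 if the line is missing.
upstream_attached() {
  local line
  line=$(grep -m1 'Upstream connection:') || return 2
  case "$line" in
    *'Upstream connection: OK'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example using the line from this post. In practice, pipe in the output of:
#   /usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf node check
sample='Upstream connection: CRITICAL (node "foo" (ID: 2) is not attached to expected upstream node "bar" (ID: 1))'
if printf '%s\n' "$sample" | upstream_attached; then
  echo "attached"
else
  echo "not attached"
fi
```

With the sample line above, this prints "not attached".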
No amount of restarting repmgr or PostgreSQL on either node changes the outcome. Deleting the data and re-cloning from the primary doesn't change the result either.
For reference, before cloning I emptied the old data and WAL directories on foo:
rm -rf /var/lib/pgsql/16/data/*
rm -rf /var/lib/pgsql/16/wal/*
The "standby follow" command I ran was:
/usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf standby follow -d 'host=bar port=15432 dbname=repmgr user=superduper passfile=/var/lib/pgsql/.pgpass sslmode=prefer sslcert=/etc/ssl/certs/host.crt sslkey=/var/lib/pgsql/key.pem sslrootcert=/etc/ssl/certs/ca-bundle.crt' -v -L DEBUG --upstream-node-id=1