ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

WARNING: Can't get <node-name> xlog location. #1846

Open dumm1 opened 1 year ago

dumm1 commented 1 year ago

Hello,

I tried to create an HA PostgreSQL cluster, but the primary resource won't come up.

PostgreSQL version: 11.19
Pacemaker version: 2.1.4
OS distribution: AlmaLinux 9.1

############################## Cluster and Resource status:

Node Attributes:

pacemaker.log keeps repeating the lines below:

Mar 09 10:31:16 pgsql(pgsql)[3290789]: INFO: Master does not exist.
Mar 09 10:31:16 pgsql(pgsql)[3290789]: INFO: My data status=LATEST.
Mar 09 10:31:16 pgsql(pgsql)[3290789]: WARNING: Can't get node-1 xlog location.
Mar 09 10:31:16 pgsql(pgsql)[3290789]: WARNING: Can't get node-2 xlog location.
Mar 09 10:31:20 pgsql(pgsql)[3290995]: INFO: Master does not exist.
Mar 09 10:31:20 pgsql(pgsql)[3290995]: INFO: My data status=LATEST.
Mar 09 10:31:20 pgsql(pgsql)[3290995]: WARNING: Can't get node-1 xlog location.
Mar 09 10:31:20 pgsql(pgsql)[3290995]: WARNING: Can't get node-2 xlog location.
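When the same warnings repeat every few seconds, it can help to condense them into a per-node count. The following is a minimal sketch, assuming one log entry per line in the format shown above; the function name `xlog_warning_nodes` is hypothetical and is not part of resource-agents or Pacemaker.

```shell
#!/bin/sh
# Hypothetical helper (not part of resource-agents or Pacemaker).
# Reads log lines on stdin and prints "<node>: <count>" for every node
# named in a "Can't get <node> xlog location." warning.
# Assumes one log entry per line and node names without spaces.
xlog_warning_nodes() {
    sed -n "s/.*WARNING: Can't get \([^ ]*\) xlog location\..*/\1/p" |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

It could be fed the log directly, for example `xlog_warning_nodes < /var/log/pacemaker/pacemaker.log`, where the log path is an assumption that may differ on your system.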

I'm sure the xlog location was correct when I created the resource.

Am I using the wrong packages? Do you have any example commands to create this cluster?

oalbrigt commented 1 year ago

Here's an old example of how to do this setup: https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster

From what I recall, a couple of the keywords are outdated, but I think you can just remove them from the config file as the errors come up, until it starts without errors.

dumm1 commented 1 year ago

Thanks for your reply. I referred to this example and it worked in my environment.

I'm a little curious why master_ip is set to the vip-rep address rather than the vip-master address.

pcs -f pgsql_cfg resource create pgsql pgsql \
   pgctl="/usr/bin/pg_ctl" \
   psql="/usr/bin/psql" \
   pgdata="/var/lib/pgsql/data/" \
   rep_mode="sync" \
   node_list="node1 node2" \
   restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
   master_ip="192.168.2.3"

oalbrigt commented 1 year ago

vip-rep is for replication, so it's on its own network in the example.

dumm1 commented 1 year ago

I understand it is in its own network environment.

In the three commands below, the third sets master_ip="192.168.2.3". Why isn't master_ip set to the ip="192.168.0.3" value from the first command?

pcs -f pgsql_cfg resource create vip-master IPaddr2 \
   ip="192.168.0.3" \
   nic="eth0" \
   cidr_netmask="24" \
   op start timeout="60s" interval="0s" on-fail="restart" \
   op monitor timeout="60s" interval="10s" on-fail="restart" \
   op stop timeout="60s" interval="0s" on-fail="block"

pcs -f pgsql_cfg resource create vip-rep IPaddr2 \
   ip="192.168.2.3" \
   nic="eth2" \
   cidr_netmask="24" \
   meta migration-threshold="0" \
   op start timeout="60s" interval="0s" on-fail="stop" \
   op monitor timeout="60s" interval="10s" on-fail="restart" \
   op stop timeout="60s" interval="0s" on-fail="ignore"

pcs -f pgsql_cfg resource create pgsql pgsql \
   pgctl="/usr/bin/pg_ctl" \
   psql="/usr/bin/psql" \
   pgdata="/var/lib/pgsql/data/" \
   rep_mode="sync" \
   node_list="node1 node2" \
   restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
   master_ip="192.168.2.3" \

oalbrigt commented 1 year ago

Ah. That's probably due to the devs not thinking of a better name for the parameter when they made the pgsql agent, but at least the metadata says it's for replication: https://github.com/ClusterLabs/resource-agents/blob/main/heartbeat/pgsql#L339-L341

dumm1 commented 1 year ago

Thank you for pointing me to the description in the metadata:

Master's floating IP address to be connected from hot standby. This parameter is used for "primary_conninfo" in recovery.conf. This is required for replication.
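To make the role of master_ip concrete: per the description above, the hot standby uses it as the address to connect to in primary_conninfo, which is why the example points it at the replication address (vip-rep, 192.168.2.3) rather than the client-facing vip-master. The sketch below only illustrates how such a connection string could be assembled from the parameters used in this thread. It is not the pgsql agent's actual code; the function name `build_primary_conninfo`, the port 5432, and the user `postgres` are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative only: NOT the pgsql agent's implementation.
# Builds a primary_conninfo-style string from a replication address
# (master_ip), a port, a replication user, and extra options
# (primary_conninfo_opt).
build_primary_conninfo() {
    master_ip=$1
    port=$2
    repuser=$3
    extra_opts=$4
    printf 'host=%s port=%s user=%s %s\n' \
        "$master_ip" "$port" "$repuser" "$extra_opts"
}

# master_ip and the keepalive options come from the thread;
# the port and user are assumed values.
build_primary_conninfo "192.168.2.3" "5432" "postgres" \
    "keepalives_idle=60 keepalives_interval=5 keepalives_count=5"
```

With these inputs the standby would be told to connect to 192.168.2.3, so replication traffic stays on the replication network while clients keep using the vip-master address.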