Closed itxx00 closed 4 years ago
Hi @itxx00
The role does not do this out of the box, but here's what you can do. Adding a node can disrupt running resources, so the database needs to be down while the node is added. To achieve that (and not lose the benefit of the role's actions before the node join), you have to temporarily edit the role so that it stops before the node add.
Add this task
- meta: end_play
here.
Now (let's say that pgha3 is the replaced node):
run the role
when it exits, shut down the DB and add the node manually
pcs cluster node remove pgha3 --request-timeout=1
# outage start
pcs resource disable postgres-ha
# wait for stop
pcs cluster node add pgha3
ssh pgha3 pcs cluster start
pcs resource enable postgres-ha
# outage end
revert the role edits you made earlier
re-run the role
it now fails while waiting for all slaves to connect (but we've made it further)
then on pgha3:
yum install https://github.com/YanChii/ansible-role-postgres-ha/raw/master/files/resource-agents-paf-2.2.1-1.noarch.rpm
scp pgha1:/var/lib/pgsql/10/data/recovery.conf.pgcluster.pcmk /var/lib/pgsql/10/recovery.conf.pgcluster.pcmk
sed -i'' -e 's/application_name=pgha1/application_name=pgha3/' /var/lib/pgsql/10/recovery.conf.pgcluster.pcmk
pcs resource cleanup postgres-ha --node pgha3
now finally re-run the role again
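The manual part of the steps above could be collected into one script. This is only a sketch: the `run` helper, the `wait_for_stop` prompt, and the `DRY_RUN` switch are hypothetical additions (not part of the role or of pcs), and with the default `DRY_RUN=1` the commands are only printed. The node name and resource name are the ones used in this post, and the `sed` call assumes GNU sed.

```shell
#!/bin/sh
# Sketch of the manual steps from this post. Nothing is executed unless DRY_RUN=0.
NODE="${NODE:-pgha3}"                # replaced node (name taken from the post)
RESOURCE="${RESOURCE:-postgres-ha}"  # resource name taken from the post
DRY_RUN="${DRY_RUN:-1}"              # hypothetical switch: 1 = only print commands

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: remind the operator to wait; pause only in live mode.
wait_for_stop() {
    echo "# wait until $RESOURCE is stopped (check with: pcs status)"
    if [ "$DRY_RUN" != "1" ]; then
        printf 'Press Enter to continue: '
        read -r answer
    fi
}

# Remove the old node entry, then add the node while the database is down.
outage_window() {
    run pcs cluster node remove "$NODE" --request-timeout=1
    run pcs resource disable "$RESOURCE"
    wait_for_stop
    run pcs cluster node add "$NODE"
    run ssh "$NODE" pcs cluster start
    run pcs resource enable "$RESOURCE"
}

# Rewrite application_name in a copied recovery template.
# $1 = file, $2 = old node name, $3 = new node name
fix_application_name() {
    sed -i -e "s/application_name=$2/application_name=$3/" "$1"
}

outage_window
```

Run as-is, it prints the command sequence for review; set `DRY_RUN=0` only after checking that the sequence matches your cluster.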
Jan
Hi @YanChii, I followed the steps, but after disabling postgres-ha it seems that an already existing node cannot be added to the cluster.
The old node must be removed before adding a new one with the same name.
pcs cluster node remove pgha3 --request-timeout=1
I've also updated my post above.
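One way to check beforehand whether the old node entry is still present is to look for its name in the node list. The sketch below is an illustration under assumptions: `node_listed` is a hypothetical helper, and the sample text only imitates the kind of output `pcs status nodes` prints; the real layout may differ between pcs versions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the node name ($1) appears as a whole
# whitespace-separated word in the text read from stdin.
node_listed() {
    awk -v n="$1" '{ for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
                   END { exit !found }'
}

# Illustrative sample only; on a cluster node the input would come from:
#   pcs status nodes | node_listed pgha3
sample='Pacemaker Nodes:
 Online: pgha1 pgha2
 Offline: pgha3'

if printf '%s\n' "$sample" | node_listed pgha3; then
    echo "pgha3 is still listed: remove it before adding it again"
else
    echo "pgha3 is not listed: it can be added"
fi
```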
Jan
After pcs resource cleanup postgres-ha --node db01 and re-running the role, the pgsql service did not start up on db01, and the playbook always failed at check if slaves are connected. Now the tasks look like:
This is expected. Please read the instructions above again.
After finally re-running the role again, it seems postgres still cannot start on db01 :-(
Postgres must be up BEFORE the last role re-run. Do whatever is necessary to start it. Resource cleanup, clear, or refresh should be enough (careful, one of them restarts the master resource). Then maybe check the constraints (should not be a problem when the node name is unchanged) or disable/enable the resource. Also check the main logs on the new server.
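These recovery attempts could be tried one at a time, checking the cluster state after each. A minimal sketch, assuming the resource and node names used earlier in this thread: `try_step`, `run`, and `DRY_RUN` are hypothetical names, and by default the script only prints the pcs command it would run. `pcs resource refresh` may not be available in older pcs releases.

```shell
#!/bin/sh
RESOURCE="${RESOURCE:-postgres-ha}"  # resource name from this thread
NODE="${NODE:-pgha3}"                # replaced node; adjust to your node name
DRY_RUN="${DRY_RUN:-1}"              # hypothetical switch: 1 = only print commands

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run exactly one recovery or inspection step, selected by name.
try_step() {
    case "$1" in
        cleanup)     run pcs resource cleanup "$RESOURCE" --node "$NODE" ;;
        clear)       run pcs resource clear "$RESOURCE" ;;
        refresh)     run pcs resource refresh "$RESOURCE" ;;
        constraints) run pcs constraint ;;
        status)      run pcs status ;;
        *)           echo "unknown step: $1" >&2; return 1 ;;
    esac
}

# Default to the harmless status check when no step name is given.
try_step "${1:-status}"
```

Selecting a single step per invocation keeps the operator in control, since one of these commands restarts the master resource.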
Jan
Thanks for your help, the lost node is back now.
Hi, thanks for this role. I have a 3-node cluster, and one node's OS has been reinstalled and its hard drive replaced with a new one. How can I recover this node with the playbooks? Thanks.