YanChii / ansible-role-postgres-ha

Create postgresql HA auto-failover cluster using pcs, pacemaker and PAF
Apache License 2.0

first run, fresh VMs #3

Closed: frank3427 closed this issue 7 years ago

frank3427 commented 7 years ago

[root@AnsibleServer ~]# ansible-playbook --ask-pass dbs3-postgres-ha.yml
SSH password:

PLAY [install PG HA] ***

TASK [Gathering Facts] ***** ok: [dbs04.prodea-int.net] ok: [dbs03.prodea-int.net]

TASK [disable firewall] **** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : debug] **** ok: [dbs03.prodea-int.net] => { "msg": "MASTER NODE SET TO dbs03.prodea-int.net" }

TASK [postgres-ha6 : verify postgres_ha_cluster_master_host] *** skipping: [dbs03.prodea-int.net] skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : identify the OS] ** ok: [dbs03.prodea-int.net] ok: [dbs04.prodea-int.net]

TASK [postgres-ha6 : debug] **** ok: [dbs03.prodea-int.net] => { "msg": "cluster_members=[u'dbs03.prodea-int.net', u'dbs04.prodea-int.net']" }

TASK [postgres-ha6 : install cluster pkgs] ***** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : install additional cluster pkgs for centos6] ** changed: [dbs03.prodea-int.net] => (item=[u'pacemaker', u'libselinux-python']) changed: [dbs04.prodea-int.net] => (item=[u'pacemaker', u'libselinux-python'])

TASK [postgres-ha6 : Build hosts file] ***** changed: [dbs03.prodea-int.net] => (item=dbs03.prodea-int.net) changed: [dbs04.prodea-int.net] => (item=dbs03.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs04.prodea-int.net) changed: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net)

TASK [postgres-ha6 : service pcsd start] *** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : setup hacluster password] ***** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setup cluster auth] *** skipping: [dbs03.prodea-int.net] skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : create cluster] *** skipping: [dbs03.prodea-int.net]

TASK [postgres-ha6 : create cluster] *** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : join cluster nodes] *** skipping: [dbs03.prodea-int.net] => (item=dbs04.prodea-int.net)

TASK [postgres-ha6 : start cluster] **** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : alter stonith settings] *** changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : alter cluster policy settings] **** changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : alter cluster transition settings] **** changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : verify cluster configuration] ***** changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : enable cluster autostart] ***** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : create virtual IP resource] *** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : import pg96 repo] ***** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : install pg96] ***** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : init DB dir on master if necessary (centos 7)] **** skipping: [dbs03.prodea-int.net] skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : init DB dir on master if necessary (centos 6)] **** skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : check if DB was synchronized before] ** ok: [dbs03.prodea-int.net] ok: [dbs04.prodea-int.net]

TASK [postgres-ha6 : alter clustering-related settings in postgresql.conf] ***** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] => (item={'key': u'hot_standby', 'value': u'on'}) skipping: [dbs04.prodea-int.net] => (item={'key': u'listen_addresses', 'value': u"''"}) skipping: [dbs04.prodea-int.net] => (item={'key': u'wal_level', 'value': u'hot_standby'}) skipping: [dbs04.prodea-int.net] => (item={'key': u'wal_log_hints', 'value': u'on'}) skipping: [dbs04.prodea-int.net] => (item={'key': u'max_wal_senders', 'value': u'4'}) skipping: [dbs04.prodea-int.net] => (item={'key': u'max_replication_slots', 'value': u'4'}) changed: [dbs03.prodea-int.net] => (item={'key': u'hot_standby', 'value': u'on'}) changed: [dbs03.prodea-int.net] => (item={'key': u'listen_addresses', 'value': u"''"}) changed: [dbs03.prodea-int.net] => (item={'key': u'wal_level', 'value': u'hot_standby'}) changed: [dbs03.prodea-int.net] => (item={'key': u'wal_log_hints', 'value': u'on'}) changed: [dbs03.prodea-int.net] => (item={'key': u'max_wal_senders', 'value': u'4'}) changed: [dbs03.prodea-int.net] => (item={'key': u'max_replication_slots', 'value': u'4'})

RUNNING HANDLER [postgres-ha6 : restart postgresql] **** changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : alter DB ACL in pg_hba.conf] ** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" or db_prevsync_file.stat.exists [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" or db_prevsync_file.stat.exists skipping: [dbs04.prodea-int.net] => (item=dbs03.prodea-int.net) skipping: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs03.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs04.prodea-int.net)

TASK [postgres-ha6 : alter DB replication ACL in pg_hba.conf] ** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" or db_prevsync_file.stat.exists [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" or db_prevsync_file.stat.exists skipping: [dbs04.prodea-int.net] => (item=dbs03.prodea-int.net) skipping: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs03.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs04.prodea-int.net)

RUNNING HANDLER [postgres-ha6 : reload postgresql] ***** changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setup DB cluster auth (master IP)] **** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : setup .pgpass replication auth for master IP] ***** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : setup .pgpass replication auth for other IPs] ***** changed: [dbs04.prodea-int.net] => (item=dbs03.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs03.prodea-int.net) changed: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net) changed: [dbs03.prodea-int.net] => (item=dbs04.prodea-int.net)

TASK [postgres-ha6 : check if master host "dbs03.prodea-int.net" is really a DB master] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : mark master DB] *** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : check if DB is running (failure is ok)] *** changed: [dbs03.prodea-int.net] fatal: [dbs04.prodea-int.net]: FAILED! => {"changed": true, "cmd": "/usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data status", "delta": "0:00:00.042861", "end": "2017-08-28 23:38:32.862478", "failed": true, "rc": 4, "start": "2017-08-28 23:38:32.819617", "stderr": "pg_ctl: directory \"/var/lib/pgsql/9.6/data\" is not a database cluster directory", "stderr_lines": ["pg_ctl: directory \"/var/lib/pgsql/9.6/data\" is not a database cluster directory"], "stdout": "", "stdout_lines": []} ...ignoring

TASK [postgres-ha6 : check if DB is running in cluster (failure is OK)] **** fatal: [dbs03.prodea-int.net]: FAILED! => {"changed": true, "cmd": "pcs constraint location show resources \"postgres-ha\" | grep -q Enabled", "delta": "0:00:00.344149", "end": "2017-08-28 23:38:34.271540", "failed": true, "rc": 1, "start": "2017-08-28 23:38:33.927391", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ...ignoring fatal: [dbs04.prodea-int.net]: FAILED! => {"changed": true, "cmd": "pcs constraint location show resources \"postgres-ha\" | grep -q Enabled", "delta": "0:00:00.330268", "end": "2017-08-28 23:38:34.298727", "failed": true, "rc": 1, "start": "2017-08-28 23:38:33.968459", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ...ignoring

TASK [postgres-ha6 : start master DB if necessary (without cluster)] *** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: (inventory_hostname == "{{ postgres_ha_cluster_master_host }}") and (db_resource_exists|failed) and (db_running|failed) skipping: [dbs03.prodea-int.net] [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: (inventory_hostname == "{{ postgres_ha_cluster_master_host }}") and (db_resource_exists|failed) and (db_running|failed) skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : start master DB if necessary (in cluster)] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: (inventory_hostname == "{{ postgres_ha_cluster_master_host }}") and (db_resource_exists|succeeded) and (db_running|failed) skipping: [dbs03.prodea-int.net] [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: (inventory_hostname == "{{ postgres_ha_cluster_master_host }}") and (db_resource_exists|succeeded) and (db_running|failed) skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : setup DB replication auth] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : check if DB sync is required] ***** ok: [dbs04.prodea-int.net] ok: [dbs03.prodea-int.net]

TASK [postgres-ha6 : stop slave DB] **** skipping: [dbs03.prodea-int.net] skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : remove slave DB datadir before sync] ** skipping: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : synchronize slave databases] ** skipping: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : start slave DBs] ** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: (inventory_hostname != "{{ postgres_ha_cluster_master_host }}") and (db_resource_exists|failed) skipping: [dbs03.prodea-int.net] [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: (inventory_hostname != "{{ postgres_ha_cluster_master_host }}") and (db_resource_exists|failed) changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : check if slaves are connected] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : select proper PAF package (centos6)] ** skipping: [dbs03.prodea-int.net] skipping: [dbs04.prodea-int.net]

TASK [postgres-ha6 : select proper PAF package (centos6)] ** ok: [dbs03.prodea-int.net] ok: [dbs04.prodea-int.net]

TASK [postgres-ha6 : copy PAF rpm to hosts] **** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : install PAF DB failover agent] **** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : prepare DB recovery config] *** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : stop database for clustering] ***** changed: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : create database cluster resource] ***** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : create master DB resource] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : test constraints presence] **** ok: [dbs04.prodea-int.net] ok: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setting VIP location constraints] ***** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setting DB location constraints] ** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setting resources colocation group 1] ***** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setting resources start order] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : setting resources stop order] ***** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : marking constraints as processed] ***** changed: [dbs03.prodea-int.net] changed: [dbs04.prodea-int.net]

TASK [postgres-ha6 : enable database cluster resource] ***** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : refresh database cluster resource] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] changed: [dbs03.prodea-int.net]

TASK [postgres-ha6 : check if all slaves are connected] **** [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname == "{{ postgres_ha_cluster_master_host }}" skipping: [dbs04.prodea-int.net] FAILED - RETRYING: check if all slaves are connected (16 retries left). FAILED - RETRYING: check if all slaves are connected (15 retries left). FAILED - RETRYING: check if all slaves are connected (14 retries left). changed: [dbs03.prodea-int.net]

PLAY RECAP *****
dbs03.prodea-int.net : ok=54 changed=46 unreachable=0 failed=0
dbs04.prodea-int.net : ok=30 changed=24 unreachable=0 failed=0

[root@AnsibleServer ~]#
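The "check if DB is running (failure is ok)" task above failed on dbs04 with `rc: 4`, which is expected on a standby whose data directory has not been synchronized yet. According to the PostgreSQL `pg_ctl` documentation, `pg_ctl status` exits with 0 when the server is running, 3 when it is not running, and 4 when no accessible data directory is given. A minimal sketch of mapping those codes to states instead of treating every non-zero result as the same failure; the function name `pg_ctl_state` is hypothetical and not part of the role:

```shell
# Hypothetical helper (not part of the role): translate the exit code of
# `pg_ctl -D <datadir> status` into a state name. Codes per the pg_ctl docs:
#   0 = server running, 3 = server not running, 4 = no accessible data directory
pg_ctl_state() {
  case "$1" in
    0) echo "running" ;;
    3) echo "stopped" ;;
    4) echo "no-datadir" ;;
    *) echo "unknown" ;;
  esac
}

# Example on a node, using the binary and data directory paths from the log:
#   /usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data status >/dev/null 2>&1
#   pg_ctl_state "$?"
```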

[root@localhost ~]# pcs status
Cluster name: pgcluster
Stack: cman
Current DC: dbs04.prodea-int.net (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Mon Aug 28 23:50:05 2017  Last change: Mon Aug 28 23:39:18 2017 by root via crm_attribute on dbs03.prodea-int.net

2 nodes and 3 resources configured

Online: [ dbs03.prodea-int.net dbs04.prodea-int.net ]

Full list of resources:

 pg-vip (ocf::heartbeat:IPaddr2): Started dbs03.prodea-int.net
 Master/Slave Set: postgres-ha [postgres]
     Slaves: [ dbs03.prodea-int.net dbs04.prodea-int.net ]

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@localhost ~]#
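In this status output the `postgres-ha` Master/Slave set lists both nodes under `Slaves:` and has no `Masters:` entry: no instance was promoted, which matches both databases being read-only. A minimal sketch of a check for that condition, written as a hypothetical function `pg_masters` (not part of pcs or the role) that reads `pcs status` text on standard input, prints the promoted node(s), and returns non-zero when there is none. It assumes the Pacemaker 1.1 output format shown in this thread:

```shell
# Hypothetical helper (not part of the role or of pcs): print the node(s)
# listed under "Masters: [ ... ]" in `pcs status` output read from stdin.
# Returns 1 when no master is listed, i.e. no instance has been promoted.
pg_masters() {
  sed -n 's/.*Masters: \[ \([^]]*\) ].*/\1/p' | grep .
}

# Example:
#   pcs status | pg_masters || echo "no PostgreSQL master promoted" >&2
```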

frank3427 commented 7 years ago

In my installation, both nodes are in read-only mode. To make the master writable, I run the following on the master only:

1) service postgresql-9.6 stop
2) rm /var/lib/pgsql/9.6/data/recovery.conf
3) service postgresql-9.6 start

Then the master is writable.

I noticed that when I ran "service postgresql-9.6 stop", the VIP stayed on that server and did not move to the other one. I know it was in slave mode and did not get promoted.

frank3427 commented 7 years ago

Looking at the PAF docs and their examples, I do not see "op notify timeout=60s --master notify=true"; I see "op notify timeout=60s" instead.

YanChii commented 7 years ago

Hi @frank3427

So you ended up where I did: with a master that never got promoted. But the installation went through, so your system is in the same state as if you had installed it by hand (in other words, the role did exactly what it was told to do). Now we have to find out why it doesn't get promoted. I really don't recommend using the service command after you've installed the cluster. When you stop PostgreSQL without notifying the cluster itself (e.g. not using pcs resource disable or pcs constraint location commands), your cluster will most probably fall apart at the first problem (like two master instances or other juicy things).
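Jan's advice can be sketched as a small wrapper that stops and starts the clustered PostgreSQL through Pacemaker (`pcs resource disable` / `pcs resource enable` on the Master/Slave set) instead of `service postgresql-9.6 stop|start`. The function `pg_cluster` and the `PG_RESOURCE` and `DRY_RUN` variables are hypothetical; the default resource name `postgres-ha` is taken from the `pcs status` output in this thread. `DRY_RUN=1` only prints the pcs command, so the sketch can be checked without a cluster:

```shell
# Hypothetical wrapper: stop/start the PAF-managed PostgreSQL through
# Pacemaker instead of `service postgresql-9.6 stop|start`, so the cluster
# knows the database was stopped on purpose.
# PG_RESOURCE defaults to the Master/Slave set name seen in `pcs status`.
# With DRY_RUN=1 the pcs command is printed instead of executed.
pg_cluster() {
  resource="${PG_RESOURCE:-postgres-ha}"
  case "$1" in
    stop)  set -- pcs resource disable "$resource" ;;  # stops every instance
    start) set -- pcs resource enable "$resource" ;;
    *)     echo "usage: pg_cluster stop|start" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Note that disabling the Master/Slave set stops PostgreSQL on all nodes, not only on the master.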

I've modified the role, and pcs cluster auth now also works on CentOS 6. Still no luck with promotion, though.

Jan

frank3427 commented 7 years ago

Jan,

In my environment, both servers have recovery.conf at the end of the playbook run. Are you seeing the same thing in yours? I end up stopping PostgreSQL on svr1, removing recovery.conf, and starting PostgreSQL again.

YanChii commented 7 years ago

Hi @frank3427

Yes, I see the same issue on my CentOS 6 server. And as I said, removing recovery.conf manually will break your cluster: the recovery.conf file is managed automatically by the cluster. If you want to manage it by hand, don't use the cluster.

PAF support for CentOS 6 seems a bit fragile because CentOS 6 ships old versions of the cluster software. I strongly encourage you to use CentOS 7 in production. Anyway, I'll keep talking to the PAF developers to see whether the issue can be resolved.

Jan

frank3427 commented 7 years ago

SUCCESS!! Fresh install, fresh VM:

[root@localhost ~]# pcs status
Cluster name: pgcluster
Stack: cman
Current DC: dbs03.prodea-int.net (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Aug 31 03:30:18 2017  Last change: Thu Aug 31 03:29:37 2017 by root via crm_attribute on dbs03.prodea-int.net

2 nodes and 3 resources configured

Online: [ dbs03.prodea-int.net dbs04.prodea-int.net ]

Full list of resources:

 pg-vip (ocf::heartbeat:IPaddr2): Started dbs03.prodea-int.net
 Master/Slave Set: postgres-ha [postgres]
     Masters: [ dbs03.prodea-int.net ]
     Slaves: [ dbs04.prodea-int.net ]

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@localhost ~]#

Here is what I changed in constraint.yml. I replaced:

with:

frank3427 commented 7 years ago

A small issue: there is still no tokens file in /var/lib/pcsd/:

[root@localhost ~]# ll /var/lib/pcsd/
total 12
-rwx------ 1 root root   60 Aug 31 03:26 pcsd.cookiesecret
-rwx------ 1 root root 1224 Aug 31 03:26 pcsd.crt
-rwx------ 1 root root 1675 Aug 31 03:26 pcsd.key

Without it, pcs cluster stop --all does not work, and neither does the GUI after enabling GUI support:

pcs cluster stop --all
dbs03.prodea-int.net: Unable to authenticate to dbs03.prodea-int.net - (HTTP error: 401), try running 'pcs cluster auth'
dbs04.prodea-int.net: Unable to authenticate to dbs04.prodea-int.net - (HTTP error: 401), try running 'pcs cluster auth'
Error: unable to stop all nodes
dbs03.prodea-int.net: Unable to authenticate to dbs03.prodea-int.net - (HTTP error: 401), try running 'pcs cluster auth'
dbs04.prodea-int.net: Unable to authenticate to dbs04.prodea-int.net - (HTTP error: 401), try running 'pcs cluster auth'
[root@localhost ~]# pcs status
Cluster name: pgcluster
Stack: cman
Current DC: dbs03.prodea-int.net (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Aug 31 03:44:34 2017  Last change: Thu Aug 31 03:29:37 2017 by root via crm_attribute on dbs03.prodea-int.net

2 nodes and 3 resources configured

Online: [ dbs03.prodea-int.net dbs04.prodea-int.net ]

Full list of resources:

 pg-vip (ocf::heartbeat:IPaddr2): Started dbs03.prodea-int.net
 Master/Slave Set: postgres-ha [postgres]
     Masters: [ dbs03.prodea-int.net ]
     Slaves: [ dbs04.prodea-int.net ]

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

To correct it, I manually ran:

pcs cluster auth node1 node2 -u hacluster -p (password)

Then /var/lib/pcsd/tokens exists:

ll /var/lib/pcsd/
total 20
-rwx------ 1 root root   60 Aug 31 03:26 pcsd.cookiesecret
-rwx------ 1 root root 1224 Aug 31 03:26 pcsd.crt
-rwx------ 1 root root 1675 Aug 31 03:26 pcsd.key
-rw-r--r-- 1 root root  437 Aug 31 03:45 pcs_users.conf
-rw------- 1 root root  200 Aug 31 03:45 tokens
[root@localhost ~]# pcs cluster stop --all
dbs04.prodea-int.net: Stopping Cluster (pacemaker)...
dbs03.prodea-int.net: Stopping Cluster (pacemaker)...
dbs03.prodea-int.net: Stopping Cluster (cman)...
dbs04.prodea-int.net: Stopping Cluster (cman)...
[root@localhost ~]# pcs cluster start --all
dbs03.prodea-int.net: Starting Cluster...
dbs04.prodea-int.net: Starting Cluster...
[root@localhost ~]#
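After the manual pcs cluster auth, /var/lib/pcsd/tokens appeared and the HTTP 401 errors went away. A rough pre-flight check for that, sketched as a hypothetical function `pcsd_has_tokens`: it only tests that the tokens file exists and is non-empty, not that the tokens are valid for every node. `PCSD_DIR` is an assumption added so the check can be tried against a temporary directory, and `HACLUSTER_PASSWORD` in the usage line is a placeholder:

```shell
# Hypothetical pre-flight check (not part of pcs or the role): succeed if the
# pcsd tokens file written by `pcs cluster auth` exists and is non-empty.
# It does NOT verify that the tokens are valid for every cluster node.
# PCSD_DIR is overridable for testing; it defaults to the pcsd state dir.
pcsd_has_tokens() {
  [ -s "${PCSD_DIR:-/var/lib/pcsd}/tokens" ]
}

# Usage sketch with the node names from this thread (password is a placeholder):
#   pcsd_has_tokens || pcs cluster auth dbs03.prodea-int.net dbs04.prodea-int.net \
#       -u hacluster -p "$HACLUSTER_PASSWORD"
```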

YanChii commented 7 years ago

Hi @frank3427

That's great news! I've modified the role to integrate your changes. I've also changed the pcs cluster auth play order for centos6 (according to this issue). Please pull the latest changes; once you confirm they work, I'll merge them into master.

Thank you for your help. Jan

ioguix commented 7 years ago

> I've also changed the pcs cluster auth play order for centos6 (according to this issue).

According to what I see in the commit and the previous comments here, I must disagree. The issue you are linking is from 2013. Moreover, this comment is wrong: https://github.com/YanChii/ansible-role-postgres-ha/commit/d0f178f3c12c41ace174578987dbedac1fafcfa9#diff-1e7e2a4817632e65eba954a468780920R35

Following the CentOS 6 quick start, we can run pcs cluster auth before pcs cluster setup. See: http://dalibo.github.io/PAF/Quick_Start-CentOS-6.html#cluster-creation

I just tested it before answering here:

> yum install -y pcs
> service pcsd start
> pcs cluster auth srv1 srv2 srv3 -u hacluster
> pcs cluster setup --name cluster_pgsql srv1 srv2 srv3
> pcs cluster start --all

Result:

> crm_mon -1n
Stack: cman
Current DC: srv2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Aug 31 14:07:33 2017      Last change: Thu Aug 31 14:07:31 2017 by root via crmd on srv2

3 nodes and 0 resources configured

Node srv1: online
Node srv2: online
Node srv3: online

YanChii commented 7 years ago

Hi @ioguix

You are right. I hit some errors when following this procedure, but apparently they were caused by something else. Anyway, it should be fixed now.

I'm waiting for test output from @frank3427. My tests ran without errors.

Thanks.

Jan

frank3427 commented 7 years ago

[root@localhost ~]# pcs status
Cluster name: pgcluster
Stack: cman
Current DC: dbs03.prodea-int.net (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Aug 31 15:22:47 2017  Last change: Thu Aug 31 15:22:40 2017 by root via crmd on dbs03.prodea-int.net

2 nodes and 3 resources configured

Online: [ dbs03.prodea-int.net dbs04.prodea-int.net ]

Full list of resources:

 pg-vip (ocf::heartbeat:IPaddr2): Started dbs03.prodea-int.net
 Master/Slave Set: postgres-ha [postgres]
     Masters: [ dbs03.prodea-int.net ]
     Slaves: [ dbs04.prodea-int.net ]

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@localhost ~]# pcs resource move --master postgres-ha dbs04.prodea-int.net
[root@localhost ~]# pcs resource clear postgres-ha
[root@localhost ~]# pcs status
Cluster name: pgcluster
Stack: cman
Current DC: dbs03.prodea-int.net (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Thu Aug 31 15:27:49 2017  Last change: Thu Aug 31 15:27:45 2017 by root via crm_resource on dbs03.prodea-int.net

2 nodes and 3 resources configured

Online: [ dbs03.prodea-int.net dbs04.prodea-int.net ]

Full list of resources:

 pg-vip (ocf::heartbeat:IPaddr2): Started dbs04.prodea-int.net
 Master/Slave Set: postgres-ha [postgres]
     Masters: [ dbs04.prodea-int.net ]
     Slaves: [ dbs03.prodea-int.net ]

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@localhost ~]#
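The switchover above uses two commands: pcs resource move --master postgres-ha <node> makes the target node master by adding a location constraint, and pcs resource clear postgres-ha removes that constraint again so the master is not left pinned to that node. A minimal sketch wrapping the same two steps in a hypothetical `pg_switchover` function, with a `DRY_RUN=1` print-only mode (also an assumption) so it can be checked without a cluster. Unlike a careful runbook, it does not wait for the new master to appear in pcs status before clearing the constraint:

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
_pg_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical switchover helper mirroring the commands used above.
# $1 = node that should become master, $2 = Master/Slave set (default postgres-ha).
pg_switchover() {
  target="$1"
  resource="${2:-postgres-ha}"
  if [ -z "$target" ]; then
    echo "usage: pg_switchover <node> [resource]" >&2
    return 2
  fi
  # Adds a location constraint preferring the master role on $target.
  _pg_run pcs resource move --master "$resource" "$target" || return 1
  # Removes that constraint so the master is no longer pinned to $target.
  _pg_run pcs resource clear "$resource"
}
```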

ioguix commented 7 years ago

@frank3427: hint: use fenced code blocks (put "~~~" or "```" around your code) for better code/output formatting. It will be much easier for others to read.

See: https://guides.github.com/features/mastering-markdown/#examples

thx

YanChii commented 7 years ago

Cool. So it looks like we've figured it out. I'll merge the changes. Thank you all for your help. Jan