frank3427 closed this issue 7 years ago.
Question: why would this role be unable to edit or make changes on a CentOS 6 system? I find that very odd. If I perform the setup manually, I get further along. It's just strange that it does not write, edit, update, or create files on CentOS 6.
Hi @frank3427,
The default clustering stack in CentOS 6 (cman + rgmanager) is very different from the one in CentOS 7 (corosync + pacemaker). It is not a blocker, but honestly I didn't think anyone would still be using CentOS 6 at this point. Anyway, it seems that you also have problems with other parts of the role.
Which tasks exactly are failing? Please also share the contents of your cluster.conf. Do other Ansible roles/tasks (not from postgres-ha) have similar problems?
If you really want this role working on CentOS 6, we can do it together. What I need so far:
Thanks for the report.
Jan
Regarding point 2: it seems that the CentOS 6 guide is quite similar to the CentOS 7 one: https://dalibo.github.io/PAF/Quick_Start-CentOS-6.html
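Because the clustering stacks differ between the two releases, a role that supports both can branch on the ansible_distribution_major_version fact. A minimal sketch: the CentOS 6 package list is the one from the "install cluster pkgs" task in the log below, the CentOS 7 list is an assumption based on the corosync + pacemaker stack named above, and the cluster_pkgs variable is hypothetical.

```yaml
# Hypothetical per-release package lists; not taken from the role itself
- name: pick cluster packages for CentOS 6 (cman-based stack)
  set_fact:
    cluster_pkgs: [pcs, pacemaker, cman, ccs]
  when: ansible_distribution_major_version == "6"

- name: pick cluster packages for CentOS 7 (corosync + pacemaker, assumed list)
  set_fact:
    cluster_pkgs: [pcs, pacemaker, corosync]
  when: ansible_distribution_major_version == "7"

- name: install cluster pkgs
  yum:
    name: "{{ cluster_pkgs }}"
    state: present
```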
Yes, let's get this working on CentOS 6. Currently I am stuck in the postgresql_sync.yml part of the role: from the init DB task down, nothing is written on the master host. Both servers are fresh minimal installs. On CentOS 6, libselinux-python has to be installed, so I added a step to pre-task.yml to install it.
  - name: "disable firewall"
    service: "name=iptables state=stopped enabled=no"
roles:
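For reference, a minimal sketch of a complete play with the pre_tasks described above, assuming a hypothetical inventory group named dbs and the role name postgres-ha6 seen in the log. The SELinux bindings matter here: on SELinux-enabled hosts, Ansible's file-handling modules (file, copy, template, lineinfile) abort when libselinux-python is missing, so that package is installed before any role task touches files.

```yaml
- name: install PG HA
  hosts: dbs          # hypothetical group containing dbs03 and dbs04
  pre_tasks:
    # needed by file/template/lineinfile on SELinux-enabled CentOS 6 hosts
    - name: install SELinux python bindings
      yum:
        name: libselinux-python
        state: present

    - name: "disable firewall"
      service: "name=iptables state=stopped enabled=no"
  roles:
    - postgres-ha6
```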
[root@AnsibleServer ~]# ansible-playbook --ask-pass dbs3-postgres-ha.yml
SSH password:
PLAY [install PG HA] ***
TASK [Gathering Facts] *****
ok: [dbs03.prodea-int.net]
ok: [dbs04.prodea-int.net]
TASK [disable firewall] ****
ok: [dbs03.prodea-int.net]
ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : debug] ****
ok: [dbs03.prodea-int.net] => {
    "msg": "MASTER NODE SET TO dbs03.prodea-int.net"
}
TASK [postgres-ha6 : verify postgres_ha_cluster_master_host] ***
skipping: [dbs03.prodea-int.net]
skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : yum] **
ok: [dbs03.prodea-int.net]
ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : debug] ****
ok: [dbs03.prodea-int.net] => {
    "msg": "cluster_members=[u'dbs03.prodea-int.net', u'dbs04.prodea-int.net']"
}
TASK [postgres-ha6 : Build hosts file] *****
changed: [dbs03.prodea-int.net] => (item=dbs03.prodea-int.net)
changed: [dbs04.prodea-int.net] => (item=dbs03.prodea-int.net)
changed: [dbs03.prodea-int.net] => (item=dbs04.prodea-int.net)
changed: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net)
TASK [postgres-ha6 : install cluster pkgs] *****
ok: [dbs03.prodea-int.net] => (item=[u'pcs', u'pacemaker', u'cman', u'ccs'])
ok: [dbs04.prodea-int.net] => (item=[u'pcs', u'pacemaker', u'cman', u'ccs'])
TASK [postgres-ha6 : service pcsd start] ***
ok: [dbs03.prodea-int.net]
ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : setup hacluster password] *****
ok: [dbs04.prodea-int.net]
ok: [dbs03.prodea-int.net]
TASK [postgres-ha6 : setup cluster auth] ***
changed: [dbs04.prodea-int.net]
changed: [dbs03.prodea-int.net]
TASK [postgres-ha6 : create cluster] ***
skipping: [dbs04.prodea-int.net]
changed: [dbs03.prodea-int.net]
TASK [postgres-ha6 : join cluster nodes] *** skipping: [dbs04.prodea-int.net] => (item=dbs03.prodea-int.net) failed: [dbs03.prodea-int.net] (item=dbs04.prodea-int.net) => {"changed": true, "cmd": "/bin/sh -c \"if ! grep -q 'ring0_addr[:] dbs04.prodea-int.net[\t ]$' /etc/corosync/corosync.conf; then pcs cluster node add dbs04.prodea-int.net; fi\"", "delta": "0:00:01.731010", "end": "2017-08-28 03:28:23.058813", "failed": true, "item": "dbs04.prodea-int.net", "rc": 1, "start": "2017-08-28 03:28:21.327803", "stderr": "grep: /etc/corosync/corosync.conf: No such file or directory\nError: Unable to add 'dbs04.prodea-int.net' to cluster: node is already in a cluster", "stderr_lines": ["grep: /etc/corosync/corosync.conf: No such file or directory", "Error: Unable to add 'dbs04.prodea-int.net' to cluster: node is already in a cluster"], "stdout": "", "stdout_lines": []}
TASK [postgres-ha6 : start cluster] **** changed: [dbs04.prodea-int.net]
TASK [postgres-ha6 : alter stonith settings] *** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : alter cluster policy settings] **** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : alter cluster transition settings] **** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : verify cluster configuration] ***** changed: [dbs04.prodea-int.net]
TASK [postgres-ha6 : enable cluster autostart] ***** changed: [dbs04.prodea-int.net]
TASK [postgres-ha6 : create virtual IP resource] *** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : import pg96 repo] ***** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : install epel-release] ***** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : install pg96] ***
ok: [dbs04.prodea-int.net]
((((( From here down, no changes or steps are performed on the defined master host (dbs03.prodea-int.net). )))))
TASK [postgres-ha6 : init DB dir on master if necessary] *
skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : check if DB was synchronized before] ** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : alter clustering-related settings in postgresql.conf] *****
skipping: [dbs04.prodea-int.net] => (item={'key': u'hot_standby', 'value': u'on'})
skipping: [dbs04.prodea-int.net] => (item={'key': u'listen_addresses', 'value': u"'*'"})
skipping: [dbs04.prodea-int.net] => (item={'key': u'wal_level', 'value': u'hot_standby'})
skipping: [dbs04.prodea-int.net] => (item={'key': u'wal_log_hints', 'value': u'on'})
skipping: [dbs04.prodea-int.net] => (item={'key': u'max_wal_senders', 'value': u'2'})
skipping: [dbs04.prodea-int.net] => (item={'key': u'max_replication_slots', 'value': u'2'})
TASK [postgres-ha6 : alter DB ACL in pg_hba.conf] ** skipping: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net)
TASK [postgres-ha6 : alter DB replication ACL in pg_hba.conf] ** skipping: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net)
TASK [postgres-ha6 : setup DB cluster auth (master IP)] **** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : setup .pgpass replication auth for master IP] ***** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : setup .pgpass replication auth for other IPs] ***** ok: [dbs04.prodea-int.net] => (item=dbs04.prodea-int.net)
TASK [postgres-ha6 : check if master host "dbs03.prodea-int.net" is really a DB master] **** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : mark master DB] *** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : check if DB is running (failure is ok)] *** fatal: [dbs04.prodea-int.net]: FAILED! => {"changed": true, "cmd": "/usr/pgsql-9.6/bin/pg_ctl -D /var/lib/pgsql/9.6/data status", "delta": "0:00:00.025437", "end": "2017-08-28 03:28:40.713378", "failed": true, "rc": 4, "start": "2017-08-28 03:28:40.687941", "stderr": "pg_ctl: directory \"/var/lib/pgsql/9.6/data\" is not a database cluster directory", "stderr_lines": ["pg_ctl: directory \"/var/lib/pgsql/9.6/data\" is not a database cluster directory"], "stdout": "", "stdout_lines": []} ...ignoring
TASK [postgres-ha6 : check if DB is running in cluster (failure is OK)] **** fatal: [dbs04.prodea-int.net]: FAILED! => {"changed": true, "cmd": "pcs constraint location show resources \"postgres-ha\" | grep -q Enabled", "delta": "0:00:00.351822", "end": "2017-08-28 03:28:41.872381", "failed": true, "rc": 1, "start": "2017-08-28 03:28:41.520559", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} ...ignoring
TASK [postgres-ha6 : start master DB if necessary (without cluster)] *** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : start master DB if necessary (in cluster)] **** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : setup DB replication auth] **** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : check if DB sync is required] ***** ok: [dbs04.prodea-int.net]
TASK [postgres-ha6 : stop slave DB] **** skipping: [dbs04.prodea-int.net]
TASK [postgres-ha6 : remove slave DB datadir before sync] ** changed: [dbs04.prodea-int.net]
TASK [postgres-ha6 : synchronize slave databases] ** fatal: [dbs04.prodea-int.net]: FAILED! => {"changed": true, "cmd": "/usr/pgsql-9.6/bin/pg_basebackup -h \"172.24.2.187\" -p 5432 -R -D \"/var/lib/pgsql/9.6/data\" -U \"replicator\" -v -P --xlog-method=stream", "delta": "0:00:00.029059", "end": "2017-08-28 03:28:44.872909", "failed": true, "rc": 1, "start": "2017-08-28 03:28:44.843850", "stderr": "pg_basebackup: could not connect to server: could not connect to server: Connection refused\n\tIs the server running on host \"172.24.2.187\" and accepting\n\tTCP/IP connections on port 5432?", "stderr_lines": ["pg_basebackup: could not connect to server: could not connect to server: Connection refused", "\tIs the server running on host \"172.24.2.187\" and accepting", "\tTCP/IP connections on port 5432?"], "stdout": "", "stdout_lines": []} to retry, use: --limit @/root/dbs3-postgres-ha.retry
PLAY RECAP *****
dbs03.prodea-int.net : ok=11 changed=3 unreachable=0 failed=1
dbs04.prodea-int.net : ok=25 changed=8 unreachable=0 failed=1
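The "synchronize slave databases" failure looks like a follow-on error: pg_basebackup on dbs04 found nothing listening on 172.24.2.187:5432, most likely because the master DB was never initialized or started. A minimal sketch of a guard task that stops the slave side with a clearer failure before pg_basebackup runs, assuming a hypothetical variable postgres_ha_master_ip that holds the master address:

```yaml
- name: wait for master DB to accept TCP connections
  wait_for:
    host: "{{ postgres_ha_master_ip }}"   # hypothetical variable, e.g. 172.24.2.187
    port: 5432
    timeout: 60                          # seconds before the task fails
  when: inventory_hostname != postgres_ha_cluster_master_host
```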
One of the issues is that I do not see a failure for the master node. Looking at the activity logs on the server, I don't see any hits for the tasks, nor am I seeing any changes to files.
I am installing as the root user, if that makes a difference.
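In the log above, dbs03 does fail, in the "join cluster nodes" task. By default Ansible removes a host from the rest of the play as soon as one of its tasks fails, so every later task ran on dbs04 only, which would explain the missing activity on the master. A minimal sketch of a play header that asks Ansible to stop all hosts at the first failure, so the real error surfaces immediately (the host group name is hypothetical):

```yaml
- name: install PG HA
  hosts: dbs              # hypothetical group with dbs03 and dbs04
  any_errors_fatal: true  # abort the whole play for every host on the first failure
  roles:
    - postgres-ha6
```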
Hi @frank3427, there are two different errors.
any_errors_fatal: true
is set. I've seen this before and it is probably a bug in Ansible itself. The differences between the pcs/corosync/pacemaker
stacks in CentOS 6 and 7 are subtle, but with strong consequences. I've modified the role and now it runs smoothly even on CentOS 6. Please try it from the centos6
branch and let me know if it runs for you as well.
But there's still one very important issue: the postgres master is not properly promoted and stays in the slave role. The PAF developers maintain a separate version for the older corosync stack, but even with that version installed, it does not work out of the box for me.
@frank3427, can you please help me debug that? I've seen several issues about this on the PAF GitHub; maybe they will give you hints on what to do. Thanks. Jan
I rebuilt the VMs so we could have a fresh look; I will rebuild as needed to confirm the changes work from a fresh start. Setting the hacluster password did not work on the CentOS 6 server:
[root@localhost ~]# pcs cluster auth dbs03.prodea-int.net dbs04.prodea-int.net -u hacluster
Password:
Error: dbs03.prodea-int.net: Username and/or password is incorrect
Error: dbs04.prodea-int.net: Username and/or password is incorrect
[root@localhost ~]#
type=USER_AUTH msg=audit(1503956924.156:576): user pid=8599 uid=0 auid=0 ses=3 msg='op=PAM:authentication acct="hacluster" exe="/usr/bin/ruby" hostname=? addr=? terminal=? res=failed'
TASK [postgres-ha6 : setup hacluster password] *****
changed: [dbs04.prodea-int.net]
changed: [dbs03.prodea-int.net]
TASK [postgres-ha6 : setup cluster auth] *** fatal: [dbs04.prodea-int.net]: FAILED! => {"changed": true, "cmd": "pcs cluster auth dbs03.prodea-int.net dbs04.prodea-int.net -u hacluster -p \"Pr0d3aOps\"", "delta": "0:00:06.191536", "end": "2017-08-28 17:13:38.570556", "failed": true, "rc": 1, "start": "2017-08-28 17:13:32.379020", "stderr": "Error: dbs03.prodea-int.net: Username and/or password is incorrect\nError: dbs04.prodea-int.net: Username and/or password is incorrect", "stderr_lines": ["Error: dbs03.prodea-int.net: Username and/or password is incorrect", "Error: dbs04.prodea-int.net: Username and/or password is incorrect"], "stdout": "", "stdout_lines": []} fatal: [dbs03.prodea-int.net]: FAILED! => {"changed": true, "cmd": "pcs cluster auth dbs03.prodea-int.net dbs04.prodea-int.net -u hacluster -p \"Pr0d3aOps\"", "delta": "0:00:06.202676", "end": "2017-08-28 17:13:38.584373", "failed": true, "rc": 1, "start": "2017-08-28 17:13:32.381697", "stderr": "Error: dbs03.prodea-int.net: Username and/or password is incorrect\nError: dbs04.prodea-int.net: Username and/or password is incorrect", "stderr_lines": ["Error: dbs03.prodea-int.net: Username and/or password is incorrect", "Error: dbs04.prodea-int.net: Username and/or password is incorrect"], "stdout": "", "stdout_lines": []} to retry, use: --limit @/root/dbs3-postgres-ha.retry
PLAY RECAP *****
dbs03.prodea-int.net : ok=9 changed=5 unreachable=0 failed=1
dbs04.prodea-int.net : ok=7 changed=5 unreachable=0 failed=1
[root@AnsibleServer ~]#
So to get past this, I manually reset the password on each server, and then:
[root@localhost ~]# pcs cluster auth dbs03.prodea-int.net dbs04.prodea-int.net -u hacluster
Password:
dbs03.prodea-int.net: Authorized
dbs04.prodea-int.net: Authorized
So I think the method for generating the hash is not working. I used the following, openssl passwd -1 -salt xyz (password), and that gets me past this issue and on to the next one.
We are now back to the postgres initdb task.
Trying:
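openssl passwd -1 produces an MD5-crypt hash (prefixed $1$), which CentOS 6 accepts in /etc/shadow. As an alternative, Ansible's password_hash filter can build a SHA-512 crypt hash on the control node. A minimal sketch, assuming hypothetical variables hacluster_password and hacluster_password_salt; a fixed salt keeps the hash stable, so the task does not report "changed" on every run:

```yaml
- name: setup hacluster password
  user:
    name: hacluster
    # SHA-512 crypt string computed on the control node by the password_hash filter
    password: "{{ hacluster_password | password_hash('sha512', hacluster_password_salt) }}"
```

After this task, pcs cluster auth with the plain-text password should be the quickest check that the hash was written correctly.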
- name: init DB dir on master if necessary
  command: /etc/init.d/postgresql-{{ postgres_ha_pg_version}} initdb /var/lib/pgsql/{{ postgres_ha_pg_version }}/data
  become: true
  become_user: postgres
  args:
    creates: "{{ postgres_ha_pg_data }}/PG_VERSION"
  when: inventory_hostname == postgres_ha_cluster_master_host # run only on one node
dbs3-postgres-ha6.txt
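A variant worth comparing against calls the initdb binary directly as the postgres user instead of going through the SysV init script. This sketch assumes the PGDG layout seen in the log (binaries under /usr/pgsql-9.6/bin), that postgres_ha_pg_version is "9.6", and that postgres_ha_pg_data points at the data directory:

```yaml
- name: init DB dir on master if necessary
  command: "/usr/pgsql-{{ postgres_ha_pg_version }}/bin/initdb -D {{ postgres_ha_pg_data }}"
  args:
    creates: "{{ postgres_ha_pg_data }}/PG_VERSION"   # skip if a cluster already exists
  become: true
  become_user: postgres
  when: inventory_hostname == postgres_ha_cluster_master_host  # run only on one node
```

Like the original, this task can only run on dbs03 if that host is still part of the play at this point; a failure in an earlier task would skip it just the same.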
tail -f /var/log/messages on the master host (dbs03) shows no activity after:
Aug 28 22:58:02 localhost ansible-command: Invoked with warn=True executable=None _uses_shell=True _raw_params=pcs cluster auth dbs03.prodea-int.net dbs04.prodea-int.net -u hacluster -p "Pr0d3aOps" removes=None creates=None chdir=None
Aug 28 22:58:04 localhost ansible-command: Invoked with creates=/etc/corosync/corosync.conf executable=None _uses_shell=True _raw_params=pcs cluster --force setup --name pgcluster "dbs03.prodea-int.net" removes=None warn=True chdir=None
Aug 28 22:58:13 localhost ansible-command: Invoked with warn=True executable=None _uses_shell=True _raw_params=/bin/sh -c "if ! grep -q 'ring0_addr[:] dbs04.prodea-int.net[\t ]$' /etc/corosync/corosync.conf; then pcs cluster node add dbs04.prodea-int.net; fi" removes=None creates=None chdir=None
All outputs look exactly the same as before, so you are running the old version of the role. Are you sure you have switched the git branch to centos6? The "setup cluster auth" task should not be run anymore on CentOS 6.
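If the role is installed with ansible-galaxy, the branch can be pinned in a requirements file so the play cannot silently pick up the old version. A minimal sketch; the repository URL is a placeholder, and the role name postgres-ha6 matches the one in the log:

```yaml
# requirements.yml (placeholder URL; replace <owner> with the real repository owner)
- src: https://github.com/<owner>/postgres-ha.git
  scm: git
  version: centos6      # git branch to check out
  name: postgres-ha6
```

Running ansible-galaxy install -r requirements.yml --force then replaces any previously installed copy of the role.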
I know you said that should run on CentOS 7, and I am trying to get it to run on CentOS 6. None of the tasks that create or modify files are working; also, it seems that any task that includes the postgres_ha_cluster_master_host variable does not work either.
Any help would be greatly appreciated.
frank
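To see why tasks guarded by postgres_ha_cluster_master_host never run, a quick debugging sketch prints, on every host still in the play, that variable and whether it matches inventory_hostname (the comparison the role's when: conditions use). If the two names differ even slightly, or dbs03 has already dropped out of the play, every such task is skipped on the master.

```yaml
- name: debug master host selection
  debug:
    msg: >-
      inventory_hostname={{ inventory_hostname }}
      master={{ postgres_ha_cluster_master_host }}
      is_master={{ inventory_hostname == postgres_ha_cluster_master_host }}
```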