ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

Failure: 'ansible_hostname' is undefined; should gather_facts default to true? #1876

Closed fultonj closed 5 years ago

fultonj commented 7 years ago

It seems a recent change [1] to site-docker.yaml causes a new deployment from the latest master branch to fail with the error 'ansible_hostname' is undefined [2]. It's easy to work around the issue by setting gather_facts to true. Why was it defaulted to false?

Footnotes:

[1] https://github.com/ceph/ceph-ansible/commit/b7db600caa639728cec298ba640d7246a584a95b#diff-6eb4d19d9e1991145da14236d83c5ba8

[2]

2017-09-08 19:32:25,924 p=19793 u=mistral |  PLAY [mons,agents,osds,mdss,rgws,nfss,restapis,rbdmirrors,clients,iscsigws,mgrs] ***
2017-09-08 19:32:25,996 p=19793 u=mistral |  TASK [gather and delegate facts] ***********************************************
2017-09-08 19:32:26,054 p=19793 u=mistral |  PLAY [mons] ********************************************************************
2017-09-08 19:32:26,113 p=19793 u=mistral |  TASK [ceph-defaults : set_fact] ************************************************
2017-09-08 19:32:26,132 p=19793 u=mistral |  fatal: [192.168.24.17]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'ansible_hostname' is undefined\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-defaults/tasks/facts.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- set_fact:\n  ^ here\n"}
2017-09-08 19:32:26,133 p=19793 u=mistral |  PLAY RECAP *********************************************************************
2017-09-08 19:32:26,133 p=19793 u=mistral |  192.168.24.13              : ok=0    changed=0    unreachable=0    failed=0   
2017-09-08 19:32:26,133 p=19793 u=mistral |  192.168.24.14              : ok=0    changed=0    unreachable=0    failed=0   
2017-09-08 19:32:26,134 p=19793 u=mistral |  192.168.24.15              : ok=0    changed=0    unreachable=0    failed=0   
2017-09-08 19:32:26,134 p=19793 u=mistral |  192.168.24.17              : ok=0    changed=0    unreachable=0    failed=1   
2017-09-08 19:32:26,134 p=19793 u=mistral |  192.168.24.18              : ok=0    changed=0    unreachable=0    failed=0   
2017-09-08 19:32:26,134 p=19793 u=mistral |  192.168.24.19              : ok=0    changed=0    unreachable=0    failed=0   

leseb commented 7 years ago

Looks like you are running Ansible < 2.3. You must use Ansible >= 2.3.x. If so, let's close this :)

Thanks!

fultonj commented 7 years ago

ansible-2.3.0.0-3.el7.noarch is what I am using.

fultonj commented 7 years ago

Aside from this version being >= 2.3.x, I also wanted to add that this is what seems to be shipped in Pike by Delorean.

(undercloud) [stack@undercloud tripleo-ceph-ansible]$ sudo yum whatprovides */ansible 
...
ansible-2.3.0.0-3.el7.noarch : SSH-based configuration management, deployment, and task
                             : execution system
Repo        : delorean-pike-testing
Matched from:
Filename    : /etc/ansible
Filename    : /usr/bin/ansible
Filename    : /usr/lib/python2.7/site-packages/ansible
...

leseb commented 7 years ago

Hmm, actually, we require at least 2.3.1. Last time I checked, RHEL 7.4 provides the right version.
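
A minimal sketch of how the version requirement above could be checked before running the playbooks. It assumes `sort -V` (version sort, as in GNU coreutils) is available; `version_at_least` is a hypothetical helper name, not something ceph-ansible ships, and the 2.3.1 minimum is simply the figure stated in this thread.

```shell
#!/bin/sh
# Sketch: check that the installed Ansible meets a minimum version.
# Assumes `sort -V` (version sort, GNU coreutils) is available.

# Hypothetical helper: succeeds if version $1 is at least version $2.
version_at_least() {
    # The lower of the two versions sorts first; if that is the
    # required one, the installed version is equal or newer.
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

required="2.3.1"  # minimum stated in this thread

if command -v ansible >/dev/null 2>&1; then
    # Take the first version-like token from the first line of output,
    # e.g. "ansible 2.3.0.0" gives "2.3.0.0".
    installed="$(ansible --version 2>/dev/null |
        sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p')"
    if version_at_least "$installed" "$required"; then
        echo "ansible $installed is new enough (need >= $required)"
    else
        echo "ansible $installed is too old (need >= $required)"
    fi
else
    echo "ansible not found on PATH"
fi
```

With the versions mentioned in this thread, `version_at_least 2.3.0.0 2.3.1` fails and `version_at_least 2.3.1.0 2.3.1` succeeds.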

fultonj commented 7 years ago

I agree we should be fine for RHEL 7.4 [1], but this needs to work with TripleO CI. I hit this issue using a new TripleO quickstart on CentOS 7.3 [2], which is the same tool that runs TripleO CI.

I'm worried that a new version of ceph-ansible that requires this version of Ansible won't work with TripleO. We could pursue requiring a newer version of Ansible in Queens, but I think it's too late for Pike.

[1]

(undercloud) [stack@hci-director ~]$ rpm -qa | grep ansible 
ceph-ansible-3.0.0-0.1.rc3.el7cp.noarch
ansible-2.3.1.0-3.el7.noarch
(undercloud) [stack@hci-director ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.4 (Maipo)
(undercloud) [stack@hci-director ~]$ 

[2]

[stack@undercloud ~]$ cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core) 
[stack@undercloud ~]$ rpm -qa | grep ansible
python-heat-agent-ansible-1.4.0-1.el7.noarch
ansible-2.3.0.0-3.el7.noarch
ansible-pacemaker-1.0.3-0.20170907130218.1279294.el7.centos.noarch
[stack@undercloud ~]$ uptime
 20:03:17 up 5 min,  1 user,  load average: 1.27, 0.72, 0.31
[stack@undercloud ~]$ 

fultonj commented 7 years ago

Apologies, I thought it might not be possible to upgrade the Ansible version in TripleO at this stage, but that does seem to be an option. It is being pursued in https://review.rdoproject.org/r/#/c/9421/1.

fultonj commented 7 years ago

Now that the undercloud will have Ansible 2.3.2, I expect this issue will go away.

ktdreyer commented 7 years ago

David Simard pointed me at this issue. @fultonj @leseb do we need to change the ceph-ansible.spec file to require ansible 2.3.2?

ktdreyer commented 7 years ago

Noting for my own understanding: CentOS 7 Extras currently has Ansible 2.3.1. In the meantime Ansible 2.3.2 is rebuilt just for the CentOS SIGs and this is what RDO is using. Soon this version will be in CentOS 7 Extras itself and the SIGs will not need to carry this special build.

wby1089 commented 7 years ago

I faced the same issue. The problem was a symlink created by openstack-ansible:

root@ansible:/opt/ceph-ansible# which ansible-playbook
/usr/local/bin/ansible-playbook
root@ansible:/opt/ceph-ansible# ls -l /usr/local/bin/ansible-playbook
lrwxrwxrwx 1 root root 32 Oct 22 18:00 /usr/local/bin/ansible-playbook -> /usr/local/bin/openstack-ansible
root@ansible:/opt/ceph-ansible# dpkg -L ansible | grep ansible-playbook$
/usr/bin/ansible-playbook

root@ansible:/opt/ceph-ansible# ansible-playbook --version
ansible-playbook 2.2.3.0
  config file = /opt/ceph-ansible/ansible.cfg
  configured module search path = Default w/o overrides
root@ansible:/opt/ceph-ansible# /usr/bin/ansible-playbook --version
ansible-playbook 2.4.0.0
  config file = /opt/ceph-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]

Try using the absolute path, like this:

root@ansible:/opt/ceph-ansible# /usr/bin/ansible-playbook site.yml
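
A small sketch of how one might spot this kind of shadowing up front: it walks `PATH` in order and prints every executable with the given name together with the file it finally resolves to. `list_on_path` is a hypothetical helper name, and `readlink -f` is assumed to be available (GNU coreutils provides it).

```shell
#!/bin/sh
# Sketch: list every executable called $1 on PATH, in lookup order,
# together with the file each one resolves to after following symlinks.
# Assumes `readlink -f` is available (GNU coreutils).

# Hypothetical helper, not part of Ansible or ceph-ansible.
list_on_path() {
    name="$1"
    old_ifs="$IFS"
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$name"   # an empty PATH entry means "."
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            printf '%s -> %s\n' "$candidate" "$(readlink -f "$candidate")"
        fi
    done
    IFS="$old_ifs"
    return 0
}

# The first line printed is the one the shell would run.
list_on_path ansible-playbook
```

If the first entry resolves to something other than the packaged `ansible-playbook`, as with the openstack-ansible wrapper above, calling the intended binary by its absolute path avoids the wrapper.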

guits commented 5 years ago

Closed due to inactivity, feel free to re-open if needed. Thanks!