ansible / ansible-modules-core

Ansible modules - these modules ship with ansible

wait_for stuck waiting until timeout, cannot detect host coming back after reboot #861

Closed nobody93 closed 7 years ago

nobody93 commented 9 years ago

The following task stays stuck until the timeout; it cannot detect the host coming back after rebooting. The version is 1.9 from git source.

- name: wait_restart
  local_action: wait_for host="{{ inventory_hostname }}" port=22 delay=5 timeout=600
  sudo: false
bcoca commented 9 years ago

does inventory_hostname resolve to an IP ?

nobody93 commented 9 years ago

Yes, there are no fundamental issues, as I've run Ansible for months using many modules across many roles without problems, but this wait_for module issue is currently stinging us badly.

Jmainguy commented 9 years ago

Works for me on devel 2.0

ansible-playbook -i hosts site.yml 

PLAY [Test all-databases] ***************************************************** 

TASK: [reboot | Reboot] ******************************************************* 
changed: [ubuntu1404.soh.re]

TASK: [reboot | wait_restart] ************************************************* 
ok: [ubuntu1404.soh.re -> 127.0.0.1]

PLAY RECAP ******************************************************************** 
ubuntu1404.soh.re          : ok=2    changed=1    unreachable=0    failed=0   

Give devel a shot @jupiterh

bcoca commented 9 years ago

I cannot reproduce this issue; do you have any more information?

mgedmin commented 8 years ago

FWIW wait_for: host=... timeout=300 is equivalent to time.sleep(300) if you don't provide a port (or path, or use state=drained): https://github.com/ansible/ansible-modules-core/blob/devel/utilities/logic/wait_for.py#L399-L400
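
To illustrate the point, a minimal sketch of my own (not taken from this issue; task names and timing values are illustrative):

- name: plain sleep in disguise
  # No port, path or state=drained, so this just sleeps for the full 300 seconds
  local_action: wait_for host={{ inventory_hostname }} timeout=300

- name: wait for sshd
  # With a port, wait_for actually polls and returns as soon as the port answers
  local_action: wait_for host={{ inventory_hostname }} port=22 delay=5 timeout=300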

ansibot commented 8 years ago

@gregswift, ping. This issue is still waiting on your response. click here for bot help

gregswift commented 8 years ago

needs_info

mrummuka commented 8 years ago

I believe I might just have encountered the same issue. I've been experimenting with Ansible 2.1.1.0 installed on Fedora 24 (the Ansible control host), with "ubuntu/trusty64" as the current Vagrant base image. To me it seems as if wait_for is always waiting for localhost to come up for the full (delay+timeout) time, even though the other tasks use {{ inventory_hostname }} appropriately.

vagrant provision
==> myserver: Running provisioner: ansible...
   myserver: Running ansible-playbook...
PYTHONUNBUFFERED=1 ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false ANSIBLE_SSH_ARGS='-o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -i '/path/to/vagrant/private_key' -o ControlMaster=auto -o ControlPersist=60s' ansible-playbook --connection=ssh --timeout=30 --extra-vars=ansible_ssh_user='vagrant' --limit='myserver' --inventory-file=/etc/ansible/hosts -v site/myservers.yml
Using /etc/ansible/ansible.cfg as config file
(..)
TASK [myservers : restart machine] *******************************************
ok: [myserver] => {"ansible_job_id": "697120816534.2727", "changed": false, "finished": 0, "results_file": "/my/path/replaced", "started": 1}

TASK [myservers : debug variable values] *************************************
ok: [myserver] => {
    "msg": "System myserver has uuid NA"
}

TASK [myservers : waiting for server to come back] ***************************
ok: [myserver -> **localhost**] => {"changed": false, "elapsed": **330**, "path": null, "port": null, "search_regex": null, "state": "started"}

PLAY RECAP *********************************************************************
myserver                   : ok=3    changed=0    unreachable=0    failed=0  

Here is the restart task, grabbed from somewhere on the 'net, with debugging information printed (myservers.yml):

- name: restart machine
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true

- name: debug variable values
  debug: msg="System {{ inventory_hostname }} has uuid {{ ansible_product_uuid }}"

- name: waiting for server to come back
  local_action: wait_for host={{ inventory_hostname }} state=started delay=**30** timeout=**300**
  sudo: false

Additionally:

(Vagrantfile)

(..)
  config.vm.network "private_network", ip: "10.0.0.123"
  config.vm.hostname = "myserver"
  config.vm.define "myserver"
(..)
  ansible.inventory_path = "/etc/ansible/hosts"

(/etc/ansible/hosts)

myserver ansible_ssh_host=10.0.0.123 ansible_ssh_port=22 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='/path/to/vagrant/private_key' ansible_ssh_common_args='-o StrictHostKeyChecking=no'

[myservers]
myserver

(after vagrant up)

$ ansible myserver -m ping
myserver | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
gregswift commented 8 years ago

@mrummuka Your wait_for doesn't specify port or path, so the module actually just sleeps for the defined timeout period. I haven't looked into what causes this, but I believe that without a specific delegate, wait_for will delegate to localhost for a remote check, because the target host would be down during the wait.
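
For what it's worth, a sketch of the same wait written with an explicit delegate and a real port check (my own illustration, not from this issue; the search_regex value assumes an OpenSSH banner and is optional):

- name: waiting for server to come back
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    search_regex: OpenSSH   # assumption: sshd advertises an OpenSSH banner
    delay: 30
    timeout: 300
  delegate_to: localhost
  become: false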

mrummuka commented 8 years ago

OK, I did some additional experimentation and here are my observations:

  1. @gregswift By adding port=22 the wait_for task still reports that it is connecting to localhost, but now, instead of waiting for delay+timeout, it simply fails a few seconds after the delay: fatal: [myserver -> localhost]: FAILED! => {"changed": false, "elapsed": 31, "failed": true, "msg": "Timeout when waiting for myserver:22"}
  2. Although I had ansible myserver -m ping working, the "normal" hostname-to-IP resolution was not set up (i.e. $ ping myserver did not work), as I had not set up /etc/hosts manually (I perhaps incorrectly thought that the 'ansible/hosts' file was supposed to override/remove the need for manually specifying the hostname=>ip mapping at the host operating system level).

Therefore, after adding myserver 10.0.0.123 to /etc/hosts, the wait_for task (with port=22 added) started working as expected (although IMO it still reports incorrectly what is connecting and where?).

TASK [myservers : waiting for server to come back] ***************************
ok: [myserver -> localhost] => {"changed": false, "elapsed": 25, "path": null, "port": 22, "search_regex": null, "state": "started"}

So, could it be that while Ansible in general follows the hostname-to-IP mapping defined in ansible/hosts, the local_action wait_for ignores it and relies on the local operating system's resolver instead? And if that is the case, is this a bug, or (perhaps just an undocumented) feature?

gregswift commented 8 years ago

@mrummuka I completely missed that you were defining it as a local action. That would be why it's delegating locally.

bcoca commented 8 years ago

@mrummuka, @jupiterh modules only know the info you give them; they cannot do 'inventory resolution' themselves.

Whatever you pass as the host will be what the module tries to connect to; if inventory_hostname is not resolvable, try using ansible_host.
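
A sketch of that suggestion applied to the original task (my own illustration, not from this issue; older inventories may define ansible_ssh_host rather than ansible_host, hence the fallback):

- name: wait_restart
  # assumption: fall back to inventory_hostname when ansible_host is not set
  local_action: wait_for host="{{ ansible_host | default(inventory_hostname) }}" port=22 delay=5 timeout=600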

ansibot commented 8 years ago

@gregswift, ping. This issue is still waiting on your response. click here for bot help

ansibot commented 8 years ago

@jupiterh ping, this issue is still waiting on your feedback. We will close the issue if you do not respond. click here for bot help

ansibot commented 8 years ago

@jupiterh ping, this issue is still waiting on your feedback. We will close the issue if you do not respond. click here for bot help

ansibot commented 7 years ago

@jupiterh ping, this issue is still waiting on your feedback. We will close the issue if you do not respond. click here for bot help

ansibot commented 7 years ago

This repository has been locked. All new issues and pull requests should be filed in https://github.com/ansible/ansible

Please read through the repomerge page in the dev guide. The guide contains links to tools which automatically move your issue or pull request to the ansible/ansible repo.

ansibot commented 7 years ago

This issue was migrated to https://github.com/ansible/ansible/issues/30171