Open jplitza opened 2 years ago
I created a small ansible module that, when run, manually sets a given host to be unreachable.
The module can be found here: https://gist.github.com/lucaelin/0128de00cbc545045742caeeec0805fc
You can put this file in a folder called library next to the playbook that you want to use it in (or globally, see the ansible documentation on how to do that).
You can then combine this module with something like wait_for_connection, like so:
- wait_for_connection:
    connect_timeout: 5
    timeout: 10
  ignore_errors: yes
  register: wait_for_connection
- set_unreachable:
  when: wait_for_connection.failed
This is of course only a workaround until a proper fix is available, although I think this module might be worth including in Ansible by default.
We're also experiencing this within our implementation of the network_cli code here: https://github.com/aruba/aos-switch-ansible-collection/issues/31
Seeing this as well. All the switches are offline but still report as successful.
Another interesting thing that shows up is that using async masks the connection failures even more.
arubaoss_command:
  commands: ["conf", "interface {{ item }}", "disable"]
loop: "{{ list_of_ports if list_of_ports is iterable and list_of_ports is not string else [] }}"
The above snippet returns the connection error in stdout for each item:
ssh connection failed: ssh connect failed: No route to host
Whereas the following just completes fine.
arubaoss_command:
  commands: ["conf", "interface {{ item }}", "disable"]
async: 45
poll: 0
loop: "{{ list_of_ports if list_of_ports is iterable and list_of_ports is not string else [] }}"
This issue probably affects all network connection plugins. I tested ansible.netcommon.network_cli and ansible.netcommon.netconf myself and saw the same behavior: if the machine is unreachable, the connection plugin ignores the gather_timeout setting and waits until the command timeout is triggered.
TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
fatal: [switch1.example.com]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"junipernetworks.junos.junos_facts": {"failed": true, "invocation": {"module_args": {"available_network_resources": false, "config_format": "text", "gather_network_resources": null, "gather_subset": ["min"]}}, "msg": "command timeout triggered, timeout value is 30 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide."}}, "msg": "The following modules failed to execute: junipernetworks.junos.junos_facts\n"}
fatal: [switch2.example.com]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"cisco.ios.ios_facts": {"failed": true, "invocation": {"module_args": {"available_network_resources": false, "gather_network_resources": null, "gather_subset": ["min"]}}, "msg": "ssh connection failed: ssh connect failed: Timeout connecting to switch2.example.com"}}, "msg": "The following modules failed to execute: cisco.ios.ios_facts\n"}
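Until unreachable hosts are reported as such, results like the two above can only be told apart by their message text. The following is a minimal sketch of that idea, not anything provided by Ansible or the collections: the function names and the three category labels are made up for illustration, and the patterns are simply copied from the messages quoted in this thread, so other platforms or library versions may word them differently. Note that "command timeout triggered" is ambiguous, because a reachable but slow device can produce it too.

```python
import re

# Assumption: patterns taken from the error messages quoted in this thread.
DEFINITE_PATTERNS = [
    re.compile(r"ssh connect failed", re.IGNORECASE),
    re.compile(r"no route to host", re.IGNORECASE),
    re.compile(r"timeout connecting to", re.IGNORECASE),
]
# A command timeout may also come from a reachable but slow device.
AMBIGUOUS_PATTERNS = [
    re.compile(r"command timeout triggered", re.IGNORECASE),
]


def iter_messages(node):
    """Yield every string stored under a "msg" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "msg" and isinstance(value, str):
                yield value
            else:
                yield from iter_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_messages(item)


def classify_failure(result):
    """Return 'unreachable', 'possibly_unreachable' or 'failed' (hypothetical labels)."""
    messages = list(iter_messages(result))
    if any(p.search(m) for m in messages for p in DEFINITE_PATTERNS):
        return "unreachable"
    if any(p.search(m) for m in messages for p in AMBIGUOUS_PATTERNS):
        return "possibly_unreachable"
    return "failed"
```

Because the real message is nested under failed_modules in the fact-gathering results, the helper walks the whole result dictionary rather than looking only at the top-level msg.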
This is a major issue for us, since we have an inventory with a few thousand hosts and the command timeout has a pretty high value (default 30s, in our case 60). We are also parsing the Ansible results/stats and at the moment cannot differentiate between unreachable and failed. Since the ticket has been open since 2021, I'm wondering whether there is any intention to fix this, as it breaks one of Ansible's core concepts.
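To give a feel for the cost, here is a rough back-of-the-envelope sketch. It assumes that each unreachable host blocks one worker for the full command timeout and that hosts are processed in groups of `forks` (Ansible's default is 5). The function name and the host count in the usage line are hypothetical, not figures from our environment.

```python
import math


def worst_case_wait_seconds(unreachable_hosts, command_timeout, forks=5):
    """Rough upper bound on time one task spends waiting for dead hosts.

    Assumption: every unreachable host occupies a worker for the whole
    command timeout, and at most `forks` hosts are handled in parallel.
    """
    batches = math.ceil(unreachable_hosts / forks)
    return batches * command_timeout


# Hypothetical example: 300 dead hosts, 60 s command timeout, default forks.
print(worst_case_wait_seconds(300, 60))  # 3600
```

A host that is marked unreachable is dropped from the rest of the play, whereas these hosts can pay the timeout again on later tasks whenever the failure is ignored or rescued.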
Version dump:
ansible [core 2.15.2]
python version = 3.9.2
ansible.netcommon 5.1.2
cisco.ios 4.6.1
junipernetworks.junos 5.2.0
SUMMARY
I hope this is the right collection to report this to.
Hosts that are using ansible_connection=network_cli are never reported as unreachable. Instead, the task fails with a fatal error. This is a problem, because the error handling treats the two differently. One example would be ignore_errors: yes, which continues after a fatal error, but doesn't do anything for unreachable hosts. I guess this has to do with network_cli internally using a local connection. Nevertheless, it's a problem!
ISSUE TYPE
COMPONENT NAME
network_cli
ANSIBLE VERSION
COLLECTION VERSION
CONFIGURATION
OS / ENVIRONMENT
Ubuntu 20.04, but that shouldn't matter.
STEPS TO REPRODUCE
EXPECTED RESULTS
The same as for, say, ansible_connection=ssh.
ACTUAL RESULTS