ansible-collections / ansible.netcommon

Ansible Network Collection for Common Code
GNU General Public License v3.0

Hosts using network_cli are never reported unreachable #340

Open jplitza opened 2 years ago

jplitza commented 2 years ago
SUMMARY

I hope this is the right collection to report this to.

Hosts that are using ansible_connection=network_cli are never reported as unreachable. Instead, the task fails with a fatal error.

This is a problem because Ansible's error handling treats the two states differently. For example, ignore_errors: yes lets a play continue after a failed task, but it has no effect on unreachable hosts.

I guess this has to do with network_cli internally using a local connection. Nevertheless, it's a problem!

ISSUE TYPE
COMPONENT NAME

network_cli

ANSIBLE VERSION
ansible [core 2.11.3]
  config file = ~/ansible/ansible.cfg
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/.local/lib/python3.8/site-packages/ansible
  ansible collection location = ~/.ansible/collections:/usr/share/ansible/collections
  executable location = ~/.local/bin/ansible
  python version = 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0]
  jinja version = 2.10.1
  libyaml = True
COLLECTION VERSION
# ~/.ansible/collections/ansible_collections
Collection         Version 
------------------ --------
community.routeros 2.0.0-a1

# ~/.local/lib/python3.8/site-packages/ansible_collections
Collection         Version
------------------ -------
community.routeros 1.2.0  
CONFIGURATION
OS / ENVIRONMENT

Ubuntu 20.04, but that shouldn't matter.

STEPS TO REPRODUCE
ansible -i 2001:db8::, -m cli_command -a 'command="foo"' -e ansible_connection=network_cli -e ansible_network_os=ios all
EXPECTED RESULTS

The same as for, say, ansible_connection=ssh

$ ansible -i 2001:db8::, -m command -a 'foo' -e ansible_connection=ssh all
2001:db8:: | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: connect to host 2001:db8:: port 22: Network is unreachable",
    "unreachable": true
}
ACTUAL RESULTS
2001:db8:: | FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "msg": "[Errno 101] Network is unreachable"
}
lucaelin commented 2 years ago

I created a small Ansible module that, when run, manually marks a given host as unreachable. The module can be found here: https://gist.github.com/lucaelin/0128de00cbc545045742caeeec0805fc You can put this file in a folder called library next to the playbook that uses it (or install it globally; see the Ansible documentation on how to do that). You can then combine this module with something like wait_for_connection:

- wait_for_connection:
    connect_timeout: 5
    timeout: 10
  ignore_errors: yes
  register: wait_for_connection
- set_unreachable:
  when: wait_for_connection.failed

This is of course only a workaround until a proper fix is available, although I think this module might be worth including in Ansible by default.

tchiapuziowong commented 2 years ago

We're also experiencing this within our implementation of the network_cli code here: https://github.com/aruba/aos-switch-ansible-collection/issues/31

gm0neyl0ve commented 2 years ago

Seeing this as well. All the switches are offline but still report as successful.

Another interesting thing is that using async masks the connection failures even more.

      arubaoss_command:
        commands: ["conf", "interface {{ item }}", "disable"]
      loop: "{{ list_of_ports if list_of_ports is iterable and list_of_ports is not string else [] }}"

The above snippet returns the connection error in stdout for each item: ssh connection failed: ssh connect failed: No route to host

Whereas the following just completes fine.


      arubaoss_command:
        commands: ["conf", "interface {{ item }}", "disable"]
      async: 45
      poll: 0
      loop: "{{ list_of_ports if list_of_ports is iterable and list_of_ports is not string else [] }}"
eleksis commented 1 year ago

This issue probably affects all network connection plugins. I tested ansible.netcommon.network_cli and ansible.netcommon.netconf myself and saw the same behavior: if the machine is unreachable, the connection plugin ignores the gather_timeout setting and waits until the command timeout is triggered.

TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
fatal: [switch1.example.com]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"junipernetworks.junos.junos_facts": {"failed": true, "invocation": {"module_args": {"available_network_resources": false, "config_format": "text", "gather_network_resources": null, "gather_subset": ["min"]}}, "msg": "command timeout triggered, timeout value is 30 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide."}}, "msg": "The following modules failed to execute: junipernetworks.junos.junos_facts\n"}
fatal: [switch2.example.com]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"cisco.ios.ios_facts": {"failed": true, "invocation": {"module_args": {"available_network_resources": false, "gather_network_resources": null, "gather_subset": ["min"]}}, "msg": "ssh connection failed: ssh connect failed: Timeout connecting to switch2.example.com"}}, "msg": "The following modules failed to execute: cisco.ios.ios_facts\n"}
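
One way to avoid waiting out the command timeout on every dead host is to probe the management port before the play runs. The following is only a minimal sketch of that idea, not something netcommon provides: `tcp_reachable` is a hypothetical helper, and the default port 22 and the 5-second timeout are assumptions that would need adjusting (for example, for NETCONF on a different port).

```python
import socket


def tcp_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: port 22 and the timeout are assumed defaults.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable networks, DNS failures and
        # timeouts are all OSError subclasses.
        return False
```

A wrapper script could call this for each inventory host and pass only the reachable ones to Ansible with --limit. An open TCP port does not guarantee that the later SSH login succeeds, so this only filters out hosts that are clearly down.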

This is a major issue for us, since our inventory has a few thousand hosts and the command timeout is fairly high (default 30s, in our case 60s). We also parse Ansible's results/stats and currently cannot differentiate between unreachable and failed hosts. Since this ticket has been open since 2021, I'm wondering whether there is any intention to fix it, as it breaks one of Ansible's basic/core concepts.
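
Until the plugins report unreachable hosts properly, a result parser can approximate the distinction by inspecting the error messages. The sketch below is a heuristic, not Ansible behavior: `classify`, `iter_messages` and `CONNECTION_ERROR_PATTERNS` are hypothetical names, and the patterns are simply the message fragments quoted in this thread, so other platforms will likely need more.

```python
import re

# Message fragments copied from the outputs quoted in this thread (assumption:
# these indicate a connection problem). "command timeout triggered" can also
# occur on a reachable but slow device, so treat that match with care.
CONNECTION_ERROR_PATTERNS = [
    r"\[Errno 101\] Network is unreachable",
    r"ssh connect failed",
    r"No route to host",
    r"Timeout connecting to",
    r"command timeout triggered",
]
_CONNECTION_RE = re.compile("|".join(CONNECTION_ERROR_PATTERNS), re.IGNORECASE)


def iter_messages(result):
    """Yield every "msg" string in a result dict, including nested failed_modules."""
    msg = result.get("msg")
    if isinstance(msg, str):
        yield msg
    for nested in result.get("failed_modules", {}).values():
        if isinstance(nested, dict):
            yield from iter_messages(nested)


def classify(result):
    """Return "unreachable", "failed" or "ok" for one task result dict."""
    if result.get("unreachable"):
        return "unreachable"
    # Reclassify failures whose message looks like a connection error.
    if any(_CONNECTION_RE.search(m) for m in iter_messages(result)):
        return "unreachable"
    if result.get("failed"):
        return "failed"
    return "ok"
```

Applied to the two Gathering Facts results above, both hosts would be counted as unreachable, while a failure with an unrelated message would stay in the failed bucket.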

Version dump:

ansible [core 2.15.2]
python version = 3.9.2
ansible.netcommon     5.1.2
cisco.ios             4.6.1
junipernetworks.junos 5.2.0