ansible-collections / ansible.netcommon

Ansible Network Collection for Common Code
GNU General Public License v3.0

network_cli doesn't respect user-specified timeout values for all tasks #233

Open rudimocnik opened 3 years ago

rudimocnik commented 3 years ago
SUMMARY

network_cli doesn't respect ansible_command_timeout on one of my tasks.

ISSUE TYPE
COMPONENT NAME

network_cli

ANSIBLE VERSION
ansible 2.10.6
config file = /home/rudimocnik/ansible/dvp/ansible.cfg
configured module search path = ['/home/rudimocnik/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/rudimocnik/virtualenv/py3-ansible/lib/python3.8/site-packages/ansible
executable location = /home/rudimocnik/virtualenv/py3-ansible/bin/ansible
python version = 3.8.6 (default, Sep 25 2020, 00:00:00) [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)]
CONFIGURATION
ACTION_WARNINGS(/home/rudimocnik/ansible/dvp/ansible.cfg) = False
DEFAULT_FORKS(/home/rudimocnik/ansible/dvp/ansible.cfg) = 40
DEFAULT_HOST_LIST(/home/rudimocnik/ansible/dvp/ansible.cfg) = ['/home/rudimocnik/ansible/dvp/inv.yml']
DEPRECATION_WARNINGS(/home/rudimocnik/ansible/dvp/ansible.cfg) = False
HOST_KEY_CHECKING(/home/rudimocnik/ansible/dvp/ansible.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/home/rudimocnik/ansible/dvp/ansible.cfg) = 30
PERSISTENT_CONNECT_TIMEOUT(/home/rudimocnik/ansible/dvp/ansible.cfg) = 90
OS / ENVIRONMENT

Cisco cat9300 running 16.12.3a

STEPS TO REPRODUCE

Run install_ios.yml (via the install_upgrade role) on a 9300 switch.

---
# This playbook Upgrades Cisco devices

# Find 9300 stack

- name: UPGRADE c9300 & ASR-920
  hosts: all
  #strategy: free
  connection: network_cli
  gather_facts: false

  tasks:
    - name: Scan the network
      ios_facts:
        gather_subset: all

#### Upgrade IOS on c9300 stack

    # This task will be skipped if the image is compliant and for non-c9300 devices.
    - name: UPGRADE C9300 stack
      include_role:
        name: install_upgrade
      when: (ansible_net_model == "C9300-24P" or ansible_net_model == "C9300-24T") and (ansible_net_version != c9300_upgrade_ios_version)

##### my install_upgrade role #####

########### main.yml ##########
---
# tasks file for ./roles/ios_image_upgrade

- include_tasks: version-check.yml

- include_tasks: file_transfer.yml

- include_tasks: install_ios.yml

- include_tasks: save_config.yml

- include_tasks: reload.yml

- include_tasks: version-check.yml

- include_tasks: cleanup.yml

########### file_transfer.yml ##########
---
# Transfer file to Cisco device

- name: Copy image to target device 
  cisco.ios.ios_command:
    commands:
    - command: "copy {{ c9300_file_source }}{{ c9300_file_name }} flash:"
      prompt: "Destination filename [{{ c9300_file_name }}]?"
      answer: "\r"
  vars:
    ansible_command_timeout: 300

########### install_ios.yml ##########
---
# Install image file to Cisco device

- name: Install new image
  cisco.ios.ios_command:
    commands: "install add file flash:{{ c9300_file_name }} activate commit prompt-level none"
  vars:
    ansible_command_timeout: 900

EXPECTED RESULTS

I expect the timeout of 900 seconds to be respected in the install_ios.yml task inside my role, as it is for the file_transfer.yml task.

ACTUAL RESULTS

The file transfers successfully, while the install_ios task fails with a 30-second timeout error. output-playbook

Also, I am not sure where the 'b' in the error message b'install add file ....' comes from.

timeout value 30 seconds reached while trying to send command
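
On the 'b' prefix: it is most likely Python's bytes notation. The sketch below assumes the plugin holds the command as a bytes object and formats it into the error string. The message format and the image name are illustrative assumptions, not copied from the plugin source.

```python
# Hypothetical command held as bytes, as a connection plugin might do
# before writing it to the SSH channel.
command = b"install add file flash:image.bin"
timeout = 30

# Formatting a bytes object with %s uses its repr, which carries the b'' prefix.
message = "timeout value %s seconds reached while trying to send command: %s" % (
    timeout,
    command.strip(),
)
print(message)

# Decoding first gives the plain text without the prefix.
print(command.decode("utf-8"))
```

If that assumption holds, the prefix is cosmetic and unrelated to the timeout itself.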

rohitthakur2590 commented 3 years ago

@rudimocnik Could you please share detailed Ansible logs for the same play? You can enable them by running the following in the terminal:

export ANSIBLE_LOG_PATH=ansible_logs.log
export ANSIBLE_PERSISTENT_LOG_MESSAGES=TRUE
export ANSIBLE_DEBUG=TRUE

rudimocnik commented 3 years ago

@rohitthakur2590 Excuse my late reply. Here is the output you requested.

ansible_logs.log

lorephoenix commented 3 years ago

@rohitthakur2590 I am having the same issue as @rudimocnik

ansible-playbook 2.10.7
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/deployer/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 3.6.7 (default, Dec  6 2018, 11:09:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-23)]

Ansible collection list

ansible-galaxy collection list
 24539 1617127198.50085: starting run
/usr/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
 24539 1617127202.65879: Validate TLS certificates for https://galaxy.ansible.com: True

# /home/deployer/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 2.0.0
ansible.posix     1.2.0
ansible.utils     2.0.1
cisco.ios         2.0.0

When running the command directly on device:

begnt-vdb4-lab-023#install add file flash:cat9k_lite_iosxe.16.12.05.SPA.bin
install_add: START Tue Mar 30 17:34:51 UTC 2021
Mar 30 17:34:54.588 %INSTALL-5-INSTALL_START_INFO: R0/0: install_engine: Started install add flash:cat9k_lite_iosxe.16.12.05.SPA.bin
install_add: Adding PACKAGE
install_add: Checking whether new add is allowed ....

--- Starting initial file syncing ---
Info: Finished copying flash:cat9k_lite_iosxe.16.12.05.SPA.bin to the selected switch(es)
Finished initial file syncing

--- Starting Add ---
Performing Add on all members
  [1] Add package(s) on switch 1
  [1] Finished Add on switch 1
Checking status of Add on [1]
Add: Passed on [1]
Finished Add

Image added. Version: 16.12.5.0.5625
SUCCESS: install_add  Tue Mar 30 17:43:35 UTC 2021
Mar 30 17:43:37.045 %INSTALL-5-INSTALL_COMPLETED_INFO: R0/0: install_engine: Completed install add PACKAGE flash:cat9k_lite_iosxe.16.12.05.SPA.bin

Ansible task:

- name: ios-xe_installing | MODE INSTALL | install | Add image
  vars:
    ansible_command_timeout: 1800
  when: not image_installed
  register: upgrade_results
  cisco.ios.ios_command:
    commands:
      - command: "install add file {{ ansible_net_filesystems if (ansible_net_filesystems|length > 1) else ansible_net_filesystems[0] }}{{ required_ios_binary }}"

However, when I run the Ansible role, ansible_command_timeout has no effect; see the (partial) logs: ansible_logs.log

rudimocnik commented 3 years ago

@rohitthakur2590 I have been testing different scenarios and I couldn't make the "Install new image" task work inside the role. However, I was able to use include_playbook with a separate playbook that contains just the install task. Strangely, this worked, but I have no explanation for why the timeout did not trigger this time. Furthermore, if I add more tasks to the "Install new image2" play, the timeout problems reappear.

This is Install new image2 playbook

lorephoenix commented 3 years ago

@rohitthakur2590 I also ran some tests, reducing the number of tasks as far as possible. I first ran the playbook 'test1' without gathering facts, and I was able to process 'install add file ...' without any timeout issue. ansible_without_ios_facts.log

When I run the same playbook with the task 'test | Gathering Facts' added, I get the timeout. ansible_using_ios_facts.log

Update 2021-04-29: Instead of the default Paramiko connection plugin, I tried the new LibSSH connection plugin, and I no longer have this timeout issue. https://www.ansible.com/blog/new-libssh-connection-plugin-for-ansible-network

[persistent_connection]
ssh_type = libssh

ANSIBLE PLAYBOOK

---
- name: Cisco IOS-XE upgrade
  hosts: NETWORK
  gather_facts: no
  roles:
  - role: test1

ANSIBLE ROLE 'test1' - tasks/main.yml

---
# tasks file for test1
- name: test | Gathering Facts
  ansible.builtin.ios_facts:
    gather_subset: hardware
  tags:
    - installing

- name: test | Define dictionary
  ansible.builtin.set_fact:
    device_findfile_info: "{{ device_findfile_info|default({}) | 
        combine( { 'flash:' : { 'filename' : 'cat9k_lite_iosxe.16.12.05b.SPA.bin' }}) }}"
  tags:
    - installing

- name: test | debug
  debug: var=device_findfile_info
  tags:
    - installing

- name: "test | Add image "
  vars:
    ansible_command_timeout: 1800
  register: command_results
  cisco.ios.ios_command:
    commands:
    - command: "install add file {{ item }}{{ device_findfile_info[item]['filename'] }}\n\n"
  with_items: "{{ device_findfile_info.keys() | list }}"
  tags:
    - installing

ANSIBLE VERSION

ansible 2.10.7
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/deployer/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.6.7 (default, Dec 6 2018, 11:09:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-23)]

Collection        Version
----------------- -------
ansible.netcommon 2.0.1
ansible.posix     1.2.0
ansible.utils     2.0.1
cisco.ios         2.0.0

jknight-netscout commented 2 years ago

I found a workaround which I posted into a different issue https://github.com/ansible-collections/ansible.netcommon/issues/269#issuecomment-1102979029

You can use meta: reset_connection before and after the task you'd like to increase the timeout for.

My example task, which seemed to work

# Drop the persistent connection so the next task opens a new one
- name: Workaround to bump timeout
  meta: reset_connection

- name: Find any required upgrades for modules
  register: epld_upgrade_required
  vars:
    ansible_command_timeout: 90
  ansible.netcommon.cli_command:
    command: "show install all impact epld bootflash:{{ epld_file }} | json"

# Drop the connection again so later tasks reconnect with the default timeout
- name: Workaround, back to default timeout
  meta: reset_connection

Before using this workaround, the command would time out at 30 seconds even though I had set the timeout for this task to 90 seconds. With the workaround, the command waits for the longer period and no longer fails.

The problem is that network_cli.py only reads the timeout variable when it opens a new SSH connection. If the task is not the first one in the playbook, and an SSH connection therefore already exists, the plugin does not update command_timeout and keeps using the value that was in effect when the session was first established.
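
The behaviour described above can be illustrated with a toy model. This is a minimal sketch of the explanation in this comment, not the actual network_cli code: the class `ToyPersistentConnection` and its methods are hypothetical names. The default of 30 mirrors PERSISTENT_COMMAND_TIMEOUT from the original report.

```python
class ToyPersistentConnection:
    """Toy model of a persistent connection; NOT the real network_cli plugin."""

    def __init__(self):
        self.connected = False
        self.command_timeout = None

    def run_task(self, task_vars, default_timeout=30):
        # The timeout is read only when a new session is opened.
        if not self.connected:
            self.command_timeout = task_vars.get(
                "ansible_command_timeout", default_timeout
            )
            self.connected = True
        # Later tasks reuse the cached value and ignore their own vars.
        return self.command_timeout

    def reset(self):
        # Analogue of `meta: reset_connection`: drop the cached session.
        self.connected = False
        self.command_timeout = None


conn = ToyPersistentConnection()

# First task (e.g. gathering facts) opens the session with the default.
facts_timeout = conn.run_task({})

# The install task asks for 900 seconds, but the session already exists.
install_timeout = conn.run_task({"ansible_command_timeout": 900})

# After a reset, the next task opens a new session and its value is used.
conn.reset()
after_reset_timeout = conn.run_task({"ansible_command_timeout": 900})

print(facts_timeout, install_timeout, after_reset_timeout)
```

Under this model, a long-running task succeeds when it is the first to open the connection and fails when an earlier task (such as a facts task) has already opened it. That matches the observations reported earlier in this thread.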