ansible-collections / ansible.netcommon

Ansible Network Collection for Common Code
GNU General Public License v3.0

Not connected to NETCONF server after a while #435

Open jean-christophe-manciot opened 2 years ago

jean-christophe-manciot commented 2 years ago
SUMMARY

When looping over the same netconf_rpc operation with a delay between iterations, the connection to the remote NETCONF device is lost after a while and is never re-established. This happens with both ansible_network_cli_ssh_type: paramiko and ansible_network_cli_ssh_type: libssh.

The issue is not caused by bandwidth congestion, since this device is the only one being tested. It happens with every version of IOS XRv and IOS XRv 9k I could get my hands on, but not with any IOS XEv.

ISSUE TYPE
COMPONENT NAME

module_utils

ANSIBLE VERSION
ansible [core 2.12.6]
  config file = /etc/ansible/ansible.cfg
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /opt
  executable location = /usr/local/bin/ansible
  python version = 3.10.4 (main, May 14 2022, 05:40:22) [GCC 11.3.0]
  jinja version = 3.1.2
  libyaml = True
COLLECTION VERSION
$ ansible-galaxy collection list ansible.netcommon

# /usr/local/lib/python3.10/dist-packages/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 2.6.1  

# /opt/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 3.0.0  
CONFIGURATION
CACHE_PLUGIN(/etc/ansible/ansible.cfg) = redis
CACHE_PLUGIN_CONNECTION(/etc/ansible/ansible.cfg) = localhost:<port>:0:<password>
CACHE_PLUGIN_TIMEOUT(/etc/ansible/ansible.cfg) = 259200
COLLECTIONS_PATHS(/etc/ansible/ansible.cfg) = ['/opt']
DEFAULT_EXECUTABLE(/etc/ansible/ansible.cfg) = /bin/bash
DEFAULT_FORKS(/etc/ansible/ansible.cfg) = 1000
DEFAULT_GATHERING(/etc/ansible/ansible.cfg) = explicit
DEFAULT_GATHER_TIMEOUT(/etc/ansible/ansible.cfg) = 30
DEFAULT_HASH_BEHAVIOUR(/etc/ansible/ansible.cfg) = replace
DEFAULT_HOST_LIST(/etc/ansible/ansible.cfg) = ['/etc/ansible/hosts']
DEFAULT_LOAD_CALLBACK_PLUGINS(/etc/ansible/ansible.cfg) = True
DEFAULT_LOG_PATH(/etc/ansible/ansible.cfg) = /var/log/ansible.log
DEFAULT_PRIVATE_ROLE_VARS(/etc/ansible/ansible.cfg) = False
DEFAULT_STDOUT_CALLBACK(/etc/ansible/ansible.cfg) = yaml
DEFAULT_TIMEOUT(/etc/ansible/ansible.cfg) = 180
DEFAULT_TRANSPORT(/etc/ansible/ansible.cfg) = ssh
ENABLE_TASK_DEBUGGER(/etc/ansible/ansible.cfg) = True
HOST_KEY_CHECKING(/etc/ansible/ansible.cfg) = True
INJECT_FACTS_AS_VARS(/etc/ansible/ansible.cfg) = True
INTERPRETER_PYTHON(/etc/ansible/ansible.cfg) = /usr/bin/python3
PERSISTENT_COMMAND_TIMEOUT(/etc/ansible/ansible.cfg) = 3599
PERSISTENT_CONNECT_RETRY_TIMEOUT(/etc/ansible/ansible.cfg) = 200
PERSISTENT_CONNECT_TIMEOUT(/etc/ansible/ansible.cfg) = 3600
RETRY_FILES_ENABLED(/etc/ansible/ansible.cfg) = False
SHOW_CUSTOM_STATS(/etc/ansible/ansible.cfg) = True
OS / ENVIRONMENT
STEPS TO REPRODUCE
- name: Downloading a list of schemas from IOS-XR 9k 7.4.1
  hosts:
        - all
  gather_facts: false
  strategy: debug
  tasks:
        - name: Downloading a list of schemas using paramiko
          vars:
                ansible_connection: ansible.netcommon.netconf
                ansible_network_os: default
                ansible_network_cli_ssh_type: paramiko
                ansible_ssh_private_key_file: "{{ private_key_file }}"
                ansible_ssh_private_key_file_password: "{{ private_key_file_password }}"
                ansible_user: "{{ username }}"
          ansible.netcommon.netconf_rpc:
                rpc: get-schema
                xmlns: urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring
                content: |
                        <identifier>{{ item.name }}</identifier>
                        <version>{{ item.revision }}</version>
                        <format>yang</format>
                display: xml
          loop:   
                - name: ietf-netconf-acm
                  revision: '2018-02-14'
                - name: ietf-netconf-monitoring
                  revision: '2010-10-04'
                - name: ietf-yang-library
                  revision: '2019-01-04'
                - name: openconfig-acl
                  revision: '2017-05-26'
                - name: openconfig-bgp-policy
                  revision: '2017-07-30'
                - name: openconfig-if-ip
                  revision: '2019-01-08'
                - name: openconfig-if-ip-ext
                  revision: '2018-11-21'
                - name: openconfig-interfaces
                  revision: '2019-11-19'
                - name: openconfig-lacp
                  revision: '2017-05-05'
                - name: openconfig-lldp
                  revision: '2016-05-16'
                - name: openconfig-local-routing
                  revision: '2017-05-15'
                - name: openconfig-platform-cpu
                  revision: '2018-01-30'
                - name: openconfig-platform
                  revision: '2019-04-16'
                - name: openconfig-platform-port
                  revision: '2018-01-20'
                - name: openconfig-platform-psu
                  revision: '2018-11-21'
                - name: openconfig-platform-transceiver
                  revision: '2018-11-25'
                - name: openconfig-rib-bgp
                  revision: '2016-04-11'
                - name: openconfig-rsvp-sr-ext
                  revision: '2017-03-06'
                - name: openconfig-system
                  revision: '2018-07-17'
                - name: openconfig-telemetry
                  revision: '2016-02-04'
                - name: openconfig-vlan
                  revision: '2016-05-26'
          loop_control:
                pause: 20
          ignore_errors: true
EXPECTED RESULTS

No disconnection from remote NETCONF server

ACTUAL RESULTS
...
<172.21.202.121> Using network group action ansible.netcommon.netconf for ansible.netcommon.netconf_rpc
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: enabled via connection option
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: found ansible.netcommon.netconf_rpc  at /opt/ansible_collections/ansible/netcommon/plugins/modules/netconf_rpc.py
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: running ansible.netcommon.netconf_rpc
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: complete
The full traceback is:
  File "/opt/ansible_collections/ansible/netcommon/plugins/module_utils/network/netconf/netconf.py", line 130, in dispatch
    response = conn.dispatch(request)
  File "/opt/ansible_collections/ansible/netcommon/plugins/module_utils/network/common/netconf.py", line 80, in __rpc__
    return self.parse_rpc_error(
  File "/opt/ansible_collections/ansible/netcommon/plugins/module_utils/network/common/netconf.py", line 126, in parse_rpc_error
    raise ConnectionError(rpc_error)
failed: [IOS_XRv_9k-7.4.1] (item={'name': 'openconfig-system', 'revision': '2018-07-17'}) => changed=false 
  ansible_loop_var: item
  invocation:
    module_args:
      content: |-
        <identifier>openconfig-system</identifier>
        <version>2018-07-17</version>
        <format>yang</format>
      display: xml
      rpc: get-schema
      xmlns: urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring
  item:
    name: openconfig-system
    revision: '2018-07-17'
  msg: b'Not connected to NETCONF server'
...

The full log is available here: Not connected to NETCONF server after a while.log.

jean-christophe-manciot commented 2 years ago

Same issue with netcommon 3.0.1.

jean-christophe-manciot commented 2 years ago

A few more details:

1) The first failed netconf_rpc yields:

redirecting (type: netconf) ansible.builtin.default to ansible.netcommon.default
<172.21.202.121> attempting to start connection
<172.21.202.121> using connection plugin ansible.netcommon.netconf
Found ansible-connection at path /usr/local/bin/ansible-connection
<172.21.202.121> found existing local domain socket, using it!
<172.21.202.121> 
<172.21.202.121> local domain socket path is /media/SAMSUNG9-Shared/home/admin/.ansible/pc/14e3bdf45e
<172.21.202.121> Using network group action ansible.netcommon.netconf for ansible.netcommon.netconf_rpc
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: enabled
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: found ansible.netcommon.netconf_rpc  at /opt/ansible_collections/ansible/netcommon/plugins/modules/netconf_rpc.py
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: running ansible.netcommon.netconf_rpc
<172.21.202.121> ANSIBLE_NETWORK_IMPORT_MODULES: complete
The full traceback is:
  File "/opt/ansible_collections/ansible/netcommon/plugins/module_utils/network/netconf/netconf.py", line 128, in dispatch
    response = conn.dispatch(request)
  File "/opt/ansible_collections/ansible/netcommon/plugins/module_utils/network/common/netconf.py", line 80, in __rpc__
    return self.parse_rpc_error(
  File "/opt/ansible_collections/ansible/netcommon/plugins/module_utils/network/common/netconf.py", line 126, in parse_rpc_error
    raise ConnectionError(rpc_error)
fatal: [IOS_XRv_9k-7.4.1]: FAILED! => changed=false 
  invocation:
    module_args:
      content: |2-
          <identifier>Cisco-IOS-XR-lldp-clear-act</identifier>
          <version>2019-11-13</version>
          <format>yang</format>
      display: xml
      rpc: get-schema
      xmlns: urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring
  msg: b'Not connected to NETCONF server'

2) According to the NETCONF statistics, the session is dropped along the way:

{
  "ietf-netconf-monitoring:netconf-state": {
    "statistics": {
      "dropped-sessions": "1",
      "in-bad-hellos": "0",
      "in-bad-rpcs": "0",
      "in-rpcs": "42",
      "in-sessions": "1",
      "netconf-start-time": "2022-06-01T07:57:39Z",
      "out-notifications": "0",
      "out-rpc-errors": "0"
    }
  }
}

Even inserting another type of RPC call, such as rpc: get, between the rpc: get-schema calls does not change anything: the NETCONF session is still dropped after a while.
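
For anyone trying to reproduce this, the statistics in point 2 can be read from the device with a task like the one below. This is only a sketch, not necessarily how the output above was collected: the task names and the netconf_stats variable are made up for illustration, and it assumes the device exposes the ietf-netconf-monitoring statistics container over NETCONF. The text returned will be XML, so it will not look exactly like the JSON shown above.

```yaml
# Sketch only: task names and the netconf_stats variable are illustrative.
- name: Reading the NETCONF session statistics
  vars:
    ansible_connection: ansible.netcommon.netconf
  ansible.netcommon.netconf_get:
    # No "source" is set on purpose, so that state data is returned
    # in addition to configuration data.
    filter: |
      <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
        <statistics/>
      </netconf-state>
    display: xml
  register: netconf_stats

- name: Showing the raw statistics reply
  ansible.builtin.debug:
    var: netconf_stats.stdout
```

Once the persistent connection is already in the "Not connected to NETCONF server" state, this task will presumably fail in the same way, so it probably has to run in a separate ansible-playbook invocation that opens a new session. Comparing dropped-sessions before and after the failing loop should then show whether the device counted the session as abnormally terminated.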