ansible-collections / cisco.nxos

Ansible Network Collection for Cisco NXOS
GNU General Public License v3.0

Unable to respond to interactive prompt command #769

Open colinet opened 11 months ago

colinet commented 11 months ago
SUMMARY

The task below fails when run:

- name: switch_fc | helpers | session_reset - perform reset
  cisco.nxos.nxos_command:
    commands:
      - command: clear zone lock vsan 1014
        prompt: 'Do you want to continue'
        answer: 'y'

I get the below error:

TASK [ds-role-san_CRUD : switch_fc | helpers | session_reset - perform reset] **************************************************************************************************************
Friday 06 October 2023  14:29:08 +0200 (0:00:00.057)       0:00:03.903 ******** 
failed: [localhost] (item={'name': 'fabric_a', 'switch_fabric': 'xxxxxxxxxxxx', 'vsan_id': 1014}) => {"ansible_loop_var": "fab", "changed": false, "fab": {"name": "fabric_a", "switch_fabric": "xxxxxxxxxxxx", "vsan_id": 1014}, "module_stderr": "command timeout triggered, timeout value is 30 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide.", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}
ISSUE TYPE

I used the same example as shown in the documentation: https://docs.ansible.com/ansible/latest/collections/cisco/nxos/nxos_command_module.html

COMPONENT NAME

nxos

ANSIBLE VERSION
[xxxxxxxxxx@xxxxxxxxxxxxxxx~]$  ansible --version
ansible [core 2.15.0]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/xxxxxxxx/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /home/xxxxxxxxxxxx/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.16 (main, May 29 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python)
  jinja version = 3.1.2
  libyaml = True
[xxxxxxxxxxxx@xxxxxxxxxxxxxxxxx~]$
COLLECTION VERSION
[xxxxxxxx@xxxxxxxxxxxxxxx~]$  ansible-galaxy collection list |grep pure
purestorage.flasharray        1.21.0 
purestorage.flasharray        1.18.0 
CONFIGURATION
[xxxxxx@xxxxxxxxxxxx~]$  ansible-config dump --only-changed
CACHE_PLUGIN(/etc/ansible/ansible.cfg) = memory
CALLBACKS_ENABLED(/etc/ansible/ansible.cfg) = ['ansible.posix.profile_tasks']
CONFIG_FILE() = /etc/ansible/ansible.cfg
DEFAULT_FORKS(/etc/ansible/ansible.cfg) = 5
DEFAULT_GATHERING(/etc/ansible/ansible.cfg) = implicit
DEFAULT_HOST_LIST(/etc/ansible/ansible.cfg) = ['/etc/ansible/hosts']
DEFAULT_MANAGED_STR(/etc/ansible/ansible.cfg) = # WARNING: This script is managed by Ansible with The Linux Framework. Any manual changes will be lost the next time Ansible runs.
DEFAULT_POLL_INTERVAL(/etc/ansible/ansible.cfg) = 15
DEFAULT_ROLES_PATH(/etc/ansible/ansible.cfg) = ['/home/xxxxxxxx/workspace/ansible/ds-roles']
DEFAULT_TRANSPORT(/etc/ansible/ansible.cfg) = smart
DEFAULT_VAULT_PASSWORD_FILE(/etc/ansible/ansible.cfg) = /home/svc_worker/.vps.txt
DISPLAY_SKIPPED_HOSTS(/etc/ansible/ansible.cfg) = True
HOST_KEY_CHECKING(/etc/ansible/ansible.cfg) = False
PERSISTENT_CONNECT_RETRY_TIMEOUT(/etc/ansible/ansible.cfg) = 30
PERSISTENT_CONNECT_TIMEOUT(/etc/ansible/ansible.cfg) = 60
RETRY_FILES_ENABLED(/etc/ansible/ansible.cfg) = False
[xxxxxxx@xxxxxxxxxxxxx~]$ 
OS / ENVIRONMENT

Red Hat 9.0

EXPECTED RESULTS

This should clear the zone lock.

NilashishC commented 11 months ago

@colinet Is the target device Cisco MDS?

colinet commented 11 months ago

Yes, it is for an MDS switch.

I've tried different syntaxes, but none of them work. I wonder whether the example shown in the documentation https://docs.ansible.com/ansible/latest/collections/cisco/nxos/nxos_command_module.html is valid.

NilashishC commented 11 months ago

@colinet As mentioned in the Notes section of the docs, this module has only limited support for Cisco MDS switches and hence might not fully work out of the box, as it would for Nexus.

@srbharadwaj Would you be able to look into this?

srbharadwaj commented 11 months ago

@NilashishC Is the 'prompt' option a valid one? I don't see it in the documentation, and I also see it commented out in the code: https://github.com/ansible-collections/cisco.nxos/blob/1fd405b383827716ef8f3c8c7eabe9d2e317d61d/plugins/modules/nxos_command.py#L176

NilashishC commented 11 months ago

@srbharadwaj The prompt option is valid. Since commands can take at least two forms - (a) a list of strings (commands to send), (b) a list of dictionaries (command + prompt + answer combination), its element type is set to raw in the argspec. The prompt handling logic is implemented in the cliconf plugin and in the network_cli connection plugin code.

https://github.com/ansible-collections/cisco.nxos/blob/main/plugins/cliconf/nxos.py#L240-L248 https://github.com/ansible-collections/ansible.netcommon/blob/main/plugins/connection/network_cli.py#L1059

NilashishC commented 11 months ago

@colinet Could you please share the device interaction logs for this scenario?

Steps: https://docs.ansible.com/ansible/latest/network/user_guide/network_debug_troubleshooting.html#enabling-networking-device-interaction-logging
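
For reference, a minimal sketch of what the linked steps amount to, assuming the switches are covered by a group_vars file (the file name below is hypothetical). The `ansible_persistent_log_messages` variable asks the persistent connection to log the messages exchanged with the device; the log is written to the file set by `log_path` in ansible.cfg or by the `ANSIBLE_LOG_PATH` environment variable. How much detail appears may depend on the connection plugin in use.

```yaml
# group_vars/mds_switches.yml  (hypothetical file name)
#
# Log the exchange between Ansible and the device over the persistent
# connection. Needs log_path (ansible.cfg) or ANSIBLE_LOG_PATH to be set.
# The log can contain credentials and other sensitive values, so turn this
# off again once debugging is done.
ansible_persistent_log_messages: true
```

Running the failing task again with this set, then sharing the relevant part of the log file, should show what was sent to the switch and what came back.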

srbharadwaj commented 11 months ago

OK, can we know where this was tested?

NilashishC commented 11 months ago

@srbharadwaj The following task is tested to be working with Nexus 9300v (NX-OS 9.3.6):

    - name: Switch to maintenance mode
      cisco.nxos.nxos_command:
        commands:
          - configure terminal
          - command: system mode maintenance
            prompt: Do you want to continue
            answer: y
NilashishC commented 11 months ago

@colinet As a workaround, you can temporarily turn off CLI confirmation prompts before you run the clear command. Have you tried that?

- name: switch_fc | helpers | session_reset - perform reset
  cisco.nxos.nxos_command:
    commands:
      - terminal dont-ask
      - clear zone lock vsan 1014
colinet commented 11 months ago

The workaround works on the first fabric but, surprisingly, fails on the second with an unexpected result.

The task is now:

- name: switch_fc | helpers | session_reset - perform reset
  cisco.nxos.nxos_command:
    commands:
      - terminal dont-ask
      - clear device-alias session
      - "clear zone lock vsan {{ fab.vsan_id }}"
  vars:
    ansible_connection: "{{ san_CRUD_switch_fabric_api }}"
    ansible_network_os: "{{ san_CRUD_switch_fabric_os }}"
    ansible_user: "{{ san_CRUD_switch_fabric_svc_user }}"
    ansible_password: "{{ san_CRUD_switch_fabric_svc_password }}"
    ansible_host: "{{ fab.switch_fabric }}"
    ansible_httpapi_port: "{{ san_CRUD_switch_fabric_port }}"
    ansible_httpapi_use_ssl: true
    ansible_httpapi_validate_certs: false
  loop: "{{ reset_data }}"
  loop_control:
    loop_var: fab

The outcome is:

TASK [ds-role-san_CRUD : switch_fc | helpers | session_reset - perform reset] *********************************************************************************************************************************************************************************
Monday 09 October 2023  16:58:10 +0200 (0:00:00.055)       0:00:02.221 ******** 
ok: [localhost] => (item={'name': 'fabric_a', 'switch_fabric': 'switch_001', 'vsan_id': 1014})
failed: [localhost] (item={'name': 'fabric_b', 'switch_fabric': 'switch_002', 'vsan_id': 2014}) => {"ansible_loop_var": "fab", "changed": false, "fab": {"name": "fabric_b", "switch_fabric": "switch_002", "vsan_id": 2014}, "module_stderr": "clear zone lock vsan 2014: CLI execution error: Command will clear lock from the entire fabric only if issued on initiating switch.\nElse lock will be cleared only locally.\nVSAN 2014 is not active\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}

So on the second fabric it reports that VSAN 2014 is not active, which is wrong. If I run the command directly on the switch of the second fabric, it succeeds.

srbharadwaj commented 11 months ago

Hi Remi, can you check the accounting logs when the failure occurred?

NilashishC commented 11 months ago

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

colinet commented 11 months ago

check the accounting logs when the failure occurred

I am getting the following accounting log just after running the playbook against the first switch of the second fabric:

Tue Oct 10 12:43:10 2023:type=stop:id=10.80.144.120@pts/0:user=dcnmuser:cmd=shell terminated because the ssh session closed
Tue Oct 10 12:43:10 2023:type=start:id=10.80.144.120@pts/7:user=dcnmuser:cmd=
Tue Oct 10 12:43:10 2023:type=update:id=10.80.144.120@pts/7:user=dcnmuser:cmd=terminal session-timeout 60 (SUCCESS)
Tue Oct 10 12:43:10 2023:type=update:id=10.80.144.120@pts/7:user=dcnmuser:cmd=terminal length 0 (SUCCESS)
Tue Oct 10 12:43:11 2023:type=stop:id=10.80.144.120@pts/7:user=dcnmuser:cmd=shell terminated because the ssh session closed
Tue Oct 10 12:43:11 2023:type=start:id=10.80.144.120@pts/0:user=dcnmuser:cmd=
Tue Oct 10 12:43:11 2023:type=update:id=10.80.144.120@pts/0:user=dcnmuser:cmd=terminal session-timeout 60 (SUCCESS)
Tue Oct 10 12:43:12 2023:type=update:id=10.80.144.120@pts/0:user=dcnmuser:cmd=terminal length 0 (SUCCESS)
colinet commented 11 months ago

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

I am using the API connection type. For the above playbook, I have: san_CRUD_switch_fabric_api: ansible.netcommon.httpapi

colinet commented 11 months ago

I ran the playbook with -vvv. The output is:

TASK [ds-role-san_CRUD : switch_fc | helpers | session_reset - perform reset] *********************************************************************************************************************************************************************************************
task path: /home/xxxxxxx/workspace/ansible/ds-roles/ds-role-san_CRUD/tasks/switch_fc/helpers/session_reset.yml:20
Tuesday 10 October 2023  14:57:05 +0200 (0:00:00.048)       0:00:02.201 ******* 
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
ok: [localhost] => (item={'name': 'fabric_a', 'switch_fabric': 'mhxcissan000sas', 'vsan_id': 1014}) => {
    "ansible_loop_var": "fab",
    "changed": false,
    "fab": {
        "name": "fabric_a",
        "switch_fabric": "mhxcissan000sas",
        "vsan_id": 1014
    },
    "invocation": {
        "module_args": {
            "commands": [
                "terminal dont-ask",
                "clear zone lock vsan 1014"
            ],
            "interval": 1,
            "match": "all",
            "retries": 9,
            "wait_for": null
        }
    },
    "stdout": [
        {},
        "Command will clear lock from the entire fabric only if issued on initiating switch.\nElse lock will be cleared only locally.\nNo pending info found"
    ],
    "stdout_lines": [
        {},
        [
            "Command will clear lock from the entire fabric only if issued on initiating switch.",
            "Else lock will be cleared only locally.",
            "No pending info found"
        ]
    ]
}
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
failed: [localhost] (item={'name': 'fabric_b', 'switch_fabric': 'mhxcissan001sas', 'vsan_id': 2014}) => {
    "ansible_loop_var": "fab",
    "changed": false,
    "fab": {
        "name": "fabric_b",
        "switch_fabric": "mhxcissan001sas",
        "vsan_id": 2014
    },
    "module_stderr": "clear zone lock vsan 2014: CLI execution error: Command will clear lock from the entire fabric only if issued on initiating switch.\nElse lock will be cleared only locally.\nVSAN 2014 is not active\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"
}

PLAY RECAP ****************************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=22   changed=0    unreachable=0    failed=1    skipped=4    rescued=0    ignored=0  

The message 'VSAN 2014 is not active' on the second fabric is unrelated to the current action and is unexpected.

When I run the command "clear zone lock vsan 2014" manually on the switch mhxcissan001sas, I get:

mhxcissan001sas# clear zone lock vsan 2014
Command will clear lock from the entire fabric only if issued on initiating switch.
Else lock will be cleared only locally.
Do you want to continue? (y/n) [n] y
No pending info found
mhxcissan001sas#

'VSAN 2014 is not active' should not show up when executing the command via the API and Ansible.

NilashishC commented 11 months ago

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

I am using the API connection type. For the above playbook, I have: san_CRUD_switch_fabric_api: ansible.netcommon.httpapi

I don't think prompts will ever work with NX-API due to the very nature of HTTP. Have you tried doing the same thing via the NX-API sandbox? Does it work there?
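
If the prompt and answer form is still wanted, one option is to run that single task over SSH instead of NX-API. The following is a minimal sketch, not verified against an MDS switch: it assumes SSH access to the switches is available, reuses the credential and loop variables from the task posted earlier in this thread, and hard-codes the connection and network OS values rather than the role variables.

```yaml
# Sketch only: assumes the switch accepts SSH logins with the same service
# account; MDS support in this module is limited, so behaviour may differ.
- name: switch_fc | helpers | session_reset - perform reset (over SSH)
  cisco.nxos.nxos_command:
    commands:
      # Dictionary form: the prompt value is matched as a regular expression
      # against the device output, then the answer is sent.
      - command: "clear zone lock vsan {{ fab.vsan_id }}"
        prompt: 'Do you want to continue'
        answer: 'y'
  vars:
    # network_cli is the connection plugin that implements prompt handling.
    ansible_connection: ansible.netcommon.network_cli
    ansible_network_os: cisco.nxos.nxos
    ansible_user: "{{ san_CRUD_switch_fabric_svc_user }}"
    ansible_password: "{{ san_CRUD_switch_fabric_svc_password }}"
    ansible_host: "{{ fab.switch_fabric }}"
  loop: "{{ reset_data }}"
  loop_control:
    loop_var: fab
```

The rest of the role can stay on ansible.netcommon.httpapi; only the tasks that need to answer an interactive prompt would override the connection this way.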

colinet commented 11 months ago

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

I am using the API connection type. For the above playbook, I have: san_CRUD_switch_fabric_api: ansible.netcommon.httpapi

I don't think prompts will ever work with NX-API due to the very nature of HTTP. Have you tried doing the same thing via the NX-API sandbox? Does it work there?

I am fine with using '- terminal dont-ask' as the first command (and forgetting about prompts through NX-API). This works on Fabric A, but the command fails on Fabric B with "VSAN 2014 is not active" even though this VSAN is active.

colinet commented 11 months ago

On fabric B, where the error about VSAN 2014 not being active occurs, the state is:

mhxcissan001sas# show vsan 2014
vsan 2014 information
         name:VSAN2014  state:active
         interoperability mode:default
         loadbalancing:src-id/dst-id/oxid
         operational state:up

mhxcissan001sas#
srbharadwaj commented 11 months ago

@colinet Does the accounting log on mhxcissan001sas show a failure after running the playbook (show accounting log | i clear)? Also, what are the version and model of the mhxcissan001sas switch?
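
The requested details could also be gathered from the playbook itself. A minimal sketch, assuming the same connection variables as the failing task are in effect for these tasks and that the switch accepts the piped show command over that connection; the task names and the `fabric_b_debug` variable are hypothetical.

```yaml
# Sketch only: collects the accounting log entries for "clear" commands and
# the software/hardware details of the switch in one task.
- name: switch_fc | debug | collect accounting log and version
  cisco.nxos.nxos_command:
    commands:
      - show accounting log | include clear
      - show version
  register: fabric_b_debug

# Print the raw output of both commands so it can be pasted into the issue.
- name: switch_fc | debug | print collected output
  ansible.builtin.debug:
    var: fabric_b_debug.stdout_lines
```

Running this right after the failing task keeps the relevant accounting entries close to the end of the log.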