ansible-collections / ansible.netcommon

Ansible Network Collection for Common Code
GNU General Public License v3.0

modules built off of network_cli failing #301

Closed: thedoubl3j closed this 2 years ago

thedoubl3j commented 3 years ago
SUMMARY

This is being reported on behalf of a partner. Related: https://github.com/ansible-collections/community.network/issues/290


We’re currently experiencing issues with some of our modules that are built off of the network_cli Connection plugin.

ISSUE TYPE
COMPONENT NAME

network_cli.py

ANSIBLE VERSION
ansible --version
ansible [core 2.11.0]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/site-packages/ansible
COLLECTION VERSION
*
CONFIGURATION
OS / ENVIRONMENT
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS
fatal: [switch1]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "msg": "unable to retrieve current config", "stderr": "Unable to decode JSON from response to exec_command(show running-config). Received 'None'.", "stderr_lines": ["Unable to decode JSON from response to exec_command(show running-config). Received 'None'."]}
thedoubl3j commented 3 years ago

@tchiapuziowong could you fill in any of the gaps I left or add further context when you can?

Qalthos commented 3 years ago

This does seem to be down to the modules using exec_command instead of calling send directly, but I'm not entirely sure what is causing this.

Is it possible to supply a debug log following these steps? That would at least give me an idea of what is going on on the collection side.

danielgarciavaglio commented 3 years ago

Hi, as per @tchiapuziowong's and @Qalthos's request, here are the logs:

exec_command_fail.log

tchiapuziowong commented 3 years ago

Hi @Qalthos ! Can we provide any additional info to help troubleshoot this issue?

tchiapuziowong commented 2 years ago

We've learned that this is specific to Ansible versions 2.10+ and Ansible Netcommon version 2.0.0+. Was there a breaking change between those versions that we need to handle, since we depend on them?

thedoubl3j commented 2 years ago

@Qalthos any updates on this?

tchiapuziowong commented 2 years ago

@Qalthos @thedoubl3j can I please get an update on this, or some guidance on how we can solve this on our end with our code? This is affecting a lot of users, and it's been months since we reported this without any progress.

Qalthos commented 2 years ago

Applying ansible/ansible#75313 should help diagnose what exactly is happening here. My guess is that it's related to the aruba collection's use of exec_command instead of send_command.

I suspect that exec_command is somehow being called to send a command to the device before the connection is established, which means the command is sent to the local shell instead and fails. There is a complicated reason why this happens, but the main takeaway is that exec_command remains this way for compatibility purposes, and platforms should use send_command (or a platform-specific wrapper around it) when a cliconf plugin is available.

Without a log with 75313 applied, I can't speculate about why it is failing, but given that this issue first appeared with netcommon 2.0.0, I imagine it might be a side effect of another breaking change introduced in that release.

tchiapuziowong commented 2 years ago

Hi @Qalthos, I've pulled your PR, run the playbook again to reproduce the issue, and attached the logs. When I searched our code for "exec_command", this is the only place I saw it referenced. Is this where you're suggesting we change to send_command? https://github.com/aruba/aoscx-ansible-collection/blob/master/plugins/module_utils/aoscx.py#L82-L101

ansible.log

Qalthos commented 2 years ago

> Is this where you're suggesting we change to send_command? https://github.com/aruba/aoscx-ansible-collection/blob/master/plugins/module_utils/aoscx.py#L82-L101

That is the spot where the original failure showed up, but it isn't the part calling the real exec_command. What you probably want to do is change https://github.com/aruba/aoscx-ansible-collection/blob/master/plugins/module_utils/aoscx.py#L124-L136 to swap conn.exec_command for conn.send_command, and use Python's exception handling instead of the no-longer-available rc and err. You should be able to slim that function down to just

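def exec_command(module, command):
    '''
    Execute command on the switch
    '''
    # create_ssh_connection() is the collection's existing connection helper
    conn = create_ssh_connection(module)
    # send_command returns the command output and raises ConnectionError
    # on failure, so there is no rc/err pair to check here
    return conn.send_command(command)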

and change how you're calling that exec_command to

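from ansible.module_utils._text import to_text
from ansible.module_utils.connection import ConnectionError

try:
    out = exec_command(module, 'configure terminal')
except ConnectionError as exc:
    # Failures now surface as an exception instead of a non-zero rc
    module.fail_json(msg='unable to enter configuration mode',
                     err=to_text(exc, errors='surrogate_then_replace'))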

in the four places it's called. You should also absolutely remove the import "from ansible.module_utils.connection import exec_command" from this file: at best it is being replaced by the version defined in the file, and at worst it is being called when you mean to call the local function of the same name.
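To make the behavioral change concrete, here is a minimal, self-contained sketch of the calling pattern suggested above that runs without Ansible installed. AnsibleConnectionError, FakeConnection, FakeModule, create_ssh_connection and enter_config_mode are hypothetical stand-ins (for ansible.module_utils.connection.ConnectionError, the persistent connection proxy, AnsibleModule, the collection's own helper, and one of the four call sites). The only point is that a failure now arrives as an exception rather than as an rc/err pair.

```python
# Hypothetical stand-ins so this sketch runs without Ansible; they only
# mimic the parts of the real classes that matter for this pattern.


class AnsibleConnectionError(Exception):
    """Stand-in for ansible.module_utils.connection.ConnectionError."""


class FakeConnection:
    """Stand-in for the persistent connection proxy (assumption)."""

    def __init__(self, fail=False):
        self.fail = fail

    def send_command(self, command):
        # Success returns the output; failure raises. There is no
        # (rc, out, err) tuple to inspect afterwards.
        if self.fail:
            raise AnsibleConnectionError("device rejected %r" % command)
        return "output of %s" % command


class FakeModule:
    """Stand-in for AnsibleModule that records fail_json() calls."""

    def __init__(self, connection):
        self.connection = connection
        self.failed = None

    def fail_json(self, **kwargs):
        # The real fail_json() exits the module; here we only record it.
        self.failed = kwargs


def create_ssh_connection(module):
    # Hypothetical stand-in for the collection's helper of the same name.
    return module.connection


def exec_command(module, command):
    """Execute command on the switch (same shape as the suggested version)."""
    conn = create_ssh_connection(module)
    return conn.send_command(command)


def enter_config_mode(module):
    # Hypothetical call site mirroring the try/except suggested above.
    try:
        return exec_command(module, "configure terminal")
    except AnsibleConnectionError as exc:
        module.fail_json(msg="unable to enter configuration mode", err=str(exc))
        return None


if __name__ == "__main__":
    ok = FakeModule(FakeConnection())
    print(enter_config_mode(ok))  # output of configure terminal
    bad = FakeModule(FakeConnection(fail=True))
    enter_config_mode(bad)
    print(bad.failed["msg"])  # unable to enter configuration mode
```

In a real module fail_json() does not return, so the "return None" after it only exists to keep the fake version well-defined.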

> ansible.log

This log, on the other hand, seems to show a completely different issue. It appears to be failing because of a poorly formatted terminal_initial_prompt value. The aruba collection's default for this option is no prompts. I don't have the playbook that generated this log, but it looks like the option has been overridden with a value that starts with an unescaped bracket of some kind, which the regular expression parser fails to parse.
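The bracket problem can be reproduced in isolation with Python's re module. This sketch assumes, as described above, that each terminal_initial_prompt entry is treated as a regular expression; the prompt strings and the helper names is_valid_regex and find_prompt are hypothetical, not taken from the attached log. It shows that an unbalanced bracket does not compile at all, that a balanced one silently becomes a character class, and that re.escape() makes the value match literally.

```python
import re


def is_valid_regex(pattern):
    """Return True if pattern compiles as a Python regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def find_prompt(prompt, text, literal=False):
    """Search text for prompt, escaping regex metacharacters if literal=True."""
    pattern = re.escape(prompt) if literal else prompt
    match = re.search(pattern, text)
    return match.group() if match else None


if __name__ == "__main__":
    # An unbalanced "[" is not a valid regex at all.
    print(is_valid_regex("[confirm"))  # False
    print(is_valid_regex(re.escape("[confirm")))  # True

    text = "Continue? [yes/no]:"
    # Unescaped, "[yes/no]" is a character class matching ONE of y,e,s,/,n,o.
    print(find_prompt("[yes/no]", text))  # o
    # Escaped, it matches the literal bracketed text.
    print(find_prompt("[yes/no]", text, literal=True))  # [yes/no]
```

In practice this means escaping any brackets in the configured prompt value so that they are matched literally.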

Qalthos commented 2 years ago

@tchiapuziowong I see a commit implementing the suggestions I made here. Do you consider this solved, or is there more you need before I can close this?

tchiapuziowong commented 2 years ago

@Qalthos yes this seems to have fixed our issue! Thank you sooo so much for your support in resolving this 👏