Command timeout on RouterOS 7

radokristof commented 9 months ago

SUMMARY

Dear Community!

I have a weird issue. I have several MikroTik devices that are backed up once a week by my Ansible backup playbook.

The backup now fails for some of them (all at one site). Some devices at that site failed during a recent storm, while others are the same as before. New, identical devices were installed and the backup was applied to them.

Since around then, the backup script has not worked on any of the devices at this site.

SSH works correctly: I can log in from the server to these devices. The API connection also works correctly, even in Ansible.

I can see that Ansible logs in, the key is accepted, and access is granted. After that, basically nothing happens.

Login attempt by Ansible:

In Ansible, I can't find anything suspicious. It just times out, as if it were unable to reach the destination:

<10.0.13.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-local-2069615kcu5ya9/ansible-tmp-1697089639.2060575-207982-15282461545527/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
  File "/tmp/ansible_community.network.routeros_command_payload_yfp95ui9/ansible_community.network.routeros_command_payload.zip/ansible_collections/community/routeros/plugins/module_utils/routeros.py", line 51, in get_capabilities
    capabilities = Connection(module._socket_path).get_capabilities()
  File "/tmp/ansible_community.network.routeros_command_payload_yfp95ui9/ansible_community.network.routeros_command_payload.zip/ansible/module_utils/connection.py", line 200, in __rpc__
    raise ConnectionError(to_text(msg, errors='surrogate_then_replace'), code=code)
fatal: [ayc-sw3]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "commands": [
                "/system/leds/set 0 type=on"
            ],
            "interval": 1,
            "match": "all",
            "retries": 10,
            "wait_for": null
        }
    },
    "msg": "command timeout triggered, timeout value is 60 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide."
}
The full traceback is:
  File "/tmp/ansible_community.network.routeros_command_payload_6mq7m511/ansible_community.network.routeros_command_payload.zip/ansible_collections/community/routeros/plugins/module_utils/routeros.py", line 51, in get_capabilities
    capabilities = Connection(module._socket_path).get_capabilities()
  File "/tmp/ansible_community.network.routeros_command_payload_6mq7m511/ansible_community.network.routeros_command_payload.zip/ansible/module_utils/connection.py", line 200, in __rpc__
    raise ConnectionError(to_text(msg, errors='surrogate_then_replace'), code=code)
fatal: [ayc-gw1]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "commands": [
                "/system/leds/set 0 type=on"
            ],
            "interval": 1,
            "match": "all",
            "retries": 10,
            "wait_for": null
        }
    },
    "msg": "command timeout triggered, timeout value is 60 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide."
}

ISSUE TYPE

Bug Report

COMPONENT NAME

community.network.routeros_command

ANSIBLE VERSION

ansible [core 2.14.3]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/kristof/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/kristof/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/kristof/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/kristof/.local/bin/ansible
  python version = 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True

COLLECTION VERSION

community.routeros            2.9.0

radokristof commented 9 months ago

I found the issue: I added parentheses to the identity of every device at this location, so the naming convention is something like: location-sw1-(rack1)

That caused SSH to work improperly. I have not had time to check yet, but I suspect that the identity is not escaped, or is parsed incorrectly, when the response is received.

felixfontein commented 9 months ago

Maybe it's related to prompt detection, or something like that. (For my personal use, I started only using the SSH modules to set up API via HTTPS and then only use that.)
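To illustrate how prompt detection could be affected by parentheses in the identity, here is a minimal Python sketch. The pattern `PROMPT_RE` and the helper `prompt_detected` are hypothetical and written only for this example; they are not the regular expressions the collection actually uses. The sketch assumes a prompt matcher that accepts only word characters, dots and dashes in the identity.

```python
import re

# Hypothetical prompt pattern for illustration only; NOT the collection's real regex.
# It accepts "[user@identity] > " where the identity is limited to
# word characters, dots and dashes.
PROMPT_RE = re.compile(rb"\[[\w.+-]+@[\w.-]+\] ?> ?$")


def prompt_detected(output: bytes) -> bool:
    """Return True if the device output ends with a recognizable prompt."""
    return PROMPT_RE.search(output) is not None


plain = b"[admin@location-sw1] > "
with_parens = b"[admin@location-sw1-(rack1)] > "

print(prompt_detected(plain))        # True
print(prompt_detected(with_parens))  # False
```

If the real matcher behaved like this, the connection would keep waiting for a prompt it never recognizes, which would surface as a command timeout even though the login itself succeeded.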

radokristof commented 9 months ago

Yes, same for me: I use the API wherever I can (99% of the time), though there are some cases where something is not implemented in the API yet, or is not possible through the API at all. For the record, this playbook creates an export and a backup of the config and pushes them to my server through FTP.

There are no endpoints for these operations. It might be better to create a script/scheduler on the device for this, but I did not like that when I tried it before; it is easier to manage this centrally (at least for me).

felixfontein commented 9 months ago

IIRC there is an API endpoint for exporting the config, but you can only write it to a file on the router's filesystem. Then you have to use something like net_get to download the file. (At least that's what I wrote a while ago when working on the api_facts module: https://github.com/ansible-collections/community.routeros/pull/88#issuecomment-1121876460 - I no longer remember exactly how to use the api module for it.)

stasstryukov commented 9 months ago

I have the same issue. Is there any workaround for this?

kyerlasswell commented 1 month ago

Linking this page for reference: How to connect to RouterOS devices with SSH.

It specifies that device names can only use alphanumeric characters, underscores and dashes.
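A quick way to check a list of device identities against that rule is sketched below. The character set is taken from the sentence above (alphanumerics, underscores and dashes); the names `IDENTITY_RE` and `invalid_identities` are hypothetical helpers made up for this example, not part of the collection.

```python
import re

# Allowed characters as described above: alphanumerics, underscores and dashes.
IDENTITY_RE = re.compile(r"[A-Za-z0-9_-]+")


def invalid_identities(names):
    """Return the names that contain characters outside the allowed set."""
    return [name for name in names if not IDENTITY_RE.fullmatch(name)]


print(invalid_identities(["ayc-sw3", "ayc-gw1", "location-sw1-(rack1)"]))
# ['location-sw1-(rack1)']
```

Running a check like this over the inventory before the backup playbook could flag names such as location-sw1-(rack1) early, instead of waiting for a 60-second timeout per device.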

Another big one is the need to add +cet512w to the end of the username (like admin+cet512w). Without this, if your commands are too long, it will produce the same command timeout error.
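As a small sketch of the username convention, the helper below appends the suffix to a plain user name. The function `with_login_flags` is hypothetical and exists only for this example. My understanding, which is not stated in this thread, is that these flags are RouterOS console login parameters that turn off colors and terminal auto-detection and set the terminal width to 512 columns, so that long commands are not wrapped.

```python
# Suffix quoted in the comment above. What the individual flags do is an
# assumption based on RouterOS console login parameters, not on this thread.
LOGIN_FLAGS = "cet512w"


def with_login_flags(user: str, flags: str = LOGIN_FLAGS) -> str:
    """Append "+<flags>" to a RouterOS user name unless it already has a suffix."""
    if "+" in user:
        return user
    return f"{user}+{flags}"


print(with_login_flags("admin"))          # admin+cet512w
print(with_login_flags("admin+cet512w"))  # admin+cet512w
```

The resulting string is what would go into the ansible_user variable for the affected hosts.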

@stasstryukov if you're still having this issue, give this page a glance and see if that resolves it for you.
