ansible-collections / community.routeros

Ansible modules for managing MikroTik RouterOS instances.
https://galaxy.ansible.com/ui/repo/published/community/routeros/
GNU General Public License v3.0
98 stars 45 forks source link

Duplicate data in stdout and facts starting with RouterOS 7.2 #132

Open markwell-ch opened 1 year ago

markwell-ch commented 1 year ago
SUMMARY

Starting with Router OS 7.2 stdout of community.routeros.command contains the output twice. And when running community.routeros.facts against a device with RouterOS 7.2 or newer (tested up to RouterOS 7.6) hostvars[inventory_hostname]['ansible_net_config'] contains the config twice.

ISSUE TYPE
COMPONENT NAME

community.routeros.facts community.routeros.command

ANSIBLE VERSION
ansible [core 2.13.6]
  config file = /home/markus/gits/ansible-speedcom-backbone/ansible.cfg
  configured module search path = ['/home/markus/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/markus/gits/ansible-speedcom-backbone/.venv/lib/python3.10/site-packages/ansible
  ansible collection location = /home/markus/gits/ansible-speedcom-backbone/.ansible/collections
  executable location = /home/markus/gits/ansible-speedcom-backbone/.venv/bin/ansible
  python version = 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0]
  jinja version = 3.1.2
  libyaml = True
COLLECTION VERSION
Collection         Version
------------------ -------
community.routeros 2.3.1 
OS / ENVIRONMENT

RouterOS>=7.2

STEPS TO REPRODUCE
    - name: Collect only the config and default facts
      community.routeros.facts:
        gather_subset:
          - "config"

    - name: Save Config Backup
      ansible.builtin.copy:
        content: "{{ hostvars[inventory_hostname]['ansible_net_config'] }}"
        dest: "{{ backup_dir }}/{{ inventory_hostname }}_config.{{ lookup('pipe', 'date +%Y%m%d') }}.txt"
      delegate_to: localhost

    - name: Run command
      community.routeros.command:
        commands: /system/routerboard/print
      register: config

    - name: Save Output to File
      ansible.builtin.copy:
        content: "{{ config.stdout_lines }}"
        dest: "{{ backup_dir }}/{{ inventory_hostname }}_routerboard_print.{{ lookup('pipe', 'date +%Y%m%d') }}.txt"
      delegate_to: localhost
EXPECTED RESULTS

stdout and hostvars[inventory_hostname]['ansible_net_config'] should not contain duplicate data.

ACTUAL RESULTS

stdout and hostvars[inventory_hostname]['ansible_net_config'] contains duplicate data.

felixfontein commented 1 year ago

Reminds me a bit of #62. I unfortunately don't have ROS 7 (I'm still waiting for a ROS 7 device I ordered earlier this year, damn supply chain problems :D ).

You might want to use the API modules instead if possible.

markwell-ch commented 1 year ago

Thanks for your reply. Yes, the supply chain problems suck. In the meantime you could also run a VM with the Cloud Hosted Router image, see: https://help.mikrotik.com/docs/pages/viewpage.action?pageId=18350234 or even some unofficial docker image that runs RouterOS inside a container: https://hub.docker.com/r/evilfreelancer/docker-routeros

therfert commented 1 year ago

I noticed the same issue earlier and tried to locate where in the code the problem could be, but I haven't been successful so far.

I also noticed that the behavior isn't persistent. It's radnom - mostly it is duplicated, but occasionally it is not.

What's even worse is that the duplicated output is sometime merged into one line and if a more complex script is passed as the command with new line characters in it, the output of the command is randomly mixed together with the terminal prompt and the command itself and all is duplicated.

The only thing I made sure so far is, that the command is not executed twice. It's duplicate output of the one command execution.

felixfontein commented 1 year ago

I'm not sure anymore where exactly all the parsing magic happens, but I think it was here or in one of the callers: https://github.com/ansible-collections/ansible.netcommon/blob/main/plugins/connection/network_cli.py#L1176

Last time I've tried debugging something like that I used q (https://github.com/zestyping/q) to dump all kind of stuff in the potentially interesting functions to try to figure out what happens.

markwell-ch commented 1 year ago

For me the duplication does not happen when ssh_type = paramiko instead of ssh_type = sshlib is used. Important is adding this options to the username ansible_user: username+cte1000w1000h. Maybe smaller numbers will also work. See: https://help.mikrotik.com/docs/display/ROS/Command+Line+Interface#CommandLineInterface-LoginOptions

izzzhoga commented 1 year ago

You might want to use the API modules instead if possible.

Good time of day. Does RouterOS v7 work correctly with the api modules?

felixfontein commented 1 year ago

I have not heard anything to the contrary.

dmqgit commented 1 year ago

I have exactly the same issue (routeros 7.5 and 7.6). I can also reproduce that the duplicate output just happens with ssh_type = libssh setting, ssh_type = paramiko looks "normal". I hopefully can try the api implementation soon, at the moment my whole setup is based on ssh. Thanks for any help.

AleksNovak commented 1 year ago

Hi there, I think I can reproduce that behavior with community.routeros.command. I have that behavior with one or two command. Target: RouterOS 7.6

    - name: Configure wireguard interface
      community.routeros.command:
        commands:                           
          - /interface/list/print where name=ADM
          - /interface/list/member/print where interface={{ admin_network_config.wireguard_ifName }} and list=ADM
      register: tutu
      tags:                     
        - toto

    - debug:                                                                                                                         
        var: tutu                                                                 
      tags: toto

output:

❯ ansible-playbook -i inventory.ini -t toto mikrotik_wg.yml

PLAY [Install certificates] ************************************************************

TASK [Configure wireguard interface] ***************************************************
changed: [toto]

TASK [debug] ***************************************************************************
ok: [pvc.tiersbox.org] => {
    "tutu": {
        "changed": true,
        "failed": false,
        "stdout": [
            "Columns: NAME\n# NAME\n0 ADMColumns: NAME\n# NAME\n0 ADM",
            "/interface/list/member/print where interface=wg-admin and list=AD\n<nt where interface=wg-admin and list=ADM                                      \n<nt where interface=wg-admin and list=ADM\n<nt where interface=wg-admin and list=ADM\n\nColumns: LIST, INTERFACE\n# LIST  INTERFACE\n0 ADM   wg-admin/interface/list/member/print where interface=wg-admin and list=AD\n<nt where interface=wg-admin and list=ADM                                      \n<nt where interface=wg-admin and list=ADM\n<nt where interface=wg-admin and list=ADM\n\nColumns: LIST, INTERFACE\n# LIST  INTERFACE\n0 ADM   wg-admin"
        ],
        "stdout_lines": [
            [
                "Columns: NAME",
                "# NAME",
                "0 ADMColumns: NAME",
                "# NAME",
                "0 ADM"
            ],
            [
                "/interface/list/member/print where interface=wg-admin and list=AD",
                "<nt where interface=wg-admin and list=ADM                                      ",
                "<nt where interface=wg-admin and list=ADM",
                "<nt where interface=wg-admin and list=ADM",
                "",
                "Columns: LIST, INTERFACE",
                "# LIST  INTERFACE",
                "0 ADM   wg-admin/interface/list/member/print where interface=wg-admin and list=AD",
                "<nt where interface=wg-admin and list=ADM                                      ",
                "<nt where interface=wg-admin and list=ADM",
                "<nt where interface=wg-admin and list=ADM",
                "",
                "Columns: LIST, INTERFACE",
                "# LIST  INTERFACE",
                "0 ADM   wg-admin"
            ]
        ]
    }
}

PLAY RECAP *****************************************************************************
toto           : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
o0-o commented 1 year ago

I unfortunately don't have ROS 7 (I'm still waiting for a ROS 7 device I ordered earlier this year, damn supply chain problems :D ).

You can run ROS 7 for free using Vagrant. The Vagrant box I'm using is cheretbe/routeros7. The doubling issue is occurring there so it should be possible to diagnose it without any additional hardware or cost. Happy to assist if you are unfamiliar with Vagrant.

Vagrantfile:

Vagrant.configure("2") do |config|

  config.vm.define "routeros7" do |routeros|
    routeros.vm.box = "cheretbe/routeros7"
    routeros.vm.hostname = "routeros7.test"
    routeros.vm.network :forwarded_port, guest: 22, host: 2827, id: "ssh"
  end

end
ok: [routeros7.hq.example.com] => {
    "changed": false,
    "invocation": {
        "module_args": {
            "commands": [
                "/put [/system clock get time]"
            ],
            "interval": 1,
            "match": "all",
            "retries": 10,
            "wait_for": null
        }
    },
    "stdout": [
        "00:54:2600:54:26"
    ],
    "stdout_lines": [
        [
            "00:54:2600:54:26"
        ]
    ]
}
kar0v commented 1 year ago

On RouterOS 7.6 I can reproduce this with the community.routeros.command module as well:

---
- name: RouterOS test with network_cli connection
  hosts: routers[0]
  gather_facts: false
  tasks:

  - name: Gather statistics
    community.routeros.command:
      commands:
        - "/interface print stats from=wlan1,wlan2"
    register: print_stats

  - name: Show system resources
    debug:
      var: print_stats

  - name: Create log directory if it does not exist
    file:
      path: ./logs
      state: directory
      mode: '0755'

  - local_action: lineinfile line={{ print_stats | to_nice_yaml}} create=true path="./logs/{{ inventory_hostname }}.log"
TASK [Show system resources] ****************************************************************************
ok: [192.168.179.231] => 
  msg:
    ansible_facts:
      discovered_interpreter_python: /usr/bin/python3
    changed: true
    failed: false
    stdout:
    - |-
      Flags: D - dynamic; X - disabled, R - running; S - slave; P - passthrough
       #      NAME              RX-BYTE           TX-BYTE     RX-PACKET     TX-PACKET
       0  RS  ;;; managed by CAPsMAN
              ;;; channel: 2462/20-eC/gn(14dBm), SSID: X_24, local forwarding
              wlan1          46 654 072    12 532 162 795       198 471    24 568 945
       1  RS  ;;; managed by CAPsMAN
              ;;; channel: 5180/20-Ceee/ac(11dBm), SSID: X, local forwarding
              wlan2       1 093 050 444    16 045 504 342     2 364 491    28 825 963Flags: D - dynamic; X - disabled, R - running; S - slave; P - passthrough
       #      NAME              RX-BYTE           TX-BYTE     RX-PACKET     TX-PACKET
       0  RS  ;;; managed by CAPsMAN
              ;;; channel: 2462/20-eC/gn(14dBm), SSID: X_24, local forwarding
              wlan1          46 654 072    12 532 162 795       198 471    24 568 945
       1  RS  ;;; managed by CAPsMAN
              ;;; channel: 5180/20-Ceee/ac(11dBm), SSID: X, local forwarding
              wlan2       1 093 050 444    16 045 504 342     2 364 491    28 825 963
    stdout_lines:
    - - 'Flags: D - dynamic; X - disabled, R - running; S - slave; P - passthrough '
      - ' #      NAME              RX-BYTE           TX-BYTE     RX-PACKET     TX-PACKET'
      - ' 0  RS  ;;; managed by CAPsMAN'
      - '        ;;; channel: 2462/20-eC/gn(14dBm), SSID: X_24, local forwarding'
      - '        wlan1          46 654 072    12 532 162 795       198 471    24 568 945'
      - ' 1  RS  ;;; managed by CAPsMAN'
      - '        ;;; channel: 5180/20-Ceee/ac(11dBm), SSID: X, local forwarding'
      - '        wlan2       1 093 050 444    16 045 504 342     2 364 491    28 825 963Flags: D - dynamic; X - disabled, R - running; S - slave; P - passthrough '
      - ' #      NAME              RX-BYTE           TX-BYTE     RX-PACKET     TX-PACKET'
      - ' 0  RS  ;;; managed by CAPsMAN'
      - '        ;;; channel: 2462/20-eC/gn(14dBm), SSID: X_24, local forwarding'
      - '        wlan1          46 654 072    12 532 162 795       198 471    24 568 945'
      - ' 1  RS  ;;; managed by CAPsMAN'
      - '        ;;; channel: 5180/20-Ceee/ac(11dBm), SSID: X, local forwarding'
      - '        wlan2       1 093 050 444    16 045 504 342     2 364 491    28 825 963'
o0-o commented 1 year ago

I am pretty confident that it is an upstream issue. It definitely relates to libssh although I suppose the bug could be in network_cli or elsewhere. I don't have a good understanding of how network modules like this work. It will likely be up to one of the maintainers here to investigate and file the appropriate issue upstream.

I have switched to using paramiko and have had no problems. However, my use of Ansible with Mikrotik devices is minimal so your mileage may vary.

felixfontein commented 1 year ago

I've finally got my ROS7 device, I'll see whether I can debug this a bit next weekend or so...

felixfontein commented 1 year ago

I think I managed to figure out what goes wrong; it seems to be a bug in the ansible.netcommon.network_cli connection plugin: https://github.com/ansible-collections/ansible.netcommon/pull/517

When applying that change the duplicate output no longer happens for me.

Qalthos commented 1 year ago

I took a look through everything, and while the fix does work, I don't think https://github.com/ansible-collections/ansible.netcommon/pull/517 is quite the right answer. Can someone test https://github.com/ansible-collections/ansible.netcommon/pull/521 and let me know if that works as well?

felixfontein commented 1 year ago

@Qalthos I tested https://github.com/ansible-collections/ansible.netcommon/pull/521, it also fixes the problem for me!

Qalthos commented 1 year ago

Cool. Thanks for looking into it. I've closed 517 in favor of 521 and will merge 521 when I get a chance.

felixfontein commented 1 year ago

@Qalthos cool, thanks a lot! Will it be possible to do a new 4.x.y bugfix release once that is merged?

Qalthos commented 1 year ago

We won't be able to do that, unfortunately.

izzzhoga commented 1 year ago

I have exactly the same problem with duplicating data, only on RouterOS 7.8

Haven't fixed the bug yet?

izzzhoga commented 1 year ago

I dare to assume that because of this problem, the collection of facts on RouterOS 7.2-7.8 is broken, the example below.

- name: Collect all facts from the device
  community.routeros.facts:
    gather_subset: all
  register: all_facts

The error I'm getting:

fatal: [mikrotik]: FAILED! => {
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n  File \"/home/user/.ansible/tmp/ansible-local-3253737c37dt00n/ansible-tmp-1678356214.2582262-3254449-220312769211413/AnsiballZ_facts.py\", line 107, in <module>\n    _ansiballz_main()\n  File \"/home/user/.ansible/tmp/ansible-local-3253737c37dt00n/ansible-tmp-1678356214.2582262-3254449-220312769211413/AnsiballZ_facts.py\", line 99, in _ansiballz_main\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n  File \"/home/user/.ansible/tmp/ansible-local-3253737c37dt00n/ansible-tmp-1678356214.2582262-3254449-220312769211413/AnsiballZ_facts.py\", line 47, in invoke_module\n    runpy.run_module(mod_name='ansible_collections.community.routeros.plugins.modules.facts', init_globals=dict(_module_fqn='ansible_collections.community.routeros.plugins.modules.facts', _modlib_path=modlib_path),\n  File \"/usr/lib/python3.9/runpy.py\", line 210, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib/python3.9/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib/python3.9/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_community.routeros.facts_payload_n6n95orf/ansible_community.routeros.facts_payload.zip/ansible_collections/community/routeros/plugins/modules/facts.py\", line 663, in <module>\n  File \"/tmp/ansible_community.routeros.facts_payload_n6n95orf/ansible_community.routeros.facts_payload.zip/ansible_collections/community/routeros/plugins/modules/facts.py\", line 651, in main\n  File \"/tmp/ansible_community.routeros.facts_payload_n6n95orf/ansible_community.routeros.facts_payload.zip/ansible_collections/community/routeros/plugins/modules/facts.py\", line 455, in populate\n  File \"/tmp/ansible_community.routeros.facts_payload_n6n95orf/ansible_community.routeros.facts_payload.zip/ansible_collections/community/routeros/plugins/modules/facts.py\", line 532, in parse_route\n  File \"/tmp/ansible_community.routeros.facts_payload_n6n95orf/ansible_community.routeros.facts_payload.zip/ansible_collections/community/routeros/plugins/modules/facts.py\", line 474, in preprocess\nIndexError: list index out of range\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}

Is it necessary to open a new bug report for this?

felixfontein commented 1 year ago

@izzzhoga while https://github.com/ansible-collections/ansible.netcommon/pull/521 has been merged, no new release of ansible.netcommon has been made. Once a new release is out and you installed that one, the problem should be gone. If you want to have it fixed earlier, you need to patch your local ansible.netcommon install accordingly.

The problem you have with facts gathering could also be caused by this. If the problem still happens with that fix applied, opening a new issue would help (then it's a problem in this collection).

izzzhoga commented 1 year ago

@felixfontein Please tell me when the new version of the module is planned to be released? :)

felixfontein commented 1 year ago

@izzzhoga I do not know when the ansible.netcommon collection will have another release.

felixfontein commented 1 year ago

ansible.netcommon 5.1.0 has been released (https://github.com/ansible-collections/ansible.netcommon/tree/5.1.0), it should contain that fix. (The changelog does mention it: https://github.com/ansible-collections/ansible.netcommon/blob/main/CHANGELOG.rst#v510)

felixfontein commented 1 year ago

(If you are using Ansible (and not ansible-core with your own set of collections), you will have to wait for Ansible 8.0.0 though since it is a newer major version than the version included in 7.x.y.)