[API] The comment breaks the work of the module

akimrx commented 3 years ago

Of course, it was very funny to look for the cause of the error for so much time, but after looking at the source code, everything became clear.

Guys, we need to correct the division of the string into parameters, because the comment does not always consist of one word.

The function that returns the error

A function that incorrectly separates arguments and breaks down into comment="some multiple string"

How to reproduce it?
Try to add a simple rule of the form:

  community.routeros.api:
    hostname: "{{ mikrotik_host }}"
    username: "{{ mikrotik_user }}"
    password: "{{ mikrotik_password }}"
    path: ip firewall filter
    add: action=accept chain=forward comment='accept in ipsec policy' ipsec-policy=in,ipsec

The actual result will be an error when adding the rule, but by the way, this module does not understand its own errors at all and always returns OK, which is also funny and requires correction.

With the debug, I was able to see the true result of the execution and the essence of the "error":

    "msg": [
        "missing '=' after 'in'"

A few details:

ok: [localhost] => (item={'comment': 'accept in ipsec policy', 'id': 6, 'action': 'accept', 'chain': 'forward', 'options': 'ipsec-policy=in,ipsec'}) => {
    "ansible_loop_var": "item",
    "changed": false,
    "invocation": {
        "module_args": {
            "add": "action=accept chain=forward comment='accept in ipsec policy' ipsec-policy=in,ipsec",  <<< here
            "cmd": null,
            "hostname": "10.0.0.1",
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "path": "ip firewall filter",
            "port": null,
            "query": null,
            "remove": null,
            "ssl": false,
            "update": null,
            "username": "ansible"
        }
    },
    "item": {
        "action": "accept",
        "chain": "forward",
        "comment": "accept in ipsec policy",
        "ipsec-policy: "in,ipsec"
    },
    "msg": [
        "missing '=' after 'in'"  <<< here

felixfontein commented 3 years ago

The current splitting is indeed not a great idea :) Fortunately we are in the 2.0.0 preparation phase, so we can add another breaking change (fixing this would probably be one).

I guess we need to figure out / define which syntax we exactly want to accept. (Maybe using shlex.split() would be a solution?)

akimrx commented 3 years ago

@felixfontein

shlex is less productive than native re. This is proved by a lot of tests, you can search for them on the Internet.

The current pattern for regular expressions should cover what can be used in this module, while maintaining performance at the proper level.

And besides, shlex is an additional dependency, which is actually a wrapper)

felixfontein commented 3 years ago

Not sure what you mean with "additional dependency" and "less productive", but shlex is part of the Python standard library since Python 2.3.

Whether it's the right choice here I'm not so sure, but I'm convinced that your solution isn't perfect either - it doesn't support escaping for example.

That's why I said "I guess we need to figure out / define which syntax we exactly want to accept".

Does anyone know how RouterOS CLI parses inputs?

akimrx commented 3 years ago

About the additional dependence - I was wrong. The module really s part of the standard library, I'm sorry.

By performance, I meant the time for which the split will work.

Benchmark:

shlex.split:               10,000 loops: 11.51 s, per loop: 1.151 ms
manual algorithm split:    10,000 loops: 0.951 s, per loop: 95.11 µs
split with re.findall:     10,000 loops: 0.482 s, per loop: 48.16 µs

If speed is not important for this module, then we can sacrifice it, but I dare to assume that with a large number of parameters multiplied by the number of routers, this can significantly slow down the execution of the playbook in the future by minutes.

Yes, indeed, the shlex module has its advantages in comparison with the proposed solution, such as escaping, etc. It is worth thinking about how to be better and what to sacrifice.

Does anyone know how RouterOS CLI parses inputs?

I didn't fully understand the question, but RouterOS supports escaping in its CLI:

[akimrx@mikrotik-gw] > /ip firewall filter add action=drop chain=forward connection-nat-state=!dstnat connection-state=new in-interface-list=WAN comment="drop all from \"WAN\" not DSTNATed"
[akimrx@mikrotik-gw] > ip firewall filter print
Flags: X - disabled, I - invalid, D - dynamic 
 0  D ;;; special dummy rule to show fasttrack counters
      chain=forward action=passthrough 

...

12    ;;; drop all from "WAN" not DSTNATed
      chain=forward action=drop connection-state=new connection-nat-state=!dstnat in-interface-list=WAN

сс @felixfontein

akimrx commented 3 years ago

I updated the PR, removing the previously proposed option with re, replacing the split with shlex.split

CASE_1 = "action=drop   chain=forward  comment='hello my friends'"
CASE_2 = "action=dst-nat chain=dstnat comment='just for \"test\"'"

---
['action=drop', 'chain=forward', 'comment=hello my friends']
result: {'action': 'drop', 'chain': 'forward', 'comment': 'hello my friends'}
['action=dstnat', 'chain=dst-nat', 'comment=just for "test"']
result: {'action': 'dstnat', 'chain': 'dst-nat', 'comment': 'just for "test"'}

This also solves the problem if more than one space is specified. Previously, this was really a problem.

felixfontein commented 3 years ago

By performance, I meant the time for which the split will work.

Benchmark:

shlex.split:               10,000 loops: 11.51 s, per loop: 1.151 ms
manual algorithm split:    10,000 loops: 0.951 s, per loop: 95.11 µs
split with re.findall:     10,000 loops: 0.482 s, per loop: 48.16 µs

That's not really relevant here, since we're not using shlex.split in a tight loop. The time needed to communicate with the device will very likely exceed the time needed for splitting by at least one magnitude.

If speed is not important for this module, then we can sacrifice it, but I dare to assume that with a large number of parameters multiplied by the number of routers, this can significantly slow down the execution of the playbook in the future by minutes.

I really doubt this. Just the time needed to execute the module, set up the API (done once per module invocation), and actually talk to the device should contribute most of the total time needed.

Does anyone know how RouterOS CLI parses inputs?

I didn't fully understand the question, but RouterOS supports escaping in its CLI:

We should use a splitting algorithm which is as close as possible to what RouterOS does in its CLI. That's why my question is: which splitting algorithm is RouterOS using in its CLI?

felixfontein commented 3 years ago

The closest I could find to a description is https://wiki.mikrotik.com/wiki/Manual:Scripting#Constant_Escape_Sequences, which isn't really for the CLI (but for scripts). I could find no information in https://help.mikrotik.com/docs/display/ROS/Command+Line+Interface.

felixfontein commented 3 years ago

CC @heuels @NikolayDachev

NikolayDachev commented 3 years ago

Actually we should follow requirements for

https://librouteros.readthedocs.io/en/latest/ https://github.com/luqasz/librouteros

felixfontein commented 3 years ago

@NikolayDachev actually if we should do that, we have to provide the dict directly, and not a string that is parsed.

For parsing the string, I've played around with the CLI a bit. It looks like only " is allowed for quoting, and only the escape sequences explicitly mentioned here: https://wiki.mikrotik.com/wiki/Manual:Scripting#Constant_Escape_Sequences So shlex is not really fitting well here, since it does too many transformations that are not valid in the CLI.

felixfontein commented 3 years ago

I've played around with this a bit, I think the following code splits the way the RouterOS console splits:

def to_bytes(x):
    # this is provided in a better form by Ansible, only as a hack so it's easy to run this code stand-alone
    return x.encode('utf-8')

ESCAPE_SEQUENCES = {
    b'"': b'"',
    b'\\': b'\\',
    b'?': b'?',
    b'$': b'$',
    b'_': b'_',
    b'a': b'\a',
    b'b': b'\b',
    b'f': b'\xFF',
    b'n': b'\n',
    b'r': b'\r',
    b't': b'\t',
    b'v': b'\v',
}

ESCAPE_DIGITS = b'0123456789ABCDEF'

def split_routeros(line):
    line = to_bytes(line)
    result = []
    current = []
    index = 0
    length = len(line)
    # States:
    #   0 = outside param
    #   1 = param before '='
    #   2 = param after '=' without quote
    #   3 = param after '=' with quote
    state = 0
    while index < length:
        ch = line[index:index + 1]
        index += 1
        if state == 0 and ch == b' ':
            pass
        elif state in (1, 2) and ch == b' ':
            state = 0
            result.append(b''.join(current))
            current = []
        elif ch == b'=' and state == 1:
            state = 2
            current.append(ch)
            if index + 1 < length and line[index:index + 1] == b'"':
                state = 3
                index += 1
        elif ch == b'"':
            if state == 3:
                state = 0
                result.append(b''.join(current))
                current = []
                if index + 1 < length and line[index:index + 1] != b' ':
                    raise Exception('Ending \'"\' must be followed by space or end of string')
            else:
                raise Exception('\'"\' must follow \'=\'')
        elif ch == b'\\':
            if index + 1 == length:
                raise Exception('\'\\\' must not be at the end of the line')
            ch = line[index:index + 1]
            index += 1
            if ch in ESCAPE_SEQUENCES:
                current.append(ch)
            else:
                d1 = ESCAPE_DIGITS.find(ch)
                if d1 < 0:
                    raise Exception('Invalid escape sequence \'\\{0}\''.format(ch))
                if index + 1 == length:
                    raise Exception('Hex escape sequence cut off at end of line')
                ch2 = line[index:index + 1]
                d2 = ESCAPE_DIGITS.find(ch2)
                index += 1
                if d2 < 0:
                    raise Exception('Invalid hex escape sequence \'\\{0}{1}\''.format(ch, ch2))
                result.append(chr(d1 * 16 + d2))
        else:
            current.append(ch)
            if state == 0:
                state = 1
    if state in (1, 2):
        if current:
            result.append(b''.join(current))
    elif state == 3:
        raise Exception('Unexpected end of string during escaped parameter')
    return result

print(split_routeros(r'  '))
print(split_routeros(r'a b c'))
print(split_routeros(r'a=b c d=e'))
print(split_routeros(r'a="b f" c d=e'))
print(split_routeros(r'a="b\"f" c\FF d=\"e'))
print(split_routeros(r'a="b\"f" c\FF d="e'))

That results in:

[]
['a', 'b', 'c']
['a=b', 'c', 'd=e']
['a=b f', 'c', 'd=e']
['a=b"f', '\xff', 'c', 'd="e']
Traceback (most recent call last):
  File "tt.py", line 93, in <module>
    print(split_routeros(r'a="b\"f" c\FF d="e'))
  File "tt.py", line 84, in split_routeros
    raise Exception('Unexpected end of string during escaped parameter')
Exception: Unexpected end of string during escaped parameter

NikolayDachev commented 3 years ago

@felixfontein will check this days

ansible-collections / community.routeros

[API] The comment breaks the work of the module #44