Allow recursive keep_keys

ansible-collections / community.general

Ansible Community General Collection

https://galaxy.ansible.com/ui/repo/published/community/general/

GNU General Public License v3.0

816 stars 1.5k forks source link

Allow recursive keep_keys #8571

Closed MPStudyly closed 2 months ago

MPStudyly commented 3 months ago

SUMMARY

This PR adds a parameter and functionality to the keep_keys filter introduced by https://github.com/ansible-collections/community.general/issues/8438 to allow recursive searches over nested lists of dicts. This is a draft for now, as it for sure needs some polishing. Also I was unsure if the checks performed should be moved to the according plugins_utils file. remove_keys and replace_keys are also not touched by this (yet), as I'd like to have some input on this first.

Due to no issue existing yet, I was unaware on how to name/number a changelog fragment appropriately. If I can simply use the ref number of the PR, I'll create one with that. If I have to open a feature request ticket, I'll do that.

ISSUE TYPE

Feature Pull Request

COMPONENT NAME

keep_keys

ADDITIONAL INFORMATION

- name: Debug before change
  vars:
    router_static_routes:
    - address_families:
        - afi: "ipv4"
          routes:
            - description: "WANv4"
              dest: "0.0.0.0/0" # default route
              next_hops:
                - forward_router_address: "192.168.0.1"
        - afi: "ipv6"
          routes:
            - description: "WANv6"
              dest: "::/0" # default route
              next_hops:
                - forward_router_address: "2001:db8::1"
  run_once: true
  ansible.builtin.debug:
    msg: {{ groups.all | map('extract', hostvars) | community.general.keep_keys(target=['dest', 'forward_router_address']) }}

gives

TASK [Debug static routes non recursive] ****************************************
ok: [localhost] => 
  msg: []

while

- name: Debug adter change
  vars:
    router_static_routes:
    - address_families:
        - afi: "ipv4"
          routes:
            - description: "WANv4"
              dest: "0.0.0.0/0" # default route
              next_hops:
                - forward_router_address: "192.168.0.1"
        - afi: "ipv6"
          routes:
            - description: "WANv6"
              dest: "::/0" # default route
              next_hops:
                - forward_router_address: "2001:db8::1"
  run_once: true
  ansible.builtin.debug:
    msg: {{ groups.all | map('extract', hostvars) | community.general.keep_keys(target=['dest', 'forward_router_address'], recurse_lists=true)) }}

gives

TASK [Debug static routes recursive] ************************************
ok: [localhost] => 
  msg:
  - router_static_routes:
    - address_families:
      - routes:
        - dest: 0.0.0.0/0
          next_hops:
          - forward_router_address: 192.168.0.1
      - routes:
        - dest: ::/0
          next_hops:
          - forward_router_address: 2001:db8::1

ansibullbot commented 3 months ago

The test ansible-test sanity --test pep8 [explain] failed with 4 errors:

plugins/filter/keep_keys.py:135:1: W293: blank line contains whitespace
plugins/filter/keep_keys.py:148:45: E126: continuation line over-indented for hanging indent
plugins/filter/keep_keys.py:152:38: E124: closing bracket does not match visual indentation
plugins/filter/keep_keys.py:157:1: W293: blank line contains whitespace

The test ansible-test sanity --test pep8 [explain] failed with 4 errors:

plugins/filter/keep_keys.py:135:1: W293: blank line contains whitespace
plugins/filter/keep_keys.py:148:45: E126: continuation line over-indented for hanging indent
plugins/filter/keep_keys.py:152:38: E124: closing bracket does not match visual indentation
plugins/filter/keep_keys.py:157:1: W293: blank line contains whitespace

The test ansible-test sanity --test pep8 [explain] failed with 4 errors:

plugins/filter/keep_keys.py:135:1: W293: blank line contains whitespace
plugins/filter/keep_keys.py:148:45: E126: continuation line over-indented for hanging indent
plugins/filter/keep_keys.py:152:38: E124: closing bracket does not match visual indentation
plugins/filter/keep_keys.py:157:1: W293: blank line contains whitespace

The test ansible-test sanity --test pep8 [explain] failed with 4 errors:

plugins/filter/keep_keys.py:135:1: W293: blank line contains whitespace
plugins/filter/keep_keys.py:148:45: E126: continuation line over-indented for hanging indent
plugins/filter/keep_keys.py:152:38: E124: closing bracket does not match visual indentation
plugins/filter/keep_keys.py:157:1: W293: blank line contains whitespace

The test ansible-test sanity --test pylint [explain] failed with 2 errors:

plugins/filter/keep_keys.py:135:0: trailing-whitespace: Trailing whitespace
plugins/filter/keep_keys.py:157:0: trailing-whitespace: Trailing whitespace

The test ansible-test sanity --test pylint [explain] failed with 2 errors:

plugins/filter/keep_keys.py:135:0: trailing-whitespace: Trailing whitespace
plugins/filter/keep_keys.py:157:0: trailing-whitespace: Trailing whitespace

The test ansible-test sanity --test pylint [explain] failed with 2 errors:

plugins/filter/keep_keys.py:135:0: trailing-whitespace: Trailing whitespace
plugins/filter/keep_keys.py:157:0: trailing-whitespace: Trailing whitespace

The test ansible-test sanity --test pylint [explain] failed with 2 errors:

plugins/filter/keep_keys.py:135:0: trailing-whitespace: Trailing whitespace
plugins/filter/keep_keys.py:157:0: trailing-whitespace: Trailing whitespace

click here for bot help

ansibullbot commented 3 months ago

cc @vbotka click here for bot help

vbotka commented 3 months ago

Get the lists you want to transform and map the filter

    t: [key_in_list, other]
    rl: "{{ testvar |
            map(attribute='list_of_dicts') |
            map('community.general.keep_keys', target=t) }}"

gives

  rl:
    - - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

Use the filter dict_kv if you want a list of dictionaries

  rd: "{{ rl | map('community.general.dict_kv', 'list_of_dicts' ) }}"

gives what you want

  rd:
    - list_of_dicts:
      - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - list_of_dicts:
      - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

This use case is not consistent and can be solved with current tools. I'd propose to reject this PR.

MPStudyly commented 3 months ago

Get the lists you want to transform and map the filter

    t: [key_in_list, other]
    rl: "{{ testvar |
            map(attribute='list_of_dicts') |
            map('community.general.keep_keys', target=t) }}"

gives

  rl:
    - - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

You are right, it does. But as another lever gets added, this won't work anymore. I should have made a more accurate example of the situation I'm facing to show how this PR makes sense.

Use the filter dict_kv if you want a list of dictionaries

  rd: "{{ rl | map('community.general.dict_kv', 'list_of_dicts' ) }}"

gives what you want

  rd:
    - list_of_dicts:
      - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - list_of_dicts:
      - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

Again, in this case, it does. It still requires me to "hardcode" a new dict key name to restore the original layout. Also this breaks, when at least one other level is added.

This use case is not consistent and can be solved with current tools. I'd propose to reject this PR.

As explained, there is no meaningful way yet to apply this filter to deeply nested dicts with lists of dicts. Mapping the way through a dict is a possible way around (map(attribute=...)), but heavily limits flexibility, as the whole dict structure has to be known beforehand. I will update my example code to better reflect the actual problem I try to solve.

EDIT: I've updated the example @vbotka, please have another look.

vbotka commented 3 months ago

There are recursive filters ansible.utils (keep_keys, remove_keys, and replace_keys) . The design of the community.general filters is simple, compact, and clear. To keep it that way, the filters are intentionally not recursive. The complexity of your PR would increase the maintenance effort not worth the use case benefit.

Instead, the filter ansible.utils.keep_keys is recursive. Quoting:

This plugin keeps only specified keys from provided data recursively

There is a bug:

keep_keys doesn't exclude a key if the value is list. https://github.com/ansible-collections/ansible.utils/issues/353

Instead of fixing the filter ansible.utils.keep_keys complexity I decided to write the simple non-recursive filter community.general.keep_keys. I'm sure you understand that merging your PR would be counterproductive. If you need the recursion I suggest you fix the ansible.utils filters or write your recursive filters, e.g. keep_keys_recursive. Please feel free to use the code.

To avoid a monolithic design the code must be properly structured. If you decide to write your filter reuse the code from plugins/filter/keep_keys.py

    if matching_parameter == 'equal':
        def keep_key(key):
            return key in tt
    elif matching_parameter == 'starts_with':
        def keep_key(key):
            return key.startswith(tt)
    elif matching_parameter == 'ends_with':
        def keep_key(key):
            return key.endswith(tt)
    elif matching_parameter == 'regex':
        def keep_key(key):
            return tt.match(key) is not None

    return [dict((k, v) for k, v in d.items() if keep_key(k)) for d in data]

and create function _keep_keys in plugins/plugin_utils/keys_filter.py

def _keep_keys(data,tt, matching_parameter):
    ...

Then, import the functions in your code

from ansible_collections.community.general.plugins.plugin_utils.keys_filter import (
    _keys_filter_params,
    _keys_filter_target_str,
    _keep_keys)

MPStudyly commented 2 months ago

Sorry for my late response.

There are recursive filters ansible.utils (_keep_keys, remove_keys, and replacekeys) . The design of the community.general filters is simple, compact, and clear. To keep it that way, the filters are intentionally not recursive. The complexity of your PR would increase the maintenance effort not worth the use case benefit.

Okay, while I don't like the naming collision of these two filters that basically do different stuff (I know, the FQDN is different, but still), I accept that keeping complexity low is a valid reason to not merge this.

Instead, the filter _ansible.utils.keepkeys is recursive. Quoting:

This plugin keeps only specified keys from provided data recursively

There is a bug:
* keep_keys doesn't exclude a key if the value is list.
  [keep_keys doesn't exclude a key if the value is list. ansible.utils#353](https://github.com/ansible-collections/ansible.utils/issues/353)

That's the exact reason we resorted to the community.general variant. We first misunderstood this to be recursive as well, which it wasn't. As we had a look at both filters, we found this one way easier to adapt than fixing the other one.

Instead of fixing the filter _ansible.utils.keepkeys complexity I decided to write the simple non-recursive filter _community.general.keepkeys. I'm sure you understand that merging your PR would be counterproductive. If you need the recursion I suggest you fix the ansible.utils filters or write your recursive filters, e.g. _keep_keysrecursive. Please feel free to use the code.

I'll see if I can have another look at the ansible.utils filter. We found that code unnecessarily hard to grasp. For now we're working with a local copy of this filter only, as that suffices our needs.

To avoid a monolithic design the code must be properly structured. If you decide to write your filter reuse the code from _plugins/filter/keepkeys.py

    if matching_parameter == 'equal':
        def keep_key(key):
            return key in tt
    elif matching_parameter == 'starts_with':
        def keep_key(key):
            return key.startswith(tt)
    elif matching_parameter == 'ends_with':
        def keep_key(key):
            return key.endswith(tt)
    elif matching_parameter == 'regex':
        def keep_key(key):
            return tt.match(key) is not None

    return [dict((k, v) for k, v in d.items() if keep_key(k)) for d in data]

and create function __keepkeys in _plugins/plugin_utils/keysfilter.py

def _keep_keys(data,tt, matching_parameter):
    ...

Then, import the functions in your code

from ansible_collections.community.general.plugins.plugin_utils.keys_filter import (
    _keys_filter_params,
    _keys_filter_target_str,
    _keep_keys)

Thx for the input! This is why the whole PR is tagged as a draft, this code wasn't meant to be merged as is anyway, as there is quite some stuff missing (e.g. remove_keys, replace_keys adaptions, tests, etc.) :)

How should we proceed with this? Should we close the PR and move on at ansible.utils or is this still of relevance for the community.general collection?

EDIT: I just noticed your issue/PR which referenced this. As far as the examples go, it looks like it is doing the exact same thing, but I don't really get the naming there.

vbotka commented 2 months ago

How should we proceed with this? Should we close the PR and move on at ansible.utils or is this still of relevance for the community.general collection?

Yes. I think this PR should be closed. I think the justification to solve this use case by this filter is weak. IMO, if the data are reported by a device you might want to parse it first. There is a new universe in Parsing semi-structured text with Ansible. If this is configuration data you might find a better structure that fits your use case. A custom filter might be a solution too. Generally, the below structure is wrong

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}

This can be reduced to what you ultimately need (optionally, including other dictionaries)

  ld:
    1: other1
    2: other2
    3: other3
    4: other4

, or if you need the batches

  ld:
    - 1: other1
      2: other2
    - 3: other3
      4: other4

EDIT: I just noticed your issue/PR which referenced this. As far as the examples go, it looks like it is doing the exact same thing, but I don't really get the naming there.

Yes. The filter community.general.reveal_ansible_type and the test community.general.ansible_type should help to solve use cases like yours. For more details see the plugin examples or the integration test. There are also my public notes.

MPStudyly commented 2 months ago

How should we proceed with this? Should we close the PR and move on at ansible.utils or is this still of relevance for the community.general collection?

Yes. I think this PR should be closed. I think the justification to solve this use case by this filter is weak. IMO, if the data are reported by a device you might want to parse it first. There is a new universe in Parsing semi-structured text with Ansible. If this is configuration data you might find a better structure that fits your use case. A custom filter might be a solution too. Generally, the below structure is wrong
    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}

The structure is actually required by the collection we are using to configure routers. Though, we came to a similar conclusion that treating this structure as source of truth is not optimal. Some restructuring already cleared most of the issues we had in this regard.

Principally I don't see why such structures should be "wrong" per se, as nested lists of dicts are way easier to traverse in a generic way than nested dicts. I know there is dict2items, but that often unnecessarily bloats templates, at least in our use cases.

EDIT: I just noticed your issue/PR which referenced this. As far as the examples go, it looks like it is doing the exact same thing, but I don't really get the naming there.

Yes. The filter _community.general.reveal_ansibletype and the test _community.general.ansibletype should help to solve use cases like yours. For more details see the plugin examples or the integration test. There are also my public notes.

I'm not 100% convinced yet that it would really help us/our use case (if we didn't work around it yet), as it is still required to generate a list of dicts from a list of lists using a statically defined key. I appreciate your work though!

vbotka commented 2 months ago

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}
... I don't see why such structures should be "wrong" per se, as nested lists of dicts are way easier to traverse in a generic way than nested dicts. I know there is dict2items, but that often unnecessarily bloats templates

The dictionary's lower complexity O(1) should be chosen over list O(n) when possible. See TimeComplexity. The benefit of lower complexity exceeds the cost of Ansible filter dict2items, or Python method .items(). The tools (filters, tests, methods, ...) are designed to cover rational choices.

MPStudyly commented 2 months ago

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}
... I don't see why such structures should be "wrong" per se, as nested lists of dicts are way easier to traverse in a generic way than nested dicts. I know there is dict2items, but that often unnecessarily bloats templates
The dictionary's lower complexity O(1) should be chosen over list O(n) when possible. See TimeComplexity. The benefit of lower complexity exceeds the cost of Ansible filter dict2items, or Python method .items(). The tools (filters, tests, methods, ...) are designed to cover rational choices.

Thank you for explaining this, I was completely ignorant about these types of complexity in this context :sweat_smile: