ansible / proposals

Repository for sharing and tracking progress on enhancement proposals for Ansible.
Creative Commons Zero v1.0 Universal
93 stars 19 forks source link

Support for squashing and pure entries #71

Open dagwieers opened 6 years ago

dagwieers commented 6 years ago

Proposal: Redesign Aggregates

Author: Dag Wieers <@dagwieers> IRC: dag

Date: 2017/09/15

Motivation

During AnsibleFest SF 2017 some issues came to light regarding the Aggregates functionality that was designed as part of the ansible-network roadmap.

This proposal does not question the merit of this needed functionality, but some design decisions require improvements before this is ready for wider consumption.

Participants wondered whether this functionality would have been acceptable when coming from the community and being reviewed by Ansible core developers. The conclusion was: probably not.

Problems

What problems exist that this proposal will solve?

Solution proposal

Documentation (optional)

Hold off with documenting and promoting this as part of v2.4, so we don't have to support this going forward in light of a user-breaking redesign.

Anything else?

The above feedback was provided during the AnsibleFest SF 2017 Network Working Group and was largely unanimous by participants.

This proposal was made to collect feedback from other participants in order to design a better user interface for this functionality before it becomes a wider standard.

jmcgill298 commented 6 years ago

+1

bcoca commented 6 years ago

So from the implementation there seems to be mixing 2 things that already exist:

My thoughts:

Currently I'm leaning towards having loop_control: optimize: false|true to actually 'pretemplate all the loop invocations' and pass this for 'batch execution' on the target , vs having to have each module code both options. This still won't be the most optimal way in some cases (i.e package managers) but can be applied to most other actions w/o having to recode the modules themselves.

Not sure how well that would apply to network actions as they do their own 'magic' mangling on the connection side.

privateip commented 6 years ago

One point of clarification, aggregates (or whatever they end up being called) are significantly more than an optimization. Aggregates allow the task to define the aggregate set of configure and only the aggregate set of configuration that should be present on a target. So if the aggregate is A, B, C and the actual target has A, B, D, then the module should add C and remove D. This is impossible to accomplish using with_* loops today and should be built into the design of the new loop feature.

gundalow commented 6 years ago

The Core Network team has agreed that we will not add the technical guide (https://gist.github.com/gundalow/c2a4fb51a093a72bc4a380535aa1d835) for Aggregates into 2.4.0 to avoid confusing the end users.

bcoca commented 6 years ago

@privateip so it works like most 'list options' in modules, which also have a common pattern of changing that behaviour via an additional append=no|yes option.

gundalow commented 6 years ago

All, We plan to conclude the discussion on Wednesday 4th October, so please ensure all your thoughts are added by then.

ganeshrn commented 6 years ago

@bcoca

net_vlan:
  vlan_id: "{{ item.vlan_id }}"
  name: "{{ item.name }}"
  description: "{{ item.description }}"
  purge: True
loop:  
  - { vlan_id: 100, name: test_vlan_1, description: test vlan-1 }
  - { vlan_id: 200, name: test_vlan_2, description: test vlan-2 }
  - { vlan_id: 300, name: test_vlan_3, description: test vlan-3 }
loop_control:
  optimize: True

In above example with proposed loop and optimize option set to True will the task executor send the entire list of dict under loop as a task arg to module?

So the validation of the entries within each dict (option validation) and batch processing of the list needs be handled in the module.

In case purge/append option is enabled module needs to handle deleting all the vlan and configure only the vlan given under loop. This essentially requires module to be called only once with a full list.

dagwieers commented 6 years ago

@ganeshrn Please don't delete all, followed by adding, that could have an impact. IMO it's better to selectively delete, update or add. Which obviously requires support by the module (which is something it should advertise). If a module supports this methodology, I don't see why someone would disable this. It should be the only modus operandi. (i.e. 'absent' skipping the add/update and 'present' skipping the delete)

In my opinion, the purge-option is in fact a state. I don't know yet what the best name would be, but next to absent and present (which only affects the items listed) that third state would affect the existing config. It wouldn't need an absent-alternative because an empty input would mean cleaning everything.

bcoca commented 6 years ago

@ganeshrn that is one way, we probably need 2 diff modes depending on module support, basically an extension of current squashing but 'explicit' vs 'implicit' behaviour.

The other way would be to 'pretemplate' the loop and execute all of this remotely (@dagwieers i believe you proposed this already) , but this has a couple of caveats, mainly that loop items cannot depend on previous items execution (loop forks have similar issues).

@dagwieers I'm favoring the additional options because i have not been able to find a 'state' that reflects what you are saying, .. but even append does not seem to cover this well either .. best I can come up with is a declared state as an 'make sure it looks like this and ONLY like this'?

dagwieers commented 6 years ago

I don't think declared is a bad choice. Maybe as-is which is less loaded and ready to become new terminology ?

We define everything as a whole, full, total or rather in-full, altogether or entire. If a verb can do, maybe supplant, displace, supersede or take-over.

The problem is that both absent and present describe the thing you specify, whereas this new "state" also should describe the state of what already exists, and that's what makes everything feel unfit. purged-and-replaced is what we mean, but that's wrong because we really don't want to give away that notion.

We can't describe this new state by something that is identified with the things we specify and also affects existing stuff. supersede or supplant comes closest in that respect, but I like declared or as-is more.

ganeshrn commented 6 years ago

@dagwieers Agreed, sequence of add/delete is important. As the existing configuration is replaced with what is given in the task, the state can also be described as replace.

bcoca commented 6 years ago

state=present supplant=False|True? .. or mabye a purge=first|last|None?

I think we all agree on function ...so we are left with UI .. mostly naming ... one of the hard parts of computing!

dagwieers commented 6 years ago

An additional option makes room for 4 different combinations where we only really need 3.

On top of that, we then also have to ensure that purge is not used with any other state that people may provide (e.g. query or list).

This really is a separate state, next to absent/present (which only affects what is specified). Maybe we have to introduce new terminology altogether from a different field. (Like "idempotence" comes from math)

Reading up on idempotence I stumbled on pure

If any existing configuration/state would influence the resulting state, it is not considered pure.

itdependsnetworks commented 6 years ago

This is my point in https://github.com/ansible/proposals/issues/75 about defining states. I actually like the use of append (as opposed to my update-present). Purge/replace seems to line up with a declarative, but to me it is a state.

bcoca commented 6 years ago

not everything needs to be a state, we can have options that are incompatible with setting state or setting it to 'non present'

itdependsnetworks commented 6 years ago

Agreed it does not "need" to be a state. I am just of the opinion that it the cleanest way of doing it.

dagwieers commented 6 years ago

purge/replace are imperative and definitely not states. purged and replaced are states, but not about what was specified, like absent or present. declared is, but doesn't indicate what happens to existing config.

pure however is a state both about what is specified, but also what remains. And it matches the definition in computer science.

calfonso commented 6 years ago

Here are the notes from the IRC working group session where we discussed this proposal. https://meetbot.fedoraproject.org/ansible-network/2017-10-11/network_working_group.2017-10-11-16.01.html

dagwieers commented 6 years ago

My proposal for what users interface with in Ansible is exactly the same as the new loop syntax in v2.5:

- net_vlan:
    vlan_id: "{{ item.vlan_id }}"
    name: "{{ item.name }}"
    description: "{{ item.description }}"
    state: pure
  loop:  
    - { vlan_id: 100, name: test_vlan_1, description: test vlan-1 }
    - { vlan_id: 200, name: test_vlan_2, description: test vlan-2 }
    - { vlan_id: 300, name: test_vlan_3, description: test vlan-3 }

So state: pure means only these items need to exist and any existing item not listed will be purged. And state: present would ensure everything exists, but in a single transaction (if supported). And state: absent would ensure everything does not exist, but in a single transaction (if supported).

So this clearly is something we have to support in Ansible core, and add to modules over time. The argument spec would add something like: supports_squashing=True if the module supports squashing.

Obviously, this only works if the module supports such declarations in a single transaction and Ansible understands that if the module supports this, it no longer needs to run actions for individual items, but can send the whole list in a single call to the module.

This would equally works for state: present because the module would have indicated it supports "squashing". So playbooks don't need to be adapted, if modules start supporting squashing, and they have a newer Ansible with squashing support, it would start using it. If not, it works as before.

privateip commented 6 years ago

@dagwieers i fail to see how you support squashing with this proposal. Each iteration of net_vlan would be separate call. What argument in the module will receive the list of hashes? Can you elaborate on how this might work?

dagwieers commented 6 years ago

@privateip No, if the module advertises supports_squashing=True the core system will send the whole dataset at once to the module (which at that point supports this). If it does not advertise this, it works as before, iterating over every item.

The main advantage is that existing playbooks will start doing it more efficiently as soon as a module (and Ansible) supports this.

Updated to use the new terminology squashing over aggregation. I agree it's better.

privateip commented 6 years ago

I'm still not clear what the receiver is in this scenario. Obviously it has to be the module but what in the module. A new argument?

Also, this would require that all modules are loaded (imported) on the controller side (I'm assuming in the action plugin) prior to passing the arguments. Just trying to clarify my understanding, please correct any assumption here that is wrong

dagwieers commented 6 years ago

Obviously it is the module, and yes, it would be some new internal argument. Those details are part of the implementation. And yes, the module would have to be loaded/inspected to understand what it supports, which it already does for supports_check_mode BTW.

But it wouldn't need a specific action plugin, the default action_plugin would do this out of the box. (Modules that require its own action plugin would have to support this too if they want to support squashing, so yes it would need some supporting call for this, or if possible, part of the existing interface).

privateip commented 6 years ago

And yes, the module would have to be loaded/inspected to understand what it supports, which it already does for supports_check_mode BTW.

No, supports_check_mode is provided to the instance of AnsibleModule via _ANSIBLE_ARGS, the module code is not loaded on the controller side.

Basically this proposal moves the functionality of the current aggregates argument from being a per module option to being loop and then implements a new check to see if the module supports the feature by loading the module code on the controller side first before sending to the remote device. In other words, the functionality isn't altered. This proposal just moves some things around to feel a little more natural, which is not to say its right, wrong or indifferent, just clarifying the proposal.

dagwieers commented 6 years ago

No, supports_check_mode is provided to the instance of AnsibleModule via _ANSIBLE_ARGS, the module code is not loaded on the controller side.

Check plugins/action/__init__.py, it's the controller that will skip running a module in check-mode if the module does not support check-mode. So the controller knows beforehand that the module does not need to run at the remote node. This would not be different.

Update: Alright, I misinterpreted what was going on there. It seems the individual modules are run remotely and return skipped. The plugins/action/__init__.py apparently only relates to action plugins. So I guess that can be optimized now as well. No need to run them remotely in check-mode if we know they don't support check-mode anyway.

bcoca commented 6 years ago

@dagwieers no, that check is for action plugins support for checkmode, modules are assumed to all support it at that point (normal action plugin), then module itself must decide to return 'skipped' if in checkmode and does not support it basic.py code does this automatically depending on argspec definition.

dagwieers commented 6 years ago

@bcoca Yup, that's what I figured (see the update).

cognifloyd commented 6 years ago

Here's a couple of suggestions.

- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    purge: yes
    as: module # defaults to as: action
    template_mode: override # defaults to template: loop_var

as: allows specifying when the looping occurs. Actions run controller-side, so as: action is what ansible currently does with loops (looping on the controller). as: module however, would loop on the remote node. When executing on the remote node, all loop data would get templated (see below) and then passed to the remote node as a block. Then, the module can specify what kind of as: module looping it supports.

Templating would need an additional something on the controller side to either template in the item/loop_var (current behavior), or specify a list of dicts that override defaults provided in the module parameters (like aggegates, shown above). The default mode could be specified by the module to allow the module to say that overrides from loops are the most natural approach. But, doing that makes it a little magical, so perhaps overriding behavior needs to be more explicit.

Or here's another take that uses a new loop_params key to do override style loops instead of loop_var style loops. That could be useful even without remote module-side looping.

- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop_params:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    purge: yes
    as: module # defaults to as: action

edit: as is probably not the best term as others have suggested. Instead we could say something like layer... see https://github.com/ansible/proposals/issues/71#issuecomment-417145525 below.

cognifloyd commented 6 years ago

So to sum up: I think some pieces of 'aggegates' could be made more common/useful if they're broken down a bit more.

cognifloyd commented 6 years ago

The idea with template_mode: override (first example) or loop_params (second example) is to duplicate this in a more generic way: https://gist.github.com/gundalow/c2a4fb51a093a72bc4a380535aa1d835#local-overrides-of-global-module-values

@bcoca mentioned in #ansible-devel:

i would make that [loop_params] a loop_control option, not a new keyword loop_control: - items_are_options_dict: no|yes

That's a more verbose flag than template_mode: override, but using a flag instead of magic keywords is probably a good idea. The benefit of some other loop key would be to be terse, and make it clear that loop_control: loop_var does not apply to this kind of loop.

jmcgill298 commented 6 years ago

@cognifloyd your idea makes sense, but I suggest a couple of implementation changes

cognifloyd commented 6 years ago

Or, instead of as we could use remote: yes (defaulting to remote: no). That's simple.

cognifloyd commented 6 years ago

Also, something like exclusive: yes or inclusive: yes (not sure which makes more sense) could be used in place of purge: yes. I think that would address this from @dagwieers:

purge/replace are imperative and definitely not states. purged and replaced are states, but not about what was specified, like absent or present. declared is, but doesn't indicate what happens to existing config.

jmcgill298 commented 6 years ago

I wanted to refrain from using words that could be used to indicate which host is doing the looping (looping happens in the module, not on the host, i.e. connection: local).

privateip commented 6 years ago

@dag We are no longer actively developing this feature in core and will move any implementation herein into roles on as-needed basis. Any objections to closing this proposal?

dagwieers commented 6 years ago

I don't see how roles can solve this.

privateip commented 6 years ago

Roles solve this by their implementation. If I pass a set of values into a role, the implementation of the role can perform the necessary adds and removes to normalize the desired data set with the configured data set on the device. So you are correct in the sense that roles do not inherently solve this but the implementation of tasks of a role do.

Since we are no longer developing modules with an aggregates argument, that was the basis for my suggestion to close this. I realize there may still be a desire to discuss this as a future feature so if you want to keep this open for that reason that makes total sense.

dagwieers commented 6 years ago

@privateip Yes, I would like to keep this open, as a role-based implementation does not offer performance/efficiency improvements.

Looping over a task for hundreds of items is in most cases hard to impossible unless it is core/module-based implementation. I have several customers/users asking for this to make Ansible usable in large setups.

cognifloyd commented 6 years ago

Here's a variation of an earlier example. Here I swapped purge for inclusive (see https://github.com/ansible/proposals/issues/71#issuecomment-351220972) and replaced as with layer:

- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    inclusive: yes
    layer: task # defaults to layer: playbook

A loop layer determines the layer that is responsible for looping over a task. By default, the playbook handles the looping (ie layer: playbook), meaning that the task is repeated for each loop item. Alternately, the entire loop can be handed over to the task, allowing the task to loop over each item itself (ie layer: task). This only works if the action or module has documented in their DOCUMENTATION string that they support task layer looping.

In the case of layer: task looping ansible would have some kind of helper utilities to help the task run and report results for each loop item. It would also allow the task (be it a module or an action) to optimize the running of all of the items so that network modules or other api modules can take advantage of batch APIs or running multiple commands in one connection.

So, I think this is easy enough to explain in user facing documentation as I attempted to do above. What does everyone think?

caphrim007 commented 6 years ago

@cognifloyd do you have any opinions on what

"This only works if the action or module has documented in their DOCUMENTATION string that they support task layer looping."

would look like?

cognifloyd commented 6 years ago

The documentation is already parsed for options, so it shouldn't be a big deal to also put some more metadata in the documentation.

DOCUMENTATION = '''
---
module: my_sweet_module
short_description: bla bla bla
version_added: "2.7"
description:
  - Describe it more completely here
supports:
  - task_layer_loop
  - check_mode
options:
  option_a:
    required: true
    type: string
  option_b:
    type: string
'''

or another option:

supports:
  - check_mode: yes
  - loop_layer: task  # defaults to playbook, as everything supports playbook layer looping

or even:

supports:
  check_mode: yes
  loop_layer: task

or maybe, if a supports list isn't appealing, something like

supports_check_mode: yes
supports_task_layer_loop: yes

I like a list/dict because it makes it feels cleaner (less verbose, but more clear) and is more extensible for future features without a lot of supports_* flags.

Note that I also included an check_mode as documenting it like this would allow skipping a task in check_mode without ever running the module. But, I'm not asking to add supports_check_mode to the documentation... It's an example of a possible extension of the doc concept. I am asking for an accessible way to internalize looping, allowing tasks (be they modules, actions, or some combination) to optimize the looping internally.

cognifloyd commented 6 years ago

I suppose we could call the two layers play and task instead of playbook and task.

loop_control:
  layer: play # the default (supported by ALL modules)
loop_control:
  layer: task
DOCUMENTATION = '''
...
supports:
  loop_layer: play # the default (current behavior)
dagwieers commented 6 years ago

@cognifloyd I think that syntax is too verbose and complex. If we expect a lot of modules over time would support squashing and pure, having to specify all that is a drag.

Another issue I have with loop: now is the fact the the playbook nor the task is looping, so it adds to the confusion. To the user we simply hand off a dataset to the module, so I would even refrain from using the loop keyword here. Bolting this on top of the loop-mechanism is wrong. (Major change of opinion since last year)

So rather than use loop or with_items my new proposal is this:

- net_vlan:
    description: Production VLAN
    state: pure
  dataset:  
    - { vlan_id: 100, name: prod_vlan_1 }
    - { vlan_id: 200, name: test_vlan_2, description: Test VLAN }
    - { vlan_id: 300, name: test_vlan_3, description: Test VLAN }

So the fact that dataset is provided, means we are sending this as a whole to the module. (How exactly is not relevant now, I would do this internally and not expose it as a parameter). At this point Ansible will know what it needs to do, there's no need here to check if the module actually supports it. (If it doesn't, we'll have to make catch that and make it clear to the user though)

This would also work with state: present and state: absent obviously.

cognifloyd commented 6 years ago

First, what does it take to move this proposal to an "accepted" state?

An alternative to the loop keyword that implies that the data is handled by the action/module itself would be fine with me (vs creating a loop mode flag that makes loops go through completely different code paths, potentially creating end-user confusion at the behavioral differences). I'm not opposed to using dataset: as that keyword.

As for state: pure I think that's something that each individual module would implement. The first few core-managed modules to adopt that state would add that state to various places in the documentation so that it can become a well-known state name that modules can reuse when they have a relevant use-case. I don't think it would require any core changes to support state: pure.

More thoughts on dataset: implementaiton in ansible core.

- name: my sweet task name
  my_sweet_module:
    this_is: a top level arg
  dataset:
    - { this_is: "an arg in a dataset entry" }

In the action/module that I'm writing (that triggered looking this proposal up last week to see if anything changed), I ended up creating something like an dataset or aggregate arg (though I didn't use the term aggregate or dataset. I used the plural of the primary identifier. ie if the primary identifier were thing I called the aggregate/dataset arg things. Or if it were vlan maybe it would be called vlans.)

Here are some questions that might have to be answered once we get to implementation (I hope asking now will not delay accepting this proposal).

For my action/module, I had my primary identifier arg name and several other args including aliases. If I only specified one name (ie I didn't use the dataset) then using the top-level aliases arg was fine. However, if a list of names was supplied (ie I supplied a dataset) with a top-level aliases arg, I would want to throw a validation error as aliases must be unique (defined per item in the dataset). With a dataset, the aliases had to be supplied in the dataset, or not at all. For the other args, I merged them with my dataset so that each item in the dataset also had all of the top-level args included. Then, I looped over the now-merged-and-validated dataset to generate the external API call.

So, I would imagine that the ansible-core-provided dataset handling would handle merging the top-level task args with the dataset as well as handling basic validation. If it just merged them, and I had to add my extra 'uniqueness' constraint, that would probably be fine.

Also, for my use-case I think it makes more sense for the dataset entries to have precedence over the top-level args. Does that also make sense for the network appliance use case? Or do the network modules need to have top-level args override entries in the dataset? How do network module aggregates currently work with merging top-level and aggregate args?

dagwieers commented 6 years ago

@cognifloyd The official decision was that this functionality is implemented using roles: https://github.com/ansible/proposals/issues/71#issuecomment-382003915

But to me that is no solution for the needs I have, so that's why I'd like to first look at a representation in the playbook, then see if it could cover the different needs and look at making a basic implementation. Obviously, this impacts core (a bit), module_utils (mostly) and modules. There's no simple solution, especially for implementing state: pure which requires a complete redesign of the module.

I am not going into details of your questions, because that's going to add to bikeshedding mostly :-) But I do have some opinions. Most of the heavy lifting will be in module_utils and be as transparant to the module writer, as it is today. So parameter handling will just happen, and you'll get an object that's easy and transparant to iterate over so you can basically reuse as much of the existing logic as possible. That will be the goal.

zyv commented 5 years ago

It would be nice to have something like this for our Bitbucket modules:

The rationale here is that these modules use API calls to manipulate individual entries, and can be significantly sped up, if the proposal by @dagwieers is accepted.