dagwieers opened 7 years ago
+1
So the implementation seems to be mixing two things that already exist:

- squashing: modules that can take either a single instance or a list of parameters and execute the actions in a more optimized way (e.g. yum, apt, etc.), which is 'detected' by `with_` to 'automagically' switch to lists when the module AND the specific options are known to support this. FYI, we are moving to `loop:` and will be removing this 'magic' in the future.
- suboptions: this is more a developer feature, which allows using the argspec to define/validate module suboptions the same way it has been able to do for options.
My thoughts: currently I'm leaning towards having `loop_control: optimize: false|true` to actually 'pretemplate' all the loop invocations and pass them along for 'batch execution' on the target, vs. having each module code both options. This still won't be the most optimal way in some cases (e.g. package managers) but can be applied to most other actions without having to recode the modules themselves.
Not sure how well that would apply to network actions as they do their own 'magic' mangling on the connection side.
One point of clarification: aggregates (or whatever they end up being called) are significantly more than an optimization. Aggregates allow the task to define the aggregate set of configuration, and only that set of configuration, that should be present on a target. So if the aggregate is A, B, C and the actual target has A, B, D, then the module should add C and remove D. This is impossible to accomplish using `with_*` loops today and should be built into the design of the new loop feature.
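That reconciliation rule can be illustrated with a small sketch. This is purely hypothetical helper code, not actual Ansible or module internals:

```python
# Hypothetical sketch of aggregate reconciliation: given the desired set
# of items and the set actually configured on the target, compute what
# to add and what to remove so the target ends up with exactly the
# desired set -- nothing more, nothing less.
def reconcile(desired, actual):
    desired, actual = set(desired), set(actual)
    to_add = desired - actual      # declared in the task, missing on target
    to_remove = actual - desired   # on the target, but not declared
    return sorted(to_add), sorted(to_remove)

# privateip's example: the aggregate is A, B, C; the target has A, B, D.
to_add, to_remove = reconcile(["A", "B", "C"], ["A", "B", "D"])
# to_add == ["C"], to_remove == ["D"]
```

The point of the example is that the module needs visibility of the whole desired set at once; a per-item `with_*` loop only ever sees one element, so it can never compute `to_remove`.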
The Core Network team has agreed that we will not add the technical guide (https://gist.github.com/gundalow/c2a4fb51a093a72bc4a380535aa1d835) for Aggregates into 2.4.0 to avoid confusing the end users.
@privateip so it works like most 'list options' in modules, which also have a common pattern of changing that behaviour via an additional `append=no|yes` option.
All, We plan to conclude the discussion on Wednesday 4th October, so please ensure all your thoughts are added by then.
@bcoca

```yaml
net_vlan:
  vlan_id: "{{ item.vlan_id }}"
  name: "{{ item.name }}"
  description: "{{ item.description }}"
  purge: True
loop:
  - { vlan_id: 100, name: test_vlan_1, description: test vlan-1 }
  - { vlan_id: 200, name: test_vlan_2, description: test vlan-2 }
  - { vlan_id: 300, name: test_vlan_3, description: test vlan-3 }
loop_control:
  optimize: True
```
In the above example, with the proposed `loop` and the `optimize` option set to True, will the task executor send the entire list of dicts under `loop` as a task arg to the module? In that case the validation of the entries within each dict (option validation) and the batch processing of the list need to be handled in the module. If the purge/append option is enabled, the module needs to handle deleting all the VLANs and configuring only the VLANs given under `loop`. This essentially requires the module to be called only once with the full list.
@ganeshrn Please don't delete all, followed by adding, that could have an impact. IMO it's better to selectively delete, update or add. Which obviously requires support by the module (which is something it should advertise). If a module supports this methodology, I don't see why someone would disable this. It should be the only modus operandi. (i.e. 'absent' skipping the add/update and 'present' skipping the delete)
In my opinion, the purge-option is in fact a state. I don't know yet what the best name would be, but next to absent and present (which only affects the items listed) that third state would affect the existing config. It wouldn't need an absent-alternative because an empty input would mean cleaning everything.
@ganeshrn that is one way; we probably need two different modes depending on module support, basically an extension of the current squashing but with 'explicit' vs 'implicit' behaviour.
The other way would be to 'pretemplate' the loop and execute all of it remotely (@dagwieers I believe you proposed this already), but this has a couple of caveats, mainly that loop items cannot depend on previous items' execution (loop forks have similar issues).
@dagwieers I'm favoring the additional options because I have not been able to find a 'state' that reflects what you are saying... but even `append` does not seem to cover this well either. The best I can come up with is a `declared` state, as in 'make sure it looks like this and ONLY like this'?
I don't think `declared` is a bad choice. Maybe `as-is`, which is less loaded and ready to become new terminology?
We define everything as a whole, full, total, or rather in-full, altogether or entire. If a verb will do, maybe supplant, displace, supersede or take-over.
The problem is that both `absent` and `present` describe the thing you specify, whereas this new "state" should also describe the state of what already exists, and that's what makes everything feel unfit. `purged-and-replaced` is what we mean, but that's wrong because we really don't want to give away that notion.
We can't describe this new state by something that is identified with the things we specify and also affects existing stuff. `supersede` or `supplant` comes closest in that respect, but I like `declared` or `as-is` more.
@dagwieers Agreed, sequence of add/delete is important.
As the existing configuration is replaced with what is given in the task, the state can also be described as `replace`.
`state=present supplant=False|True`? .. or maybe a `purge=first|last|None`?
I think we all agree on function ...so we are left with UI .. mostly naming ... one of the hard parts of computing!
An additional option makes room for 4 different combinations where we only really need 3.
On top of that, we then also have to ensure that purge is not used with any other state that people may provide (e.g. query or list).
This really is a separate state, next to absent/present (which only affects what is specified). Maybe we have to introduce new terminology altogether from a different field. (Like "idempotence" comes from math)
Reading up on idempotence I stumbled on `pure`: if any existing configuration/state would influence the resulting state, it is not considered pure.
This is my point in https://github.com/ansible/proposals/issues/75 about defining states. I actually like the use of append (as opposed to my update-present). Purge/replace seems to line up with a declarative, but to me it is a state.
not everything needs to be a state, we can have options that are incompatible with setting state or setting it to 'non present'
Agreed it does not "need" to be a state. I am just of the opinion that it is the cleanest way of doing it.
purge/replace are imperative and definitely not states. purged and replaced are states, but not about what was specified, like absent or present. declared is, but doesn't indicate what happens to existing config.
pure however is a state both about what is specified, but also what remains. And it matches the definition in computer science.
Here are the notes from the IRC working group session where we discussed this proposal. https://meetbot.fedoraproject.org/ansible-network/2017-10-11/network_working_group.2017-10-11-16.01.html
My proposal for what users interface with in Ansible is exactly the same as the new loop syntax in v2.5:
```yaml
- net_vlan:
    vlan_id: "{{ item.vlan_id }}"
    name: "{{ item.name }}"
    description: "{{ item.description }}"
    state: pure
  loop:
    - { vlan_id: 100, name: test_vlan_1, description: test vlan-1 }
    - { vlan_id: 200, name: test_vlan_2, description: test vlan-2 }
    - { vlan_id: 300, name: test_vlan_3, description: test vlan-3 }
```
So `state: pure` means only these items need to exist, and any existing item not listed will be purged.
And `state: present` would ensure everything exists, but in a single transaction (if supported).
And `state: absent` would ensure everything does not exist, but in a single transaction (if supported).
So this clearly is something we have to support in Ansible core, and add to modules over time. The argument spec would add something like `supports_squashing=True` if the module supports squashing.
Obviously, this only works if the module supports such declarations in a single transaction and Ansible understands that if the module supports this, it no longer needs to run actions for individual items, but can send the whole list in a single call to the module.
This would work equally for `state: present` because the module would have indicated it supports "squashing". So playbooks don't need to be adapted: if modules start supporting squashing, and users have a newer Ansible with squashing support, it would start using it. If not, it works as before.
@dagwieers I fail to see how you support squashing with this proposal. Each iteration of `net_vlan` would be a separate call. What argument in the module will receive the list of hashes? Can you elaborate on how this might work?
@privateip No, if the module advertises `supports_squashing=True`, the core system will send the whole dataset at once to the module (which at that point supports this). If it does not advertise this, it works as before, iterating over every item.
The main advantage is that existing playbooks will start doing it more efficiently as soon as a module (and Ansible) supports this.
Updated to use the new terminology squashing over aggregation. I agree it's better.
I'm still not clear what the receiver is in this scenario. Obviously it has to be the module, but what in the module? A new argument?
Also, this would require that all modules are loaded (imported) on the controller side (I'm assuming in the action plugin) prior to passing the arguments. Just trying to clarify my understanding; please correct any assumption here that is wrong.
Obviously it is the module, and yes, it would be some new internal argument. Those details are part of the implementation. And yes, the module would have to be loaded/inspected to understand what it supports, which it already does for `supports_check_mode` BTW.
But it wouldn't need a specific action plugin; the default action plugin would do this out of the box. (Modules that require their own action plugin would have to support this too if they want to support squashing, so yes, it would need some supporting call for this or, if possible, be part of the existing interface.)
> And yes, the module would have to be loaded/inspected to understand what it supports, which it already does for supports_check_mode BTW.

No, `supports_check_mode` is provided to the instance of AnsibleModule via `_ANSIBLE_ARGS`; the module code is not loaded on the controller side.
Basically this proposal moves the functionality of the current `aggregates` argument from being a per-module option to being `loop`, and then implements a new check to see if the module supports the feature by loading the module code on the controller side first before sending to the remote device. In other words, the functionality isn't altered. This proposal just moves some things around to feel a little more natural, which is not to say it's right, wrong or indifferent; just clarifying the proposal.
> No, supports_check_mode is provided to the instance of AnsibleModule via _ANSIBLE_ARGS, the module code is not loaded on the controller side.

Check `plugins/action/__init__.py`; it's the controller that will skip running a module in check mode if the module does not support check mode. So the controller knows beforehand that the module does not need to run on the remote node. This would not be different.
Update: Alright, I misinterpreted what was going on there. It seems the individual modules are run remotely and return skipped. The `plugins/action/__init__.py` code apparently only relates to action plugins. So I guess that can be optimized now as well: no need to run them remotely in check mode if we know they don't support check mode anyway.
@dagwieers no, that check is for action plugins' support for check mode; modules are assumed to all support it at that point (normal action plugin). The module itself must then decide to return 'skipped' if in check mode and it does not support it; the `basic.py` code does this automatically depending on the argspec definition.
@bcoca Yup, that's what I figured (see the update).
Here's a couple of suggestions.

```yaml
- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    purge: yes
    as: module # defaults to as: action
    template_mode: override # defaults to template: loop_var
```
`as:` allows specifying where the looping occurs. Actions run controller-side, so `as: action` is what Ansible currently does with loops (looping on the controller). `as: module`, however, would loop on the remote node.
When executing on the remote node, all loop data would get templated (see below) and then passed to the remote node as a block. Then, the module can specify what kind of `as: module` looping it supports, and `purge: yes` to remove anything not in the loop.

Templating would need an additional something on the controller side to either template in the item/loop_var (current behavior), or specify a list of dicts that override defaults provided in the module parameters (like aggregates, shown above). The default mode could be specified by the module to allow the module to say that overrides from loops are the most natural approach. But doing that makes it a little magical, so perhaps overriding behavior needs to be more explicit.
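To illustrate the override-style templating (a sketch only; `apply_overrides` is a hypothetical helper, not part of Ansible):

```python
# Hypothetical sketch of "override"-style templating: each loop item is a
# dict whose keys override the defaults given as module parameters,
# instead of being exposed through a loop_var like `item`.
def apply_overrides(module_params, loop_items):
    merged = []
    for item in loop_items:
        entry = dict(module_params)  # start from the task-level defaults
        entry.update(item)           # per-item keys win over the defaults
        merged.append(entry)
    return merged

# Mirrors the playbook example above: `name` defaults to reserved_vlan
# but vlan 5 overrides it with mgmt.
params = {"name": "reserved_vlan", "state": "active"}
items = [{"vlan_id": 4}, {"vlan_id": 5, "name": "mgmt"}, {"vlan_id": 6}]
merged = apply_overrides(params, items)
# merged[1] == {"name": "mgmt", "state": "active", "vlan_id": 5}
```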
Or here's another take that uses a new `loop_params` key to do override-style loops instead of loop_var-style loops. That could be useful even without remote module-side looping.

```yaml
- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop_params:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    purge: yes
    as: module # defaults to as: action
```

edit: `as` is probably not the best term, as others have suggested. Instead we could say something like `layer` ... see https://github.com/ansible/proposals/issues/71#issuecomment-417145525 below.
So to sum up: I think some pieces of 'aggregates' could be made more common/useful if they're broken down a bit more:

- `loop_params` for a loop that specifies parameters instead of a loop that uses a var;
- `purge: yes` to allow the module to add a layer of assertions across all elements in the loop. (Hmm, perhaps the module could provide standardized `loop_assertions` that Ansible can call for it.)

The idea with `template_mode: override` (first example) or `loop_params` (second example) is to duplicate this in a more generic way:
https://gist.github.com/gundalow/c2a4fb51a093a72bc4a380535aa1d835#local-overrides-of-global-module-values
@bcoca mentioned in #ansible-devel:

> i would make that [loop_params] a loop_control option, not a new keyword: `loop_control: - items_are_options_dict: no|yes`

That's a more verbose flag than `template_mode: override`, but using a flag instead of magic keywords is probably a good idea. The benefit of some other loop key would be to be terse, and to make it clear that `loop_control: loop_var` does not apply to this kind of loop.
@cognifloyd your idea makes sense, but I suggest a couple of implementation changes:

- reuse `with_items` instead of adding yet another loop mechanism (keeping default behavior working as it does currently)
- rename `as`, as it does not convey any meaning to users ('loop_owner'?)
- change the `as` options to `task` and `module` (again, I don't think `action` conveys any meaning to the users)

Or, instead of `as` we could use `remote: yes` (defaulting to `remote: no`). That's simple.
Also, something like `exclusive: yes` or `inclusive: yes` (not sure which makes more sense) could be used in place of `purge: yes`. I think that would address this from @dagwieers:

> purge/replace are imperative and definitely not states. purged and replaced are states, but not about what was specified, like absent or present. declared is, but doesn't indicate what happens to existing config.

I wanted to refrain from using words that could be used to indicate which host is doing the looping (looping happens in the module, not on the host, i.e. `connection: local`).
@dag We are no longer actively developing this feature in core and will move any implementation herein into roles on an as-needed basis. Any objections to closing this proposal?
I don't see how roles can solve this.
Roles solve this by their implementation. If I pass a set of values into a role, the implementation of the role can perform the necessary adds and removes to normalize the desired data set with the configured data set on the device. So you are correct in the sense that roles do not inherently solve this but the implementation of tasks of a role do.
Since we are no longer developing modules with an `aggregates` argument, that was the basis for my suggestion to close this. I realize there may still be a desire to discuss this as a future feature, so if you want to keep this open for that reason, that makes total sense.
@privateip Yes, I would like to keep this open, as a role-based implementation does not offer performance/efficiency improvements.
Looping over a task for hundreds of items is in most cases impractical to impossible unless there is a core/module-based implementation. I have several customers/users asking for this to make Ansible usable in large setups.
Here's a variation of an earlier example. Here I swapped `purge` for `inclusive` (see https://github.com/ansible/proposals/issues/71#issuecomment-351220972) and replaced `as` with `layer`:

```yaml
- name: Reserve mgmt vlans
  net_vlan:
    name: reserved_vlan
    state: active
  loop:
    - { vlan_id: 4 }
    - { vlan_id: 5, name: mgmt }
    - { vlan_id: 6 }
  loop_control:
    inclusive: yes
    layer: task # defaults to layer: playbook
```

A loop `layer` determines the layer that is responsible for looping over a task. By default, the playbook handles the looping (ie `layer: playbook`), meaning that the task is repeated for each loop item. Alternately, the entire loop can be handed over to the task, allowing the task to loop over each item itself (ie `layer: task`). This only works if the action or module has documented in its DOCUMENTATION string that it supports task-layer looping.

In the case of `layer: task` looping, Ansible would have some kind of helper utilities to help the task run and report results for each loop item. It would also allow the task (be it a module or an action) to optimize the running of all of the items so that network modules or other API modules can take advantage of batch APIs or running multiple commands in one connection.
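The helper-utilities idea could be sketched like this (a hypothetical shape, not an actual Ansible API): the task batches the work however it likes, but still reports one result per loop item, as playbook-layer loops do today.

```python
# Hypothetical task-layer loop helper: the task receives the whole item
# list, processes it in one batch (e.g. one connection or one batch API
# call), then fans the outcome back out into per-item results.
def run_batch(items, process_batch):
    results = process_batch(items)  # single batched execution
    return [
        {"item": item, "changed": changed}
        for item, changed in zip(items, results)
    ]

out = run_batch(
    [{"vlan_id": 4}, {"vlan_id": 5}],
    lambda batch: [True] * len(batch),  # pretend every item changed
)
# out[0] == {"item": {"vlan_id": 4}, "changed": True}
```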
So, I think this is easy enough to explain in user-facing documentation, as I attempted to do above. What does everyone think?
@cognifloyd do you have any opinions on what "This only works if the action or module has documented in their DOCUMENTATION string that they support task layer looping." would look like?
The documentation is already parsed for options, so it shouldn't be a big deal to also put some more metadata in the documentation.
```python
DOCUMENTATION = '''
---
module: my_sweet_module
short_description: bla bla bla
version_added: "2.7"
description:
  - Describe it more completely here
supports:
  - task_layer_loop
  - check_mode
options:
  option_a:
    required: true
    type: string
  option_b:
    type: string
'''
```
or another option:

```yaml
supports:
  - check_mode: yes
  - loop_layer: task # defaults to playbook, as everything supports playbook layer looping
```

or even:

```yaml
supports:
  check_mode: yes
  loop_layer: task
```

or maybe, if a `supports` list isn't appealing, something like:

```yaml
supports_check_mode: yes
supports_task_layer_loop: yes
```
I like a list/dict because it feels cleaner (less verbose, but more clear) and is more extensible for future features without a lot of `supports_*` flags.
Note that I also included `check_mode`, as documenting it like this would allow skipping a task in check_mode without ever running the module. But I'm not asking to add supports_check_mode to the documentation... It's an example of a possible extension of the doc concept. I am asking for an accessible way to internalize looping, allowing tasks (be they modules, actions, or some combination) to optimize the looping internally.
I suppose we could call the two layers `play` and `task` instead of `playbook` and `task`.

```yaml
loop_control:
  layer: play # the default (supported by ALL modules)
```

```yaml
loop_control:
  layer: task
```

```python
DOCUMENTATION = '''
...
supports:
  loop_layer: play # the default (current behavior)
'''
```
@cognifloyd I think that syntax is too verbose and complex. If we expect a lot of modules over time would support squashing and pure, having to specify all that is a drag.
Another issue I have with `loop:` now is the fact that neither the playbook nor the task is looping, so it adds to the confusion. To the user we simply hand off a dataset to the module, so I would even refrain from using the loop keyword here. Bolting this on top of the loop mechanism is wrong. (Major change of opinion since last year.)
So rather than use loop or with_items, my new proposal is this:

```yaml
- net_vlan:
    description: Production VLAN
    state: pure
    dataset:
      - { vlan_id: 100, name: prod_vlan_1 }
      - { vlan_id: 200, name: test_vlan_2, description: Test VLAN }
      - { vlan_id: 300, name: test_vlan_3, description: Test VLAN }
```
So the fact that `dataset` is provided means we are sending it as a whole to the module. (How exactly is not relevant now; I would do this internally and not expose it as a parameter.) At this point Ansible will know what it needs to do; there's no need here to check if the module actually supports it. (If it doesn't, we'll have to catch that and make it clear to the user, though.)
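A minimal sketch of that rule (hypothetical function and flag, not Ansible internals): the mere presence of `dataset` selects the single-call path, and a module that can't handle it produces a clear error rather than being silently looped:

```python
# Hypothetical dispatch: presence of `dataset` in the task args means
# the whole list is handed to the module in one call; absence means the
# classic per-item behavior; a module that can't handle a dataset fails
# loudly so the user knows why.
def dispatch(task_args, module_supports_dataset):
    dataset = task_args.get("dataset")
    if dataset is None:
        return "per-item"       # classic behavior, nothing changes
    if not module_supports_dataset:
        raise ValueError("this module does not support 'dataset'")
    return "single-call"        # whole dataset, one module run
```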
This would also work with `state: present` and `state: absent`, obviously.
First, what does it take to move this proposal to an "accepted" state?

An alternative to the `loop` keyword that implies that the data is handled by the action/module itself would be fine with me (vs. creating a loop mode flag that makes loops go through completely different code paths, potentially creating end-user confusion at the behavioral differences). I'm not opposed to using `dataset:` as that keyword.

As for `state: pure`, I think that's something that each individual module would implement. The first few core-managed modules to adopt that `state` would add it to various places in the documentation so that it can become a well-known state name that modules can reuse when they have a relevant use case. I don't think it would require any core changes to support `state: pure`.
More thoughts on `dataset:` implementation in Ansible core.

```yaml
- name: my sweet task name
  my_sweet_module:
    this_is: a top level arg
    dataset:
      - { this_is: "an arg in a dataset entry" }
```

In the action/module that I'm writing (which triggered looking this proposal up last week to see if anything had changed), I ended up creating something like a `dataset` or `aggregate` arg (though I didn't use the term `aggregate` or `dataset`; I used the plural of the primary identifier, i.e. if the primary identifier were `thing` I called the aggregate/dataset arg `things`, or if it were `vlan` maybe it would be called `vlans`).
Here are some questions that might have to be answered once we get to implementation (I hope asking now will not delay accepting this proposal). What happens with top-level args that are also defined in the `dataset`?

For my action/module, I had my primary identifier arg `name` and several other args including `aliases`. If I only specified one `name` (i.e. I didn't use the dataset) then using the top-level `aliases` arg was fine. However, if a list of names was supplied (i.e. I supplied a dataset) together with a top-level `aliases` arg, I would want to throw a validation error, as aliases must be unique (defined per item in the dataset). With a dataset, the `aliases` had to be supplied in the dataset, or not at all. For the other args, I merged them with my `dataset` so that each item in the dataset also had all of the top-level args included. Then I looped over the now-merged-and-validated dataset to generate the external API call.
So, I would imagine that the ansible-core-provided dataset handling would handle merging the top-level task args with the dataset as well as handling basic validation. If it just merged them, and I had to add my extra 'uniqueness' constraint, that would probably be fine.
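A sketch of that merge-plus-validation, assuming dataset entries take precedence over top-level args and reusing the `aliases` uniqueness constraint described above (hypothetical helper, not proposed core code):

```python
# Hypothetical sketch: merge top-level task args into each dataset entry
# (dataset entries win over top-level defaults), then enforce an extra
# module-specific constraint -- here, that `aliases` must be unique
# across all entries.
def merge_dataset(top_level, dataset):
    merged = [{**top_level, **entry} for entry in dataset]
    seen = set()
    for entry in merged:
        for alias in entry.get("aliases", []):
            if alias in seen:
                raise ValueError("duplicate alias: %s" % alias)
            seen.add(alias)
    return merged

merged = merge_dataset(
    {"state": "present"},
    [{"name": "a", "aliases": ["x"]}, {"name": "b"}],
)
# merged == [{"state": "present", "name": "a", "aliases": ["x"]},
#            {"state": "present", "name": "b"}]
```

The core-provided part would be the generic merge; the uniqueness check is the kind of extra constraint an individual module would layer on top.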
Also, for my use case I think it makes more sense for the dataset entries to have precedence over the top-level args. Does that also make sense for the network appliance use case? Or do the network modules need to have top-level args override entries in the `dataset`? How do network module aggregates currently work with merging top-level and aggregate args?
@cognifloyd The official decision was that this functionality is implemented using roles: https://github.com/ansible/proposals/issues/71#issuecomment-382003915
But to me that is no solution for the needs I have, so that's why I'd like to first look at a representation in the playbook, then see if it could cover the different needs, and look at making a basic implementation. Obviously, this impacts core (a bit), module_utils (mostly) and modules. There's no simple solution, especially for implementing `state: pure`, which requires a complete redesign of the module.
I am not going into details on your questions, because that's going to add to bikeshedding mostly :-) But I do have some opinions. Most of the heavy lifting will be in module_utils and be as transparent to the module writer as it is today. So parameter handling will just happen, and you'll get an object that's easy and transparent to iterate over, so you can basically reuse as much of the existing logic as possible. That will be the goal.
It would be nice to have something like this for our Bitbucket modules:

- `bitbucket_pipeline_known_host` (ansible/ansible#54238)
- `bitbucket_pipeline_variable` (ansible/ansible#54049)
- `bitbucket_access_key` (ansible/ansible#54592)

The rationale here is that these modules use API calls to manipulate individual entries, and can be significantly sped up if the proposal by @dagwieers is accepted.
Proposal: Redesign Aggregates
Author: Dag Wieers <@dagwieers> IRC: dag
Date: 2017/09/15
Motivation
During AnsibleFest SF 2017 some issues came to light regarding the Aggregates functionality that was designed as part of the ansible-network roadmap.
This proposal does not question the merit of this needed functionality, but some design decisions require improvements before this is ready for wider consumption.
Various participants questioned the naming of this functionality, as the name does not seem to adequately describe the purpose of the functionality.
The interface was questioned as well, especially the following aspects of the functionality:
Some believe that rather than a parameter to the module, it should be a new state.
Some believe that this functionality should not be part of the modules, but should instead be an integral part of Ansible core. There is no need to require a whole new interface; it could fit right into with_items, as long as the module indicates it can handle more than one item in a single run (e.g. supports_aggregation). This would then also replace the existing special cases related to yum, apt and others that already offer similar functionality.
Purging would be optional, you could add/remove multiple items in a single run without purging existing items. This would also be something that the module would need to indicate (e.g. supports_purging).
Handling failures is also something that should be considered better. If processing an item fails, should it continue with subsequent items (as with_items does)? Should it also perform the purge when adding/modifying an item fails (probably not)? So this requires that we understand the various use cases and problem causes.
This functionality is useful in a wider context than just networking, so this should have been discussed with use-cases outside of networking only.
Participants wondered whether this functionality would have been acceptable when coming from the community and being reviewed by Ansible core developers. The conclusion was: probably not.
Problems
What problems exist that this proposal will solve?
Solution proposal
Documentation (optional)
Hold off with documenting and promoting this as part of v2.4, so we don't have to support this going forward in light of a user-breaking redesign.
Anything else?
The above feedback was provided during the AnsibleFest SF 2017 Network Working Group and was largely unanimous among participants.
This proposal was made to collect feedback from other participants in order to design a better user interface for this functionality before it becomes a wider standard.