rwagnergit opened this issue 3 years ago
Possibly related to https://github.com/ansible/ansible/issues/70653.
Issue also exists in 2.11 (devel branch)
I'm looking over https://github.com/ansible/ansible/commit/8c213c93345db5489c24458880ec3ff81b334dbd and I'm not sure what the right thing to do is here.
I'm pretty sure that commit was addressing a symptom and not the problem.
The task definition should be templated using the host vars of the host in the loop, and as such, when `post_validate` runs, I expect that the task is finalized.
However, in https://github.com/ansible/ansible/commit/1da47bfa8c6711e19902e4a1460d3276d33664e1 we made a change to not template vars during `post_validate`, and I'm questioning that change as well.
Because of the decision to not `post_validate` vars, we've delayed evaluating vars until after we have substituted the vars of the delegated host, which is now incorrect, since the task definition should reflect the current host, not the delegated host.
I think we need to revert both https://github.com/ansible/ansible/commit/8c213c93345db5489c24458880ec3ff81b334dbd and https://github.com/ansible/ansible/commit/1da47bfa8c6711e19902e4a1460d3276d33664e1 and then implement a different fix for the issue that https://github.com/ansible/ansible/commit/1da47bfa8c6711e19902e4a1460d3276d33664e1 was attempting to solve.
Hi, as this is my first time tracking an issue with Ansible: how/when can we figure out whether this has been reverted as sivel is proposing? Will this issue be updated with the fix details?
Thanks
might be fixed by #72419
@bcoca and @sivel (and anyone running into this) - following #ansible-meeting this morning, you suggested using set_fact to create the variable I need in ansible_connection. I tried that, but I cannot set a fact for localhost using set_fact:
I tried:
- set_fact:
    use_ansible_connection: "{{ (whoami_control_machine_output.stdout == 'awx') | ternary('ssh','local') }}"

- set_fact:
    use_ansible_connection: "{{ (whoami_control_machine_output.stdout == 'awx') | ternary('ssh','local') }}"
  delegate_to: localhost
To examine hostvars['localhost'], I used:
debug: var=hostvars['localhost']['use_ansible_connection']
but that yields "VARIABLE IS NOT DEFINED!" in both cases.
However, setting the variable with add_host works:
- add_host:
    name: 'localhost'
    use_ansible_connection: "{{ (whoami_control_machine_output.stdout == 'awx') | ternary('ssh','local') }}"
and then I can use:
ansible_connection: "{{ hostvars['localhost']['use_ansible_connection'] }}"
This works in 2.10. It's a clunky workaround, though, so I'm still holding out hope for bcoca's PR :-)
Just adding some more info here in case anyone else is following.
I'll be damned. Thanks @bcoca. Just needed:
- set_fact:
    use_ansible_connection: "{{ (whoami_control_machine_output.stdout == 'awx') | ternary('ssh','local') }}"
  delegate_to: localhost
  delegate_facts: yes
And then use it as above:
ansible_connection: "{{ hostvars['localhost']['use_ansible_connection'] }}"
I tested this workaround successfully in 2.9.17 and 2.10.5.
I think that since this is three years later, it's safe to assume that you cannot use delegation with an inventory in which variables such as `ansible_host` or `ansible_become_pass` are templated? At least not unless you know that the controller and target use the same dependent variables, which of course you won't know.
I'm a relatively fresh ansible user and it's incredibly difficult to predict what ansible does with templated variable definitions at any moment of time.
My personal best practices are becoming:
This is a rather frustrating experience.
I think that since this is three years later, it's safe to assume that you cannot use delegation with an inventory in which variables such as ansible_host or ansible_become_pass are templated?
@mutech no, you should be able to do so and have them templated in the 'correct host context'. All connection/become/shell options are templated in the DELEGATED host context, not in the `inventory_hostname` one; the only variable that still refers to the original host is `inventory_hostname`, even `inventory_hostname_short` refers to the delegated host.
All other variables and options are templated in the `inventory_hostname` context. I hope this clarifies templating for you.
I'll look at the documentation to make this clearer; templating should be widely used to avoid copy and paste.
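To illustrate the rule above, a minimal sketch (the host names and users here are hypothetical, not from this issue):

```yaml
# Hypothetical inventory:
#   app01  ansible_user=alice
#   db01   ansible_user=bob

- hosts: app01
  tasks:
    - command: whoami
      delegate_to: db01
      # Connection/become/shell options are templated in the DELEGATED
      # host's context, so this connects as bob (db01's ansible_user).
      # inventory_hostname still reports 'app01'; per the explanation
      # above, inventory_hostname_short follows the delegated host.
```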
Of course `inventory_hostname` is what I use for `ansible_hostname` and for querying things like become passwords from the keyring.
How do you know that? I did not even know that `inventory_hostname_short` exists, and I'm pretty sure I read everything about variables (and forgot it again because it's too much). That might be a life saver for my current problem; I was actually about to go all copy and paste in my fury (spent a whole afternoon trying to delegate a single task "cleanly").
Just found the documentation, though I have no idea what the short version of the hostname is. I decided at one point to use unqualified hostnames in my inventory (to avoid having to quote everything). So I might be lucky and short hostnames might be the same as long ones.
But that would still be a hack in my view. I would expect that a task running in delegate_to has exactly the same scope as it would have if it ran in a playbook on its own (maybe with the exception that there might be an outer scope providing the view of the playbook, f.e. as in "….inventory_hostname").
That is, in my opinion, the only way you can get a reliable and understandable context for a delegated task.
How do you know that?
I've either written and/or read and/or modified the code ... but that is cheating and I could not find docs, so I opened: https://github.com/ansible/ansible-documentation/pull/527
FYI: `inventory_hostname_short = inventory_hostname.split('.')[0]`, so if there is no '.' they are basically the same.
I'll update https://docs.ansible.com/ansible/latest/reference_appendices/special_variables.html#term-inventory_hostname_short to make it clearer.
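A quick way to see the mapping for a given host (a sketch; the relationship is just the `split('.')` above):

```yaml
- debug:
    msg: "{{ inventory_hostname }} -> {{ inventory_hostname_short }}"
  # 'web01.example.com' yields 'web01.example.com -> web01';
  # an unqualified 'web01' yields the same value on both sides.
```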
That is what `hostvars[inventory_hostname]` should provide. Do you have examples in which this is insufficient?
Correct me if I'm wrong (I didn't have the time to read the source): what happens when a task is executed with `delegate_to` is that some variables are adapted to the delegation target's context and some are not, e.g. inventory_hostname is still whatever it was before the delegation, while inventory_hostname_short is set to the delegation context. Did I get that right?
The problem with that is that without reading the source (every time, since I'm never certain I did that for the Ansible version I'm running at that moment) I don't really know which variables I can use in my setup.
That's why I'm suggesting a simple rule (simple for the user; for the Ansible implementation it's probably a big change, especially considering that it would break code all over the place), such as: all variables get set as if the task were executed in a playbook with the delegation target as host.
Using hostvars[some-inv-host] is not enough, because these variables are subject to templating, thus they might use references from "the other" context; hell, I can imagine that even entries from hostvars[delegating-host] become invalid if some variables from hostvars[delegation-target] change because of connection variables updated for delegation.
I don't see how that can ever work reliably without scoping (making sure that templating always happens in a well defined scope/context). By well defined I mean both well documented by Ansible and well understood by users.
That is what hostvars[inventory_hostname] should provide. Do you have examples in which this is insufficient?
My setup looks like this (inventory pseudo code):
all:
  vars:
    become_key: "ansible-become:{{ inventory_hostname }} {{ ansible_user }}"
    ansible_become_pass: >-
      {{ lookup('community.general.keyring',
         become_key, errors='ignore') | default(omit) }}
    # various configuration data that is shared between home and hosted
children/home:
  vars:
    ansible_user: mu
    become_key: "ansible-become:home {{ ansible_user }}"
children/hosted:
  vars:
    ansible_user: admin
The general problem is that I often need "the view of a host". For example, when I need to create a cloud-init fragment, I need the `ansible_become_pass` for some VPS and its ansible_user. I would like to use `hostvars[vps-name].ansible_become_pass`, but that does not work, because it's a template using inventory_hostname (and has to be, unless I copy & paste). `delegate_to` would not work, because lookup is local.
Strangely, `delegate_to` in conjunction with `become: yes` sometimes works with this or similar setups and sometimes doesn't. It works in the home group context, because I have LDAP and thus always the same password, but cross-group delegation also sometimes works (or it did in the past when I first tested this or a similar setup).
All this is not about the original problem with `delegate_to`, but as I understand it, it has the same root cause (lack of context/scope when using templated variables in different contexts like delegation, imported or included roles, playbook items, etc.).
I have to admit that I am probably using Ansible in an unintended way. I have a lot of configuration data in my inventory (the whole network configuration, how networks are connected, service specifications, service deployments; it's pretty much what you find in etcd in a k8s cluster). Most of my playbooks are named on the theme of 'make it so', and the 'it' is something like update, DNS, nextcloud, freeipa, etc. I also have some state in the inventory (hostvars/groupvars folders), which is really nice because --diff --check provides a good overview of what's going on.
is that some variables are adapted to the delegation target's context, and some are not,
It is not the variables themselves that are changed, it is the configuration options for the connection related plugins that are templated using a different set of variables. Any other option/parameter/variable/field will continue to use the variables in the `inventory_hostname` context.
For example, `remote_user` is commonly set via either ansible.cfg, the command line, an environment variable, or the `ansible_user` variable. For this setting we always consider the host we are connecting to, and all variables except `inventory_hostname` will be sourced from that host. 'That host' defaults to `inventory_hostname`, but can be the delegated host.
Also, you assume `ansible_user` and other variables will reflect the configuration; this is sometimes true, but not guaranteed, especially for become information. These variables are meant as a high precedence way of setting it, not of reflecting the configuration. I would switch to the `config` lookup.
become_key: "ansible-become:home {{ q('config', 'remote_user', plugin_type='connection', plugin_name='ssh') }}"
^ this also takes into account all configuration sources, including higher precedence variables like `ansible_ssh_user`, which for the `ssh` plugin would override `ansible_user`.
The only caveat is that it always uses the `inventory_hostname` context (I should add a `host=` parameter for being able to specify context).
Is there a way to evaluate a template in the context of a host? F.e.:
- hosts: [ localhost ]
  tasks:
    - debug:
        msg: "{{ lookup('???', 'some_variable', inventory_hostname='some_host') }}"
This is supposed to print exactly the same output (even if `some_variable` is defined as `"{{ inventory_hostname }}-{{ ansible_user }}-..."`) as this:
- hosts: [ some_host ]
  tasks:
    - debug:
        msg: "{{ some_variable }}"
If there is a lookup '???' that does this, most of my issues go away. For my configuration, the only key that is relevant (as far as I can see) is `inventory_hostname`. I believe that `ansible_user` is (at least in my setup) deterministic, but you are right, the config lookup would be a better way to access the configured user. Does using the connection plugin type and ssh as plugin ensure that variables are set so that delegate_to would be taken into account?
(I should add a host= parameter for being able to specify context).
I'm not quite clear whether I understand what the `config` lookup does. If it would do what the example above shows, then yes, please pretty please add that parameter :-)
Btw., Thanks a lot for taking the time to go into so much detail. That's very much appreciated!
There is not currently a lookup that can do this; lookups don't get the required info to reconstruct the variable view.
I'm looking into a way to update the `config` lookup to do this. This lookup is designed to access configuration the same way the plugins do, so the resolution would be the same; the one problem is 'host context', which we currently provide to the plugin indirectly, via the variables available.
I have to apologize for my initial snappy comment; looking at how you reacted to it, I feel a bit like a Neanderthal grunting in displeasure. Huge thanks for your efforts, especially in reaction to that kind of attitude!
Just for clarification, looking at this:
- hosts: [ localhost ]
  vars:
    some_extra_var_a: ...
    some_extra_var_b: value_to_be_overridden
  tasks:
    - debug:
        msg: "{{ lookup('???', 'some_variable', inventory_hostname='some_host', some_extra_var_b='...') }}"

- hosts: [ some_host ]
  vars:
    some_extra_var_a: ...
    some_extra_var_b: ...
  tasks:
    - debug:
        msg: "{{ some_variable }}"
Assuming that `some_extra_var_a`/`b` were defined the same, would that too result in `some_extra_var_b` overriding whatever might be defined for `b`?
That's not a feature I need, but I think that if it's not working, similar problems might pop up in other use cases (once someone discovers `config` has these semantics).
I'm not sure what you are trying to do, but var precedence would have the vars you declare at the play level override host vars. So you would already be overriding them w/o having to have a special facility on the lookup.
I'm still thinking of the delegation problem, where you might need the delegated to host's view on things.
Having `config` set `inventory_hostname` would take care of obtaining the correct configuration of the target from the inventory; being able to override variables set by the playbook would take care of changes made by the playbook for the sake of the controller (localhost or, more generally, the delegation source) and override them to suit the target.
This is a bit contrived, because if you keep the configuration data mostly in the inventory, everything should really depend only on the `inventory_hostname` and not on additional parameters set on the outside.
An example of why you might want something like this: a parameter valued `test` or `production`, where you might want the actual target configuration to be identical, but the controller would need to know the difference, f.e. to set up a virtual network environment for testing while the target would use the same network parameters.
If the config lookup only sees the inventory data (and not whatever variables have been set in the playbook), there should be no problem.
I find that a bit difficult to explain; I hope my point comes across.
What I think you are missing is that playbook variables override host vars, this does not change for delegation.
The config lookup sees the 'current variable context'; it includes, but is not limited to, inventory data. This context normally belongs to the `inventory_hostname`; what I'm trying to do is to see the delegated/arbitrary host context, which would still include overrides from the play itself.
So it would present the data as it would behave in the context of the play, not ignore it. It should work as close to the actual engine resolution for the config values as possible, otherwise it would be misinforming the user.
I'm not sure if I'm stretching your patience with this discussion or my inquiries, please tell me if so or when this gets too much off-topic.
I love Ansible's Inventory concept, because it's so expressive and you can beautifully sort configuration data into groups and override them for sub-groups and hosts. You can even put some external state (f.e. hosted DNS or VPS info) into group-vars files. Awesome.
Once you use this however, things get naturally complex. The author of configuration data has only the inventory available, they can't know what variables playbooks or roles use. The author of a role does not know whether their tasks are used in a delegated context and what the implications of that are. And - as this issue shows - the author of a playbook needing to delegate a task to another host often also does not know what the implications of that delegation are.
Now I understand (to some extent) how variable resolution works in Ansible. I can't really remember the process (it's too complex), but yes, the playbook can override variables, and late evaluation of templates will have the most often expected results.
When you told me about the `config` lookup, I hoped that this would deliver the inventory author's view on data, that is, the view that is "untainted" by whatever playbooks and/or roles define.
I actually have no idea what the `config` lookup is actually doing. The documentation says:
Lookup current Ansible configuration values
I would take that to mean that it parses ansible.cfg and other sources of Ansible configuration. I don't really understand why you would need to specify a plugin and plugin type for that, though. From what you wrote, I understand that it does this in the current context, meaning with all the variable definitions and other deviations created by that context. So it actually seems to be a variable lookup that evaluates templates!? You proposed to exclude or further override `inventory_hostname` so that it can be used in a delegation context to obtain the variables for that host. Again, that's great, but it solves only part of the problem that delegation poses, because I guess most people understand delegation to work as "execute that task as if it was executed in the target host's context".
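For what it's worth, the lookup has two documented forms, which is where the plugin arguments come in (illustrative only; check `ansible-doc -t lookup config` for the exact options in your version):

```yaml
# A global configuration setting, resolved from ansible.cfg,
# environment variables and built-in defaults:
- debug:
    msg: "{{ lookup('config', 'DEFAULT_REMOTE_USER') }}"

# A single plugin's resolved option; plugin_type/plugin_name select
# whose option table to consult (here: the ssh connection plugin):
- debug:
    msg: "{{ lookup('config', 'remote_user', plugin_type='connection', plugin_name='ssh') }}"
```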
The root of this whole thing is probably the expectation that when you write a task as a playbook author, it mostly behaves like a method call: the execution depends on the parameters passed (and the external state). In the case of `delegate_to`, this is or was not true, because of "magic variables" (I remember them being called that) that do not all follow the delegation. In the case of roles, it's even less true, because they don't have parameters; they operate on a global state inherited from whatever calls them. They define an interface (argument_spec), but that interface is incomplete, because of templates. So you never really know what happens when you import, and much less include, a role.
My original snappy comment about copy & paste was about controlling who gets to see which information, because I simply cannot manage the complexity involved with variable evaluation in various contexts. I found a way to code myself through this jungle in most cases, but every now and then I scream at my screen and don't understand why this or that variable is wrong and why I can't make it work without analyzing the entirety of my (huge) inventory, roles, collections, tasks, playbooks, keyring or vault definitions. I even tried to debug code running on the target, only to find out that this is utterly impossible.
The inventory is a holy grail for me, because it ideally has a single dependency (if the author is doing the right thing): `inventory_hostname`. This is the basis on which every playbook operates before it overrides variables. It would be great to be able to access this particular information from anywhere.
Understanding how configuration and variables work in Ansible is not easy, we have made an effort to document both:
Sadly, variables started with a basic design, but then grew organically to meet 'real life' needs and got very complicated. But if we try to remove this complexity, we also remove the ability of people to function in many of real life's complex scenarios.
We normally suggest people start with 1-3 ways of defining things and expand as they get more experience; there is a lot of nuance and no lack of undefined behavior (we have been reducing it for a long time, but some is still left), especially when you add roles into the mix.
The "magic variables" (I really hate the name) are mostly 'variables that give the user access to engine information'; sadly we have overlap with 'variables that let the user SET the engine information', like `ansible_user`. But this is inconsistent and does not always correctly reflect the engine, which is also why I created the `config` lookup.
My current actions:
- updating the `config` lookup docs from your feedback (#81951), hope this clarifies some things. `config` also resolves 'keywords' that appear in the play, so it is not always 'correct', but that was an option I've been looking to pass in also.
- adding a `config` option to use an 'alternate hostname', or at least be 'delegate_to aware'.

I understand your want of a 'subsetted view' of Ansible variables and configuration items, but that is not currently possible and, I would even say, can be misleading to users, as 'that view' can then be modified/overridden elsewhere. But this is also why the `config` lookup has a `show_origin` option, to allow the user to track down the source of the 'current' resolution.
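For example (output shape hedged; see `ansible-doc -t lookup config` for the exact return format in your version):

```yaml
- debug:
    msg: "{{ lookup('config', 'remote_user', plugin_type='connection', plugin_name='ssh', show_origin=True) }}"
  # With show_origin the lookup returns the resolved value together with
  # where it came from (a variable, ansible.cfg, the plugin default, ...),
  # i.e. the 'winner' of the precedence 'battle royal' described above.
```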
You know that I do not have a premium support contract, right? This is how it must feel to get the VIP treatment :-) Thanks for your work!
[..] but that is not currently possible and I would even say, can be misleading to users [..]
I would not propose to change existing functionality for that purpose. Using an isolated mechanism such as a query or lookup, maybe a new one, should however be a stable change not affecting the rest of the ecosystem?
as 'that view' can then be modified/overridden elsewhere
I'm not sure I understand that part. A lookup would only provide a view on existing data, that can of course be manipulated by existing means. The difference to the current situation would - as far as I understand - be only that now you wouldn't know about such overrides because you can't see them, you only see the final values when templates are accessed (at the module/action level). If you meant that the view itself (the lookup) can be modified, I don't see that as a problem as the one doing that has to take over responsibility for that change (know what they're doing/test/...)
But I guess that's academic if it's not currently feasible to provide that view anyway.
Ansible, despite the problems I have with it, is a great tool, probably even the best of breed. But I feel it's falling short of what it could become, which sadly is also what it is currently being used for. It's incredibly hard to create reusable components (trying to evade terms like module or role) unless you're an expert with a strict catalog of best practices engraved into your mind.
I think it's time for a version 3 with loads of breaking changes, as much as I hated to see the python2/3 cataclysms in Ansible. If you're looking for a rookie user's point of view, I had quite a few more suggestions resulting from my experience. :-) I'm getting more and more comfortable with time, but it's a steep learning curve and a stony path.
updating config lookup docs from your feedback https://github.com/ansible/ansible/pull/81951, hope this clarifies some things.
Great, now it's clear to me what config does!
You know that I do not have a premium support contract, right? This is how it must feel to get the VIP treatment :-) Thanks for your work!
People pay for support?!?! .. kidding aside, I responded to your points as they were hitting something most devs that have worked on core are aware of, but we have not been good at explaining and distributing that knowledge to the users; from this ticket alone I think we have made some docs much better and clearer. This is not 'me' supporting 'you', this is core devs working with community members to make things better for the Community (capitalized to include both 'free' users and 'paid' support subscribers). IMHO, this is the great strength of OSS.
Also once I post these explanations I can link to them when topic comes back up!
as 'that view' can then be modified/overridden elsewhere
What I mean is that a 'view of configuration just accounting for inventory' is misleading: not accounting for other config, environment, CLI parameters, extra vars, play vars, role vars, ... will end up misleading users into thinking 'this is how it will work', while there are dozens of ways to override it that are not obvious. Again, this is why `config` includes `show_origin`, to return the 'winner' of the configuration precedence battle royal.
But I guess that's academic if it's not currently feasible to provide that view anyway.
Maybe I spoke too soon; we don't have anything that does this now .. but I have thought of ways to get there. It is just a lot of work and change in core systems that might not be worth the return. Still, don't get your hopes up: core dev time is one of the scarcest resources on the planet.
Ansible, despite the problems I have with it, is a great tool, probably even the best of breed. But I feel it's falling short of what it could become, which sadly is also what it is currently being used for. It's incredibly hard to create reusable components (trying to evade terms like module or role) unless you're an expert with a strict catalog of best practices engraved into your mind.
That is something we are aware of, plays and roles are not reusable 'by default', but can be made so by following a set of rules and parameterizing certain things. We have taken several steps to help with this and auto document, things like 'role args spec' are a step in that direction.
I think it's time for a version 3 with loads of breaking changes, as much as I hated to see the python2/3 cataclysms in Ansible. If you're looking for a rookie user's point of view, I had quite a few more suggestions resulting from my experience. :-) I'm getting more and more comfortable with time, but it's a steep learning curve and a stony path.
The more our user base expands, the harder it is to make such changes; the 1-to-2 transition was hard enough, and even the 2.x-2.6 releases had several 'adjustments' that came from the 2.0 shift. Even 2.9 to 2.10+/collections, which was 99.99999% backwards compatible, was seen as a bit steep by many. Too many people still use 2.9 because they think they need to change all plays to use FQCN .. they DO NOT!! It is just that the 'devtools' (`ansible-lint`) and docs do favor a 'collections world', but any big change has misunderstandings like this (I will say this one last time: `with_items` IS NOT DEPRECATED and you can still use `yes`/`no` as booleans!!!).
The more our user base expands the harder it is to make such changes
You only have one chance to get something right in software engineering, and that's before there are users.
I very much appreciate Linus' attitude towards breaking changes, but he is arguing based on a solid foundation of time tested concepts and established standards (UNIX/Posix/...).
I don't think that Ansible can continue to evolve in small increments fixing pain points. At some point there will be an alternative that copies the awesomeness of Ansible and combines it with a concept that supports engineering requirements that Ansible cannot fulfill with its current architecture. There are many things that can be improved incrementally, but I think there are some hard limits, and what you say about core-dev hours is exactly what I mean.
The breaking points that cannot be easily fixed without a remake are these:
One of the best features of Ansible is that it's agent-less. But with ansiballs and a distinct lack of marketing/library support for raw actions/modules, this feature is eroding. The reality of Ansible is that it actually uses agents, they are just not well defined: the agent is whatever python code gets uploaded to the target. It's true that there is no service running on the target and there is no dedicated installation, which is both good, but the amount of logic required to make that work and the side effects of this strategy are just too complex. You can't debug code on the host. Permission issues, as a result of cascading elevations which are sometimes necessary, keep popping up. There is a never ending pain resulting from version requirements for python and nested dependencies. And Ansible is incredibly slow compared to doing stuff via ssh/shell, even considering all the advantages. SSH connections work well. Try using something like LXD connections: it works unless your setup deviates from the mainstream configuration, in which case you will fail.
On top of that, there are mysterious hiccups. I keep seeing Ansible tasks hang forever with nothing going on at the target side. If something like this happens, will there be secrets in one of the files in the target's .ansible folder? Will they be cleaned up? I don't know. In our time of constant assault on anything facing the internet, the obscurity of a .ansible folder is probably not enough systemic protection.
The concept of a "connection" is also not the right abstraction. An SSH session is so much more than a shell; an LXD connection is just a shell with no choice of user, but you might get that shell from any node in a cluster. An API could well be a connection, just not to a shell. And there are different shells, no shells, Windows, file systems, and the list goes on. Ansible connections are the best choice if you had to choose one, but why should there only be one type of communication endpoint? The answer is of course to make it simple, and that works beautifully, until you try to work with something like an LXD connection and see that it's not working (and probably cannot really be made to work without incredible amounts of time nobody wants to invest).
Roles are a misnomer. Ansible roles are not roles, because they are not assigned and revoked, they are executed. They are also not roles, because they have parameters, so they can at best be role assignments.
To my understanding, Ansible is (unlike terraform/docker images/etc.) a tool that supports management of live machines. Ansible is not primarily (though also) a provisioning or deployment tool. If a user decides, for whatever reason, that they want to maintain a dedicated long-lived server instead of provisioning an instance of a deployment, roles will over time almost certainly have to be revoked. I tried to use Ansible roles for that purpose and it is just not working without a whole stack of support software/effort. I thought about creating meta-deb packages to record role assignments, until I realized that if I go that far, I don't need the roles and can just implement these packages. OS packages have a bad reputation because it's so complex to maintain them, but somebody has to face these complexities or somebody suffers the consequences.
In practice, roles are most often used as modules, for reuse. But they are not modules either. They lack any concept of information hiding. You can compensate for that by adhering to conventions, but you can also do that with 6502 assembler and I actually did that, but it was not a great experience, neither 6502 nor Ansible roles.
With all the code out there using roles, you can of course not change the semantics of roles. It also didn't work to say that logic should not be implemented in tasks or roles, because the paywall for writing modules is just too high for all those users who chose Ansible because they appreciate the simplicity and (development) efficiency of composing tasks in yaml.
Actions should be as easy to write as roles, just in python and not yaml. I should have said "are as easy to write", but that would be a lie. That can be changed by providing a backward compatible abstraction, but there must be a reason why this is not part of Ansible 2.15.4 (my current version). I know, your time is limited. But isn't that the problem? If there are more important things to be done than to implement one of the core principles of Ansible (don't implement logic with tasks, use modules), that hints at a serious long-standing issue.
I grew up with Eiffel, and "design by contract" is one of my personal principles when developing software. I really suffer from the chaos that is interface specification in Ansible. To create a clean plugin, I have to document the specification in READMEs, in code comments, sometimes in YAML files; then I have to maintain Python type hints (or watered-down versions of them), and I still have to implement argument validation manually. I tried to use ansible-doc to consume this information, but that only works sometimes, maybe often, but not always. Very often I have to read the source code. If I were a core team member I would know the details of why, how, and where, because it would be part of my daily routine. I am not (though I would seriously consider an offer; just kidding).
Ansible has a wonderful ecosystem with loads of very competent module and collection developers, but like you, they only have so much time. I'm an Ansible rookie, but I spent decades developing software, so I'm really not afraid of exposing myself to complexity. Still, the chaos and complexity of Ansible module development is quite a barrier to overcome (part of that is that I spent decades evading Python; you got me there, though: Ansible motivated me to finally look at it, thanks for that).
Ansible has some support for handling secrets, but this is not a first-class feature. In the end, secrets are just publicly visible variables that can be hidden behind fig leaves. There are plugins available that can fix many of the problems, but I guess we all know that this is insufficient. I'm not aware of Ansible having been the source of a major leak or security catastrophe, but I honestly find that surprising.
What I would expect are plugin types for obtaining and deploying secrets that prevent clear-text secrets from ever appearing anywhere except at the deployment interface (e.g. when they are passed to a process or written to a final TPM/disk/... location). I don't know whether secrets are actually put into AnsiballZ payloads today, but I suspect I really don't want to know.
The question I ask myself is whether such secret plugins would be at all possible to implement in finite time as long as there is no direct, secure communication channel between the source (a vault) and the destination (process/file/...) of a secret. To make such a setup auditable, it would have to be a first-class feature.
I'm going to stop ranting here, but I could continue pointing out flaws you probably know much better than I do.
Despite all of this, Ansible does many things so well that people are still more than happy with it. But those strengths are well known, just as Ansible's flaws are obvious, and for many of these flaws there are established solutions.
I think the Ansible team is best qualified to create a reincarnation of Ansible that preserves its awesomeness and removes the worst flaws, so that you core team folks can spend your time on awesome instead of having to stick fig leaves all over the place.
But that's of course just my 2 cents. I hope I'm not coming across as too patronizing. I just really love Ansible and at the same time I keep looking for alternatives because it's really torturing me.
On the flip side: if I were the product owner of Ansible and had a huge budget, this is what I would want to have:
A foundational abstraction of commands and queries, not unlike Ansible's lookup/query on one side and module/action on the other, but with a single, more powerful specification mechanism for contracts (that's a lot of Eiffelisms, but Meyer really did a great job there).
JSON Schema would be a good starting point for describing parameters, results, and states; it could be extended with the most commonly seen data types and perhaps some Ansible-specific extensions. That would create an overlap with things like OpenAPI, which in turn would make it possible to treat APIs as one kind of "connection". This could make a whole lot of custom modules in use today obsolete while adding new ones. It could also replace AnsiballZ payloads by using temporary service APIs tunneled over SSH (and other capable connections). All tunneled, jump-hosted, and otherwise obscure channels would remain available to reach obscure devices, and all existing (Open)APIs not requiring weird authentication would become valid targets.
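To make the schema-as-contract idea concrete: a minimal sketch of validating command parameters against a JSON-Schema-like contract. Everything here is hypothetical (the function, the schema field names, and the tiny keyword subset); it is not an existing Ansible or JSON Schema library API, just an illustration of schema-driven argument validation.

```python
# Hypothetical sketch: validate parameters against a tiny subset of
# JSON-Schema-style keywords ("required", "properties", "type").
def validate(params, schema):
    """Return a list of error strings for params checked against schema."""
    errors = []
    type_map = {"string": str, "integer": int, "boolean": bool, "object": dict}
    # Every required parameter must be present.
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter: {name}")
    # Every supplied parameter must match its declared type.
    for name, spec in schema.get("properties", {}).items():
        if name in params and not isinstance(params[name], type_map[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
    return errors

# Contract of a hypothetical 'deploy' command:
schema = {
    "required": ["target"],
    "properties": {"target": {"type": "string"}, "retries": {"type": "integer"}},
}

print(validate({"retries": "three"}, schema))
# → ['missing required parameter: target', 'retries: expected integer']
```

With a contract like this published alongside the command, documentation, validation, and tooling could all consume the same single source of truth.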
Commands and queries would be applied to a "context" or "system". By default the context is the controller (actions), and while executing a playbook there is a "current target" (modules). All contexts/systems are represented as nodes in the inventory, but any such node can have multiple connections of different types. A system has an associated set of configurations, or more generally information, that is strictly associated with that system and not with how or from where it is accessed (that additional information lives in a "connection context").
The contract of a command declares the state that is or might be affected by the command. That allows diff functionality to be automated: diff support could be implemented in core against a defined data structure, and the module/action would only have to provide the adapter. That adapter would of course be a collection of queries which, in the case of actions, might be lazily evaluated. Preconditions can be failure conditions or wait conditions. There is no need for handlers if any task can handle events (roughly, wait conditions). And if you know your contracts are complete, you know what can run in parallel.
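A minimal sketch of what "core computes the diff from the contract" could mean. All names here are hypothetical; the point is only that once a command declares which state keys it may change, a generic engine can produce the diff and support check mode without the command implementing either:

```python
# Hypothetical sketch: a command contract declares the state keys it may
# change; a generic runner derives diff and check-mode behavior from that.
from dataclasses import dataclass

@dataclass
class Command:
    name: str
    affects: set   # state keys this command's contract says it may change
    apply: object  # callable: current state -> new values for affected keys

def run(state, command, check_mode=False):
    """Return (new_state, diff); in check mode the state is not mutated."""
    new_values = command.apply(state)
    # Enforce the contract: the command may only touch declared keys.
    assert set(new_values) <= command.affects, "contract violation"
    diff = {k: (state.get(k), v) for k, v in new_values.items()
            if state.get(k) != v}
    if check_mode:
        return state, diff
    return {**state, **new_values}, diff

enable = Command("enable_service", {"service.state"},
                 lambda s: {"service.state": "running"})
_, diff = run({"service.state": "stopped"}, enable, check_mode=True)
print(diff)  # → {'service.state': ('stopped', 'running')}
```

Because `affects` is machine-readable, a scheduler could also run two commands in parallel whenever their `affects` sets are disjoint, which is the parallelism point above.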
In a check scenario, queries operate not (necessarily) on the real state of the affected systems, but on contracts of commands (looking at the state a command will produce if successful). This can be extended to implement flow analysis.
Ansible facts become well-known entities, so that commands can declare as part of their contract which facts are or might be affected by them, and which facts affect commands (pre/postconditions). Facts are just cached query results and as such are always associated with a system, not with an execution context (like the running playbook). You don't need to declare which facts you need: by using a fact, a task depends on it, and gathering is automatic, as is cache invalidation, at least as long as commands properly declare the facts they might influence.
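The "facts are cached query results with contract-driven invalidation" idea can be sketched in a few lines. The class and method names are hypothetical, not Ansible's fact cache API:

```python
# Hypothetical sketch: facts as lazily gathered, cached query results,
# invalidated when a command's contract says it may have changed them.
class FactCache:
    def __init__(self, gatherers):
        self._gatherers = gatherers   # fact name -> zero-arg query function
        self._cache = {}
        self.gather_count = 0         # for illustration only

    def get(self, name):
        # Using a fact gathers it lazily; no explicit gather step needed.
        if name not in self._cache:
            self._cache[name] = self._gatherers[name]()
            self.gather_count += 1
        return self._cache[name]

    def ran_command(self, affected_facts):
        # A command's contract names the facts it may have changed;
        # drop only those from the cache.
        for name in affected_facts:
            self._cache.pop(name, None)

cache = FactCache({"pkg_list": lambda: ["nginx"]})
cache.get("pkg_list")            # gathered once
cache.get("pkg_list")            # served from cache
cache.ran_command({"pkg_list"})  # e.g. after a package-install command
cache.get("pkg_list")            # re-gathered
print(cache.gather_count)        # → 2
```

The cache is keyed per system, so in a fuller version each inventory node would own one `FactCache` instance.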
You create new commands and queries by specifying their contract in YAML, and then you either implement them using tasks or generate boilerplate Python plugin code (or just put code in YAML text blocks, like you would for shell commands).
If "systems" were not primarily hosts but could also be things like, e.g., an internet service or a database instance, they could have their own connection types and be the target of commands and queries specific to their type. They would benefit from all the awesomeness that is the Ansible inventory.
Commands and queries could be published with very small granularity; there would be no need to assemble them into collections. The argument for that is the same as for microservices vs. fat APIs. It requires high-quality interface specifications, though; otherwise you just get a mess and are better served by collections.
Roles, on the other hand, could become inventory objects (if they are needed at all; groups basically do much of the same thing).
I think most of what I suggest here amounts to minor conceptual shifts, but they would have a huge (and, I think, positive) impact. A lot of existing code (collections) could probably be reused with little to no change; it just would not benefit much, since it does not provide the metadata of full-blown contracts.
This is not a thought-out concept, just what I would do if I had the money and time to work on it. Well, if I had that money, I would not work on it; I would go sip margaritas or something. But that's another story :-)
Something like this would be a major overhaul, but do you think it would be too risky, considering the benefits and development perspectives it would provide? If I were Red Hat, this is what I would do immediately, rather than trying to quick-fix issues by encapsulating Ansible in a tower.
I'm flagging these last comments 'off topic' as they are unrelated to the issue (not because I don't think you make interesting points; some of them we have been debating in core for years now). They would be better hosted at:
https://github.com/ansible/proposals and/or https://forum.ansible.com/c/project/7
if you want to continue this conversation and reach a wider audience.
SUMMARY
Beginning in Ansible 2.9.10, tasks whose ansible_connection is dynamically evaluated (i.e., comes from a Jinja2 expression) fail.
ISSUE TYPE
COMPONENT NAME
ansible_connection
ANSIBLE VERSION
CONFIGURATION
OS / ENVIRONMENT
Ubuntu 16.04
uname -a: Linux localhost.localdomain.na.sas.com 4.4.0-193-generic #224-Ubuntu SMP Tue Oct 6 17:15:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
STEPS TO REPRODUCE
Create a hosts file containing a host that is NOT localhost
Then run the following playbook:
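The playbook referenced by the reporter is not reproduced in this excerpt. Based on the summary, a minimal playbook of the affected class might look like the following; the variable name and the conditional are illustrative, not taken from the original report:

```yaml
# Hypothetical minimal reproduction: ansible_connection comes from a
# Jinja2 expression instead of a literal value.
- hosts: all
  gather_facts: false
  vars:
    my_connection: "{{ 'local' if inventory_hostname == 'localhost' else 'ssh' }}"
  tasks:
    - name: ping using a dynamically evaluated connection
      ping:
      vars:
        ansible_connection: "{{ my_connection }}"
```

On Ansible 2.9.9 and earlier, a pattern like this resolves the expression before the connection plugin is chosen; on the affected versions it does not.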
EXPECTED RESULTS
On ansible 2.9.9, the playbook runs without errors:
ACTUAL RESULTS
On Ansible > 2.9.9, it fails. Here is the output from 2.10.3:
Note that 2.9.10 and 2.9.11 fail with a different error: