Closed csuka closed 3 years ago
You can already do this with any_errors_fatal
at play level or an meta: end_play
task with a conditional.
You can already do this with
any_errors_fatal
at play level or anmeta: end_play
task with a conditional.
Unfortunately, it isn't as easy as that.
When using any_errors_fatal
the task would still execute on all hosts. This is undesired. I'd like Ansible to stop/abort further execution when a task has failed on one host.
And meta: end_play
is it's 'own' task. I don't see how that could help me out, since the failing task still would've been executed on all hosts.
You could iterate over the hosts manually in a run_once task, and delegate to the host. Then you can use the following construct (I'm using the file
module as a conditional fail):
---
- hosts: localhost
tasks:
- file:
path: /tmpASDF
# state: absent
state: "{{ 'absent' if item != 'host-2' else 'asdf' }}"
loop:
- host-1
- host-2
- host-3
- host-4
- host-5
when:
- item != 'host-4' and (result is undefined or result is not failed)
register: result
- meta: end_play
when: result is failed
- debug:
msg: Will not happen
In this case, host-4
is intentionally skipped, and host-2
always produces an error (except if the other state
is commented in, then it always succeeds).
If you have 100s of hosts this will be very inefficient, but if you only have a couple of hosts, this should do what you want.
any_errors_fatal + serial: 1 is what will behave as you expect
@bcoca please read my request carefully. I cannot use serial: 1, as I'm developing a role. Requiring serial: 1 could not be a requirement of using the role.
@bcoca that only works well if you can put that task in its own play. It's not possible to do that in a role.
@csuka I think I missed your workaround, which is pretty similar to mine :) For me, my approach also works when skipping the first item.
@felixfontein Ehm.. I don't quite get your playbook. You set hosts: localhost
, which is not functional in a role.
Also, you're statically defining hosts yourself, which should be done dynamically.
And I cannot get it working properly, can you perhaps share a pastebin of the desired outcome?
Besides these points, do you agree that this function should be a parameter available in Ansible?
@csuka I've created a small playbook which can run on any system without knowing anything about the hosts you have. You of course have to loop over the hosts, and add run_once
and delegate_to
accordingly. When running this over a group of hosts which include host_which_fails
, host_which_is_skipped
and some more, the playbook looks like:
---
- hosts: all
tasks:
- file:
path: /tmpASDF
# state: absent
state: "{{ 'absent' if item != 'host_which_fails' else 'asdf' }}"
run_once: yes
delegate_to: "{{ item }}"
loop: "{{ ansible_play_hosts }}"
when:
- item != 'host_which_is_skipped' and (result is undefined or result is not failed)
register: result
- meta: end_play
when: result is failed
- debug:
msg: Will not happen
I don't get it, it happens. I quote myself ;
"There is a workaround to solve this issue. However, this is not an elegant solution. And this workaround does not work when the first host has to skip this task, due to the definition of 'run_once'."
host-01:
host_which_fails: false
host_which_is_skipped: true
host-02:
host_which_fails: true
host_which_is_skipped: false
host-03:
host_which_fails: false
host_which_is_skipped: false
TASK [debug] *********************************************************************************
Tuesday 08 September 2020 21:23:39 +0200 (0:00:01.026) 0:00:02.365 *****
ok: [test-multi-01] => {
"msg": "Will not happen"
}
ok: [test-multi-02] => {
"msg": "Will not happen"
}
ok: [test-multi-03] => {
"msg": "Will not happen"
}
Hence my feature request....
Have you tried
---
- hosts: all
tasks:
- file:
path: /tmpASDF
# state: absent
state: "{{ 'absent' if item != 'host-03' else 'asdf' }}"
run_once: yes
delegate_to: "{{ item }}"
loop: "{{ ansible_play_hosts }}"
when:
- item != 'host-01' and (result is undefined or result is not failed)
register: result
- meta: end_play
when: result is failed
- debug:
msg: Will not happen
instead of replacing strings with variables?
I'm not sure how likely it would be to see a feature such as abort_on_fail
implemented. Under the current architecture of ansible, it would require that abort_on_fail
force the task to also use throttle: 1
, and add extra state checking specific to just that use case.
I think more realistically, this is something that could be implemented in a custom strategy plugin.
@sivel I disagree. There is no need to have the task force to use throttle: 1
. One could also decide to use throttle: 2
for example. So simply checking for throttle
is sufficient. It shouldn't be that hard, because Ansible already has this function for other parameters as well.
Again, this feature is desired to be used in a role. So using custom strategy plugins isn't desired. Don't we think this function has to be in Ansible Core?
@felixfontein besides the first host, the other hosts are running the same command, in this case /bin/false
. Meaning the configuration you provided wouldn't suffice, right?
Maybe your request isn't specific enough, but what would be the value for abort_on_fail
? Your example means that if any single host failed to execute a specific task, that the play would abort and prevent all other hosts from executing. Unfortunately, with throttle
/serial
> 1, there are already other hosts queued and executing, which means that we cannot ensure that we stop executing immediately, it would be "eventually abort if a currently executing host fails".
In which case, it sounds like setting any_errors_fatal
at the task level is what you really want.
@csuka if I replace file: ...
with shell: /bin/false
it still works fine for me, even if the first host is skipped.
@sivel that's exactly what I don't want. The idea is indeed "abort if a currently executing host fails", on task level.
However, I got what I desire indeed working, with the following configuration:
# note that test-multi-01 has host_which_is_skipped: true
---
- hosts:
- test-multi-01
- test-multi-02
- test-multi-03
tasks:
- set_fact:
host_which_is_skipped: "{{ inventory_hostname }}"
when: host_which_is_skipped
- shell: /bin/false
run_once: yes
delegate_to: "{{ item }}"
loop: "{{ ansible_play_hosts }}"
when:
- item != host_which_is_skipped
- result is undefined or result is not failed
register: result
- meta: end_play
when: result is failed
- debug:
msg: Will not happen
Thanks @felixfontein . Does this feature request still suffice? As in my opinion, this is still a feature I'd like to see in Ansible, and I do see the added value of it.
The idea is indeed "abort if a currently executing host fails", on task level.
You can use any_errors_fatal: true
on the task level.
Right, my mistake. But then, the task still fails on all hosts. And does not stop execution when the task has failed on a single host.
Ok, I have had a really difficult time understanding exactly what you want. I'm still not sure I fully understand, so let me try to explain based on what I have randomly guessed at.
You are running a task, and let us say there are 10 hosts in your inventory. If any one of those hosts fail, within the executing batch, you want to prevent any further hosts that haven't been queued to not execute, and for the play to abort immediately on any single host failing?
Assuming the above is correct, I think any_errors_fatal
could potentially be made to do that, without introducing a new keyword. Right now any_errors_fatal
is only consulted at the end of a full task execution, and not as a short circuit based on the host batch.
@sivel I apologise for not explaining it properly or clear enough. Your 'random guess' is indeed correct. And I managed to do that with the tasks I've 2 comments ago. I'll edit my initial request with your info to clear things up.
I'm going to close this for now as this seems an extremely rare corner case and more fitting to a custom strategy. If you want to revive this proposal feel free to bring it up in the core public irc meetings https://github.com/ansible/community/issues?q=is:open+label:meeting_agenda+label:core
Proposal: Abort further task execution, with multiple hosts
Author: K Csuka
Date: 2020-09-08
Motivation
Feels like this feature should be in core design
Problems
In short:
"You are running a task, and let us say there are 10 hosts in your inventory. If any one of those hosts fail, within the executing batch, you want to prevent any further hosts that haven't been queued to not execute, and for the play to abort immediately on any single host failing." This should be usable in an Ansible Role. So using
serial
is not possible.Background: I've been developing an Ansible role to configure a database. It is possible to setup a cluster with multiple hosts. The role also supports adding an arbiter.
The arbiter does not has the database application installed, but this host is required in the play.
Imagine three hosts. Host1 is the arbiter, host2 and host3 have the database application installed. Host2 and host3 are setup in a cluster. The hosts are setup by the Ansible role, and Ansible managed. Now, Ansible executes the role for a second/third/fourth/whatever time, and changes a config setting of the database. The database requires a rolling restart. Usually, one writes some thing along the lines of:
The downside of this Ansible configuration, is that if the database application fails to start on host2 due to whatever reason, Ansible will also restart the database application on host3. And that is undesired, because if the database application fails on host3 as well, then the cluster is lost. So, for this specific task I'd like Ansible to stop/abort/skip other tasks if the database application has failed to start on a single host in the play.
Solution proposal
Create an ansible task parameter, which aborts further task execution on the other hosts if the task has failed on a host. Example:
abort_on_fail
Would output, in this specific scenario:
Dependencies (optional)
Multiple hosts in play.
Testing (optional)
Skip further task execution, when a host fails a task, and the parameter
abort_on_fail
is set.Documentation (optional)
It would make sense, yes.
Anything else?
There is a workaround to solve this issue. However, this is not an elegant solution. And this workaround does not work when the first host has to skip this task, due to the definition of 'run_once'.