Closed · t-woerner closed this issue 3 years ago
Wouldn't it make sense to extend the existing play-level `serial` keyword instead of creating a new one that operates just at the task level?
Yes, using the `serial` keyword would be good, but adding this functionality to `PlaybookExecutor` would require a far more invasive change. The current implementation is very simple and not invasive.
Cannot the same result be had with an intermediate play with `serial: 1`? i.e.:
```yaml
- hosts: all
  tasks:
    # ...

- hosts: all
  serial: 1
  tasks:
    # ...

- hosts: all
  tasks:
    # ...
```
@bcoca You cannot do this in a role, so it is pretty limited. Besides, `serial` has the downside that it blocks an entire batch if a single host is very slow. Instead, being able to limit forks on a single task, block, role or play would be very useful. Rationale: https://github.com/ansible/ansible/issues/24037
@bcoca We have playbooks that use several roles. The task that requires the special treatment is within `tasks/main.yml` of one of the roles. The role is already used in two playbooks and will later also be used in another role via `include_role`.
With `strategy: free` the expected behaviour should be very similar to `linear`: as long as there is at least one task with this setting in the list of tasks that are currently handled or queued, the number of used workers is limited to `max_concurrent`. With `free`, other tasks that are executed at the same time as a task using `max_concurrent` are affected as well.
If you are using `serial`, then you are already limiting the number of parallel executions. If `max_concurrent` is additionally set, then the number of concurrently executed tasks is further limited if and only if `max_concurrent < serial` (`serial` is treated as a plain number here). If `max_concurrent` is bigger than `serial`, then there should be no effective change.

I do not see how `max_concurrent` could affect `maximum_failure_percentage` more than `forks` is already able to do right now.

`max_concurrent` is not able to increase the number of workers that are used to process the tasks in the playbook. It is only able to limit the already defined number of workers.

`max_concurrent` will affect per-loop `forks` if both are specified at the same time and `max_concurrent < per-loop-forks`.
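The interaction described above amounts to taking the minimum of the active limits. A small sketch of that reasoning (function and parameter names are illustrative, not Ansible internals; `serial` is treated as a plain number, as above):

```python
def effective_limit(forks, serial=None, max_concurrent=0):
    """Effective number of hosts a task may run on at once, per the
    interaction described above: each active limit can only lower the
    result, never raise it."""
    limit = forks
    if serial:
        limit = min(limit, serial)
    if max_concurrent > 0:
        limit = min(limit, max_concurrent)
    return limit

# max_concurrent < serial: it further limits execution
print(effective_limit(forks=5, serial=3, max_concurrent=2))  # → 2
# max_concurrent > serial: no effective change
print(effective_limit(forks=5, serial=2, max_concurrent=4))  # → 2
# max_concurrent alone can never exceed the global forks limit
print(effective_limit(forks=5, max_concurrent=10))           # → 5
```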
@dagwieers alternatively:

```yaml
- name: run the task once per host, serially
  command: /usr/bin/some-step   # any module here, command is illustrative
  delegate_to: '{{ item }}'
  with_items: '{{ ansible_play_hosts }}'
  run_once: true
```
@bcoca In our use case we have a terminal server that can only handle 4 concurrent connections reliably, so we need `forks: 4` on a per-task basis.
@mpdehaan actually hinted at this possibility here: https://groups.google.com/d/msg/ansible-project/rBcWzXjt-Xc/_QCTljBcCG0J
The problem is with non-linear strategies: since `forks: 1` on task X seems to force all other tasks to wait, that does not seem right to me.
@bcoca Using the workaround with `delegate_to`, `with_items` and `run_once` correctly is not that simple as long as you are using registered results from previous tasks, as we do. Also, we might need to register the results of one of the affected tasks for later use. Only the execution of the task on the first host succeeded for me, as the tasks on all hosts got the settings of the first host. This succeeded task was marked as failed and the playbook processing stopped completely, even for the succeeded one. Therefore this is not a possible solution for us.
@t-woerner to use previous results, you can use `hostvars[item]['resultvar']`, and to consume results from that task you can do `registeredvar['results'][ansible_play_hosts.index(inventory_hostname)]`.

As for the 'failed' status, you can use the registered var to mark the task failed only if all results failed, i.e.:

`failed_when: regvar is failed and regvar.results | select('failed') | list | length == regvar.results | length`
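Combining these hints, the workaround might be sketched roughly as follows (the module, command, and variable names are illustrative assumptions, not taken from the thread):

```yaml
# Hedged sketch of the run_once + delegate_to workaround.
- name: run the sensitive step once per host, serially
  command: /usr/bin/sensitive-step   # illustrative command
  delegate_to: '{{ item }}'
  with_items: '{{ ansible_play_hosts }}'
  run_once: true
  register: regvar
  ignore_errors: true

- name: fail only if the step failed on every host
  fail:
    msg: the sensitive step failed on all hosts
  run_once: true
  when: regvar.results | select('failed') | list | length == regvar.results | length

- name: consume this host's own result later
  debug:
    msg: '{{ regvar.results[ansible_play_hosts.index(inventory_hostname)] }}'
```

The `ignore_errors` plus separate `fail` task avoids evaluating the aggregate failure condition per loop item, which would otherwise run against incomplete results.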
@bcoca Yes, with a non-linear strategy all tasks are affected while a task with `max_concurrent` is processed. This is something that I expected. If you want, we might simply limit the use of `max_concurrent` to the linear strategy.

Even if this workaround will at some point work for me, showing failed and changed correctly, it will only be able to handle the `max_concurrent: 1` case. There is work under way in FreeIPA to increase the number of reliably deployable parallel replicas to more than one. Therefore we also need a solution for the `max_concurrent: 2` or `max_concurrent: 3` cases.
Well, those can be handled with a serial play, but not 'mid role'. If nothing else, we have narrowed down the use case that cannot be covered by existing methods.

The limitation to 'linear' does not seem to be required by design; could that just be a limitation of how you want to implement it?
No, the limitation is not needed by design. The question now is whether an entry in the documentation is sufficient, stating that using `max_concurrent` (or however we name it in the end) will have an effect on other tasks for non-linear strategies.
The solution with `delegate_to` doesn't work for my problem. I need a variable loaded from a non-loop module (`vmware_guest`); `max_concurrent` solves my problem. With `delegate_to`, the `nios_host_record` task fails:
```text
TASK [configure host entry] **
failed: [testevmbrim002] (item=testevmbrim002) => {"item": "testevmbrim002", "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}
failed: [testevmbrim002] (item=testevmbrim003) => {"item": "testevmbrim003", "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}
failed: [testevmbrim002] (item=testevmbrim001) => {"item": "testevmbrim001", "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}
fatal: [testevmbrim002]: UNREACHABLE! => {"changed": false, "msg": "All items completed", "results": [{"_ansible_ignore_errors": null, "_ansible_item_label": "testevmbrim002", "_ansible_item_result": true, "item": "testevmbrim002", "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}, {"_ansible_ignore_errors": null, "_ansible_item_label": "testevmbrim003", "_ansible_item_result": true, "item": "testevmbrim003", "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}, {"_ansible_ignore_errors": null, "_ansible_item_label": "testevmbrim001", "_ansible_item_result": true, "item": "testevmbrim001", "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}]}
```
@rodrigobrim your failures seem unrelated to this discussion; that is a connection issue.

Also, there are no 'loop modules': loops are produced by Ansible around any module/action.
Implemented as the `throttle` keyword.
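For reference, a minimal sketch of using the keyword as it eventually shipped (`throttle` is a task-level keyword; the command shown is illustrative):

```yaml
- hosts: all
  tasks:
    - name: load-sensitive step, at most 4 hosts at a time
      command: /usr/bin/heavy-step   # illustrative command
      throttle: 4
```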
# Proposal: Limit number of concurrent executions for a single task

**Author:** Thomas Woerner (IRC: twoerner)
**Date:** 2018-07-16
## Motivation

In a playbook with lots of tasks there might be one or more tasks that have issues with parallel execution, because of access limitations or conflicts while executing the task.

For FreeIPA replica deployments we have exactly this issue. There is one task out of more than thirty that cannot be executed more than once or twice in parallel right now. This is due to an access limitation on the server side and also a possible conflict while being executed in parallel.

Limiting the whole playbook execution to one worker with `forks: 1` would result in a very long execution time. The remaining tasks can be executed in parallel.

## Problems

- `serial` operates only at the play level, creating batches over `play_hosts`; it cannot limit a single task inside a role.
- `forks: 1` limits the whole playbook execution, not just the affected task.
## Solution proposal
Add a new attribute like `max_concurrent` to tasks that will limit the number of concurrent executions of the current task. Add an additional check to `StrategyBase._queue_task` to reset the current worker id to `0` if `task.max_concurrent` is greater than `0` and if the current worker id is greater than `task.max_concurrent`. This is the same as what is done for the global `forks` setting (`StrategyBase._workers`). `StrategyBase._workers` contains `forks` workers.

`max_concurrent` is a task-specific version of `forks`. `max_concurrent` is not able to increase the number of workers that are used to process the tasks in the playbook. It is only able to limit the already defined number of workers.

The limitation of concurrent task executions in the pull request is done in the same way as single tasks are attached to the available workers. There should not be a behaviour change.
## What is the expected behavior for non-linear strategies?

With `strategy: free` the expected behaviour should be very similar to `linear`: as long as there is at least one task with `max_concurrent` in the list of tasks that are currently handled or queued, the number of used workers is limited to `max_concurrent`. With `free`, other tasks that are executed at the same time as a task using `max_concurrent` are affected as well.

## What is the relationship to `any_errors_fatal` / play `serial` / `run_once` / etc.?
The only relationship I see is that `any_errors_fatal` and `max_concurrent` are both task attributes. There should not be a change with `max_concurrent`, as it should not alter any error handling or behavior.

The relationship to `serial` is that `serial` does something similar, but at the playbook level and in an invasive way: it creates serialized batches and then uses `TaskQueueManager` to run the playbook for each batch. There is no more relationship than doing something similar on a different level.

`run_once` runs a task only on one host out of `play_hosts`. This is useful if you can expect exactly the same results from all hosts in `play_hosts`. `max_concurrent`, on the other hand, does not limit the execution to `max_concurrent` hosts: the task is eventually executed on all hosts, but only on `max_concurrent` of them at a time.

`forks` serializes the playbook execution to `forks` hosts at a time. It is similar to `max_concurrent`, but only on a playbook basis.

## What is the appropriate name+keyword of the feature?
I used `max_concurrent` because it describes best what it does. But as there is `serial` for playbooks, we might also use `serial` as the final name for `max_concurrent`. Then we would need to also support the use of a percentage to keep it consistent.

Example:
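A minimal illustration of the proposed keyword (hypothetical usage following this proposal's `max_concurrent` name; the module and commands are illustrative):

```yaml
- hosts: ipaservers
  tasks:
    - name: normal task, runs on up to forks hosts at once
      command: /usr/bin/prepare-step   # illustrative command

    - name: sensitive task, at most 2 hosts at a time
      command: /usr/bin/deploy-step    # illustrative command
      max_concurrent: 2
```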
## Testing

Additional tests might be needed to make sure that the serialization of tasks also works with `strategy: free`.

## Documentation
Documentation is needed for the limitation of concurrent task executions for single tasks. It should be very similar to the documentation of `forks`, but limited to tasks only.