gdubicki commented 6 years ago

Proposal: New `max_fail` parameter for easier and more powerful allowing some hosts to fail

Author: Greg Dubicki <@gdubicki>

Date: 2018-07-23

Status: New
Proposal type: core design
Targeted release: 2.7
Associated PR: none (yet)
Estimated time to implement: a few days?

Motivation

Make Ansible rolling updates easier and more powerful by replacing max_fail_percentage with a new max_fail parameter that gives finer control over how many hosts are allowed to fail.

Problems

The current max_fail_percentage parameter is hard to use and too limited for environments where the number of hosts in a host group varies.

Let's say you have this configuration for a rolling update playbook:

serial:
- 1 # test batch
- 30%
- 30%
- 100% # to ensure all the remaining hosts are processed in 4th batch

Let's also assume that you run this playbook in 5 datacenters of different sizes, from 30 to 100 nodes, and that the current number of nodes varies: growth adds nodes on one side, while node issues take some out on the other.

Let's assume that your capacity lets you treat a rolling update as successful if up to 5 nodes fail in any one datacenter.

First of all, the current max_fail_percentage parameter does not let you set a global maximum number of failed nodes, only a per-batch one.

Secondly, you would have to work out a percentage specific to each datacenter and keep that number in sync with the current number of nodes in each one...
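To illustrate the second point, here is a small sketch of the arithmetic. The datacenter names and sizes are hypothetical, picked only to fall within the 30 to 100 node range mentioned above. Note that it computes a whole-group percentage, while max_fail_percentage is evaluated per batch, so even these numbers would not express a global limit of 5.

```python
# Hypothetical datacenter sizes, chosen only to span the 30-100 node range.
DATACENTER_SIZES = {"dc1": 100, "dc2": 80, "dc3": 60, "dc4": 45, "dc5": 30}

# The capacity assumption from the example: up to 5 failed nodes is acceptable.
ALLOWED_FAILURES = 5


def equivalent_percentage(allowed_failures, total_nodes):
    """Return the share of the whole group that `allowed_failures` represents."""
    return 100.0 * allowed_failures / total_nodes


if __name__ == "__main__":
    for name, size in DATACENTER_SIZES.items():
        pct = equivalent_percentage(ALLOWED_FAILURES, size)
        print(f"{name}: {size} nodes -> {pct:.1f}%")
```

The same 5 nodes correspond to a different percentage in every datacenter, and each value goes stale as soon as a node is added or removed.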

Solution proposal

Let's introduce a new parameter called max_fail that accepts both percentages and absolute numbers (like serial does). When set to a single value, it is the global maximum number of nodes allowed to fail in the whole group (NOT in a batch). When set to a list, the list has to be the same length as serial, and each value is the maximum number of nodes allowed to fail in the corresponding batch.

Example configuration for the problem described above:

serial:
- 1 # test batch
- 30%
- 30%
- 100% # to ensure all the remaining hosts are processed in 4th batch
max_fail: 5

...or, to ensure that the only node in the first batch is not allowed to fail while at most 2 nodes may fail in each of the other batches (note that this is NOT equivalent to the configuration above):

serial:
- 1 # test batch
- 30%
- 30%
- 100% # to ensure all the remaining hosts are processed in 4th batch
max_fail:
- 0 # test batch has to pass
- 2
- 2
- 2

Let's deprecate max_fail_percentage in favour of max_fail.
Let's not allow using max_fail_percentage and max_fail together, to prevent ambiguity.
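The rules above could be sketched roughly as follows. This is not existing Ansible code: the function names (resolve_limit, validate_max_fail, should_abort) are hypothetical, and two details the proposal leaves open are filled in as assumptions, namely that percentages round down and that a percentage in list form is relative to the batch size.

```python
def resolve_limit(value, host_count):
    """Turn an int or an 'N%' string into an absolute number of hosts.

    Assumption: percentages round down.
    """
    if isinstance(value, str) and value.endswith("%"):
        return int(host_count * float(value[:-1]) / 100)
    return int(value)


def validate_max_fail(max_fail, serial, max_fail_percentage=None):
    """Reject the combinations the proposal wants to disallow."""
    if max_fail is not None and max_fail_percentage is not None:
        raise ValueError("max_fail and max_fail_percentage are mutually exclusive")
    if isinstance(max_fail, list) and len(max_fail) != len(serial):
        raise ValueError("max_fail list must have the same length as serial")


def should_abort(max_fail, batch_index, failed_in_batch, failed_total,
                 batch_size, total_hosts):
    """Return True when more hosts have failed than max_fail allows."""
    if isinstance(max_fail, list):
        # List form: one limit per batch, checked against that batch only.
        # Assumption: a percentage here is relative to the batch size.
        limit = resolve_limit(max_fail[batch_index], batch_size)
        return failed_in_batch > limit
    # Single value: a global limit across the whole group.
    limit = resolve_limit(max_fail, total_hosts)
    return failed_total > limit
```

With max_fail: 5, a fifth failed host anywhere in the run is still tolerated and a sixth aborts it. With the list form, a single failure in the test batch aborts immediately because its limit is 0.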

Testing (optional)

Unit tests to verify expected behaviour.

Documentation (optional)

Documentation update is needed.

gdubicki commented 6 years ago

Please let me know what you think about it, because I would like to make a PR, but not if you object to the idea and it has no chance of getting merged.

justinotherguy commented 3 years ago

Any progress on this one?

Would look forward to a solution, too.

ansible / proposals

New `max_fail` parameter for easier and more powerful allowing some hosts to fail #133