Open · gdubicki opened this issue 6 years ago
Please let me know what you think about it. I would like to make a PR, but not if you object to the idea and it has no chance of getting merged.
Any progress on this one?
Would look forward to a solution, too.
Proposal: New max_fail parameter to make allowing some hosts to fail easier and more powerful
Author: Greg Dubicki <@gdubicki>
Date: 2018-07-23
Motivation
Make using Ansible for rolling updates easier and more powerful by improving how you configure the number of hosts allowed to fail, with a new max_fail parameter in place of max_fail_percentage.
Problems
The currently used max_fail_percentage parameter is hard to use and too limited for environments with a variable number of hosts in host groups.
Let's say you have this configuration for a rolling update playbook:
Let's also assume that you are using this code in 5 datacenters with different numbers of nodes, from 30 to 100, and that the current number keeps changing because of growth on one side and node issues on the other.
Let's assume that your capacity lets you treat a rolling update with up to 5 failed nodes in each datacenter as a successful one.
First of all, the current max_fail_percentage parameter does not let you set a global maximum number of failed nodes, only a per-batch one.
Second, you would have to compute a percentage specific to each datacenter and keep that number in sync with the current number of nodes in each one.
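To make the second point concrete, the arithmetic can be sketched in Python. This models the rule stated in the Ansible documentation (a batch aborts the play only when the share of failed hosts exceeds max_fail_percentage; equalling it is not enough), not Ansible's actual code. The function names and the datacenter sizes 100, 50 and 30 are hypothetical, and each datacenter is treated as a single batch for simplicity; with serial batches the percentage applies to each batch separately, so a global cap of 5 cannot be expressed at all.

```python
# Model of the documented rule: a batch aborts the play only when the share of
# failed hosts *exceeds* max_fail_percentage (being equal is still tolerated).


def tolerated_failures(batch_size: int, max_fail_percentage: int) -> int:
    """Largest number of failed hosts in one batch that does not abort the play."""
    # The play aborts when failed * 100 > batch_size * pct,
    # so floor(batch_size * pct / 100) failures are still tolerated.
    return batch_size * max_fail_percentage // 100


def required_percentage(allowed_failures: int, batch_size: int) -> int:
    """Smallest whole-number max_fail_percentage that tolerates allowed_failures."""
    # Ceiling of 100 * allowed / size, using integer arithmetic to avoid float rounding.
    return -(-100 * allowed_failures // batch_size)


# Hypothetical datacenter sizes from the 30-100 node range above, each run as a
# single batch, all with the same budget of 5 failed nodes.
for hosts in (100, 50, 30):
    pct = required_percentage(5, hosts)
    print(f"{hosts} hosts: max_fail_percentage={pct} tolerates "
          f"{tolerated_failures(hosts, pct)} failures")
```

This prints 5, 10 and 17 percent respectively, each tolerating exactly 5 failures. Conversely, one shared value of 5 would tolerate 5 failures in the 100-host datacenter but only 1 in the 30-host one, and every per-datacenter number has to be recomputed whenever a datacenter grows or shrinks.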
Solution proposal
A new max_fail parameter that accepts both percentage and absolute values (like serial does). When set to a single value, it is the global maximum number of nodes allowed to fail in the whole group (NOT per batch). When set to a list, it must be a list of the same length as serial, and each value is the maximum number of nodes allowed to fail in the corresponding batch.
Example code for the problem described in the section above would be:
...or, to ensure that the only node in the first batch is not allowed to fail, while at most 2 nodes may fail in each of the other batches (note that this is NOT equivalent to the configuration above):
Deprecate max_fail_percentage in favour of max_fail.
Disallow setting both max_fail_percentage and max_fail at once, to prevent ambiguity.
Testing (optional)
Unit tests to verify expected behaviour.
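As a starting point for such tests, here is a minimal Python sketch of the proposed behaviour. It is hypothetical throughout: max_fail is only a proposal, and the names resolve_limit, validate_failure_settings and should_abort, the round-down rule for percentages, and reusing the last list entry for batches past the end of the list (mirroring how a serial list repeats its last value) are all assumptions, not Ansible code. The two calls at the end show why a global max_fail of 5 is not equivalent to a per-batch list of [0, 2].

```python
import warnings
from typing import Any, Dict, Sequence, Union

# Hypothetical type for one limit: an absolute host count or a string like "10%".
FailLimit = Union[int, str]


def resolve_limit(value: FailLimit, num_hosts: int) -> int:
    """Convert an int or "N%" string into an absolute number of hosts.

    Assumption: percentages round down and a result of 0 is kept,
    so "0%" means that no failures are allowed.
    """
    if isinstance(value, str) and value.endswith("%"):
        return int(value[:-1]) * num_hosts // 100
    return int(value)


def validate_failure_settings(play: Dict[str, Any]) -> None:
    """Reject ambiguous or malformed settings and warn about the deprecated keyword."""
    if play.get("max_fail") is not None and play.get("max_fail_percentage") is not None:
        raise ValueError("max_fail and max_fail_percentage are mutually exclusive")
    if play.get("max_fail_percentage") is not None:
        warnings.warn("max_fail_percentage is deprecated; use max_fail instead",
                      DeprecationWarning, stacklevel=2)
    max_fail, serial = play.get("max_fail"), play.get("serial")
    # A per-batch list must line up one-to-one with a serial list.
    if isinstance(max_fail, list) and (
            not isinstance(serial, list) or len(serial) != len(max_fail)):
        raise ValueError("a max_fail list must have the same length as serial")


def should_abort(max_fail: Union[FailLimit, Sequence[FailLimit]], total_hosts: int,
                 batch_sizes: Sequence[int], failures_per_batch: Sequence[int]) -> bool:
    """Decide whether to stop the rolling update, given the batches run so far."""
    if isinstance(max_fail, (list, tuple)):
        # List form: one limit per batch, resolved against that batch's size;
        # past the end of the list the last entry is reused.
        for i, (size, failed) in enumerate(zip(batch_sizes, failures_per_batch)):
            if failed > resolve_limit(max_fail[min(i, len(max_fail) - 1)], size):
                return True
        return False
    # Single value: one budget for the whole group, resolved against all hosts.
    return sum(failures_per_batch) > resolve_limit(max_fail, total_hosts)


# Hypothetical 30-host group updated with serial [1, 10, 10]:
print(should_abort(5, 30, [1, 10, 10], [0, 3, 0]))       # False: 3 failures in total
print(should_abort([0, 2], 30, [1, 10, 10], [0, 3, 0]))  # True: 3 > 2 in the second batch
```

Keeping resolve_limit separate from the abort decision lets the same percentage handling serve both forms: the single value is resolved once against the whole group, the list entries against each batch.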
Documentation (optional)
Documentation update is needed.