Pod disruption budget to manage availability

Describe the feature you'd like to have.

The operator should maintain a pod disruption budget for the Gluster cluster pods so that voluntary disruptions do not hurt service availability. After a Gluster pod goes down for any reason, the data hosted on that pod will likely need to be healed before the next outage can be fully tolerated. A disruption budget prevents Kubernetes from voluntarily taking down a pod until enough pods are up and healthy.
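The arithmetic behind this can be sketched as follows. Kubernetes reports the number of evictions it will currently permit as `status.disruptionsAllowed` on the budget; the Python below only illustrates that calculation for a `minAvailable` budget, and the function name `disruptions_allowed` is made up for this example.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a minAvailable budget would currently permit."""
    # Evictions are only allowed while the healthy count stays at or above
    # min_available, so the headroom is the difference, never below zero.
    return max(0, healthy_pods - min_available)


# Three Gluster pods with min available set to 2:
print(disruptions_allowed(healthy_pods=3, min_available=2))  # prints 1 (all healthy)
print(disruptions_allowed(healthy_pods=2, min_available=2))  # prints 0 (one pod not yet healthy)
```

With one pod still unhealthy, the budget leaves no headroom, so a node drain would be held back until that pod is healthy again.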

What is the value to the end user? (why is it a priority?)

Users expect storage to be continuously available through both planned and unplanned events. Properly maintained disruption budgets prevent voluntary events (upgrades, etc.) from causing outages.

How will we know we have a good solution? (acceptance criteria)

- The operator should manage a pod disruption budget object that refers to the Gluster pods.
- The operator should update the min available number based on the size of the cluster.
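A minimal sketch of what the operator might generate to meet these criteria, written in Python for brevity rather than in the operator's own language. The manifest fields follow the `policy/v1` `PodDisruptionBudget` API (older clusters used `policy/v1beta1`); the function name `build_pdb`, the object name, the namespace, and the `app: glusterfs` label are all assumptions made for illustration, and `cluster_size - 1` is the first-cut value discussed under additional context.

```python
import json


def build_pdb(name, namespace, match_labels, cluster_size):
    """Return a PodDisruptionBudget manifest sized for the given cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # First-cut rule: tolerate one voluntary disruption at a time.
            # Note that a single-pod cluster ends up with minAvailable 0.
            "minAvailable": cluster_size - 1,
            # Hypothetical label; it must match whatever labels the
            # operator actually puts on the Gluster pods.
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


# Re-run whenever the cluster is resized so minAvailable tracks its size.
pdb = build_pdb("gluster-pdb", "gluster", {"app": "glusterfs"}, cluster_size=3)
print(json.dumps(pdb, indent=2))
```

Regenerating the manifest on every resize keeps the second criterion simple: the operator only has to compare the desired `minAvailable` with the one on the existing object and update it when they differ.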

Additional context

This item will need some investigation (and may not actually be usable):

- We would like to consider a pod "disrupted"/unhealthy if it has pending heals on any of its volumes.
  - Is having the health check reflect pending heals the correct approach?
  - Would an extended period of unhealthiness cause the pod to be killed? (We don't want that.)
- As a first cut, the operator would set min available to (nodes - 1), but this is overly conservative.
  - An alternative approach would be to have a budget per AZ, requiring (az_nodes - 1) to be available. This would permit more parallelism during upgrades.
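The two ideas above can be sketched in Python under some loud assumptions: the pending-heal counts per volume are obtained somehow (how is exactly what needs investigating), and each pod's AZ is known. The names `pod_counts_as_healthy` and `per_az_min_available`, the pod names, and the zone names are all hypothetical.

```python
from collections import Counter


def pod_counts_as_healthy(pending_heals_by_volume):
    """Treat a pod as healthy only if none of its volumes has pending heals."""
    return all(count == 0 for count in pending_heals_by_volume.values())


def per_az_min_available(pod_zones):
    """Map each AZ to the min available value for a per-AZ budget."""
    zone_sizes = Counter(pod_zones.values())
    # Each zone gets its own budget requiring (az_nodes - 1) pods available.
    return {zone: size - 1 for zone, size in sorted(zone_sizes.items())}


# Hypothetical six-pod cluster spread evenly over three zones.
pods = {
    "gluster-0": "az-a", "gluster-1": "az-a",
    "gluster-2": "az-b", "gluster-3": "az-b",
    "gluster-4": "az-c", "gluster-5": "az-c",
}
print(per_az_min_available(pods))
print(pod_counts_as_healthy({"vol1": 0, "vol2": 3}))
```

In this layout a single cluster-wide budget of (nodes - 1) = 5 permits one voluntary disruption at a time, while three per-AZ budgets of 1 each permit one per zone, so up to three pods could be upgraded in parallel. Whether that is safe depends on how replicas are placed across zones, which this sketch does not model.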
