gardener / gardener

Homogeneous Kubernetes clusters at scale on any infrastructure using hosted control planes.
https://gardener.cloud
Apache License 2.0

Use PDBs with `unhealthyPodEvictionPolicy` #8821

Closed timuthy closed 10 months ago

timuthy commented 1 year ago

How to categorize this issue?

/area robustness /area high-availability /kind enhancement

What would you like to be added: Since Kubernetes 1.27 (where the `PDBUnhealthyPodEvictionPolicy` feature gate is enabled by default), PDBs offer an Unhealthy Pod Eviction Policy. We should consider using `unhealthyPodEvictionPolicy: AlwaysAllow` for (some) Gardener-managed components.

Why is this needed: The default behavior for PDBs (also before 1.27) treats running pods (`.status.phase="Running"`) as candidates whose eviction can violate a PDB, regardless of whether they are Ready. This becomes a problem if a pod of a single-replica deployment is Running but not Ready and has a matching PDB with `maxUnavailable: 1`, as endorsed by Gardener (ref).

For example, we saw kube-controller-manager pods that were not Ready block node roll-outs on seeds, because evicting them would have violated the defined PDB. The Shoot HA Best Practices documentation also highlights the advantages of using `AlwaysAllow` in case of a zone outage.

timuthy commented 1 year ago

cc @vlerenc

vlerenc commented 1 year ago

Nice, thanks @timuthy for bringing this up!

voelzmo commented 11 months ago

We had this frozen in our backlog for a while, waiting for our Seeds to update to 1.27 – are you going to make this change before that happens?

timuthy commented 11 months ago

Gardener still supports Kubernetes 1.24.x (ref), and it will probably take ~1 year until 1.27.x is the minimum supported version. If we want to benefit from `unhealthyPodEvictionPolicy: AlwaysAllow` today, we'll have to check the runtime version and set it if possible.

Side note: The PDB API change was introduced with 1.26.x. So even if the feature gate is disabled, we can set `unhealthyPodEvictionPolicy` unconditionally as soon as the minimum supported version is 1.26.x.
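
The version check mentioned above could look roughly like the following. This is only a minimal, stdlib-only sketch and not Gardener's actual implementation: the helper names (`parseMajorMinor`, `unhealthyPodEvictionPolicyFor`) and the constant `minMinorForPolicy` are hypothetical, and the 1.26 threshold is an assumption taken from the side note about when the field was added to the PDB API. It also assumes plain `major.minor[.patch]` version strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minMinorForPolicy is an assumption based on the side note in this thread:
// the unhealthyPodEvictionPolicy field exists in the PDB API from 1.26 onwards.
const minMinorForPolicy = 26

// parseMajorMinor extracts the major and minor version from strings such as
// "1.27.3" or "v1.26". Suffixes like "27+" are not handled in this sketch.
func parseMajorMinor(version string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version format %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major, minor, nil
}

// unhealthyPodEvictionPolicyFor returns a pointer to "AlwaysAllow" if the
// runtime cluster version is new enough to know the field, and nil otherwise
// (meaning: leave the field unset in the PDB spec).
func unhealthyPodEvictionPolicyFor(runtimeVersion string) (*string, error) {
	major, minor, err := parseMajorMinor(runtimeVersion)
	if err != nil {
		return nil, err
	}
	if major > 1 || (major == 1 && minor >= minMinorForPolicy) {
		policy := "AlwaysAllow"
		return &policy, nil
	}
	return nil, nil
}

func main() {
	for _, v := range []string{"1.25.4", "1.26.0", "v1.27.3"} {
		policy, err := unhealthyPodEvictionPolicyFor(v)
		switch {
		case err != nil:
			fmt.Printf("%s: error: %v\n", v, err)
		case policy == nil:
			fmt.Printf("%s: leave unhealthyPodEvictionPolicy unset\n", v)
		default:
			fmt.Printf("%s: set unhealthyPodEvictionPolicy: %s\n", v, *policy)
		}
	}
}
```

Returning a nil pointer for older versions mirrors how an optional field would simply be omitted from the PDB spec; once 1.26.x is the minimum supported version, the check could be dropped and the value set unconditionally.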

shafeeqes commented 11 months ago

/assign