Closed timuthy closed 10 months ago
cc @vlerenc
Nice, thanks @timuthy for bringing this up!
We had this frozen in our backlog for a while, waiting for our Seeds to update to 1.27 – are you going to make this change before that happens?
Gardener still supports Kubernetes `1.24.x` (ref), and it will probably take ~1 year until `1.27.x` is the lowest supported version. If we want to benefit from `unhealthyPodEvictionPolicy: AlwaysAllow` today, we'll have to check the runtime version and set it where possible.
Side note: The PDB API change was introduced with `1.26.x`. So even if the feature gate is disabled, we can set `unhealthyPodEvictionPolicy` unconditionally as soon as the lowest supported version is `1.26.x`.
/assign
How to categorize this issue?
/area robustness /area high-availability /kind enhancement
What would you like to be added: Since Kubernetes 1.27 (`PDBUnhealthyPodEvictionPolicy` defaulted to `true`), PDBs offer an Unhealthy Pod Eviction Policy. We should consider using `unhealthyPodEvictionPolicy: AlwaysAllow` for (some) Gardener-managed components.
Why is this needed: The default behavior for PDBs (also before 1.27) considers running pods (`.status.phase="Running"`) as potential candidates for violating a PDB. This becomes a problem if a pod of a single-replica deployment is in `Running` but not `Ready` state and has a matching PDB with `maxUnavailable: 1` as endorsed by Gardener (ref).
For example, we saw `kube-controller-manager` pods being not `Ready` and thus blocking node roll-outs on seeds since the defined PDB was violated. The Shoot HA Best Practices documentation also advertises the advantages of using `AlwaysAllow` in case of a zone outage.
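For illustration, a PDB with the proposed setting might look like the sketch below. The name, namespace, and labels are assumptions made up for this example and are not taken from Gardener's actual manifests; only `maxUnavailable: 1` and `unhealthyPodEvictionPolicy: AlwaysAllow` come from the discussion above.

```yaml
# Hypothetical PDB for a single-replica control-plane component.
# metadata and selector labels are illustrative assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-controller-manager
  namespace: shoot--example--cluster
spec:
  maxUnavailable: 1
  # Field exists in the PDB API from 1.26; the feature gate is on by default from 1.27.
  # With AlwaysAllow, pods that are Running but not Ready can be evicted
  # even if the budget is currently not satisfied.
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: kubernetes
      role: controller-manager
```

With this setting, a `kube-controller-manager` pod stuck in `Running` but not `Ready` should no longer block a node drain, while healthy pods remain protected by the budget.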