gardener / machine-controller-manager

Declarative way of managing machines for Kubernetes cluster
Apache License 2.0

Don't consider Crashloopbackoff pods as part of PDB #684

Open himanshu-kun opened 2 years ago

himanshu-kun commented 2 years ago

How to categorize this issue?

/area performance /area robustness /area usability /kind enhancement /priority 3

What would you like to be added: Upstream currently treats CrashLoopBackOff pods as unavailable. So if a PDB is configured with maxUnavailable=1 for 2 pods, and one pod is Pending while the other is in CrashLoopBackOff, the eviction request for such a pod is denied and node draining can't proceed. There is an upstream discussion about this at https://github.com/kubernetes/kubernetes/issues/72320, and a PR to exclude CrashLoopBackOff pods from the PDB has been raised: https://github.com/kubernetes/kubernetes/pull/105296

Testing is required after the PR gets merged and MCM starts using the corresponding k8s version.

Why is this needed: So that MCM draining is not stuck until drainTimeout for CrashLoopBackOff pods covered by a PDB.
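The arithmetic behind the blocked eviction can be sketched roughly as follows. This is a simplified model of how the budget of a maxUnavailable PDB is derived, not the upstream controller code. The function name `disruptions_allowed` and its parameters are illustrative assumptions, and only Ready pods are counted as healthy.

```python
def disruptions_allowed(expected_pods: int, max_unavailable: int, ready_pods: int) -> int:
    """Simplified model of the budget of a maxUnavailable PDB.

    desired_healthy is the number of pods that must stay Ready;
    the budget is the number of Ready pods above that threshold.
    """
    desired_healthy = max(expected_pods - max_unavailable, 0)
    return max(ready_pods - desired_healthy, 0)


# Scenario from this issue: 2 pods, maxUnavailable=1, neither pod is Ready
# (one Pending, one in CrashLoopBackOff), so no disruption is allowed.
print(disruptions_allowed(expected_pods=2, max_unavailable=1, ready_pods=0))  # 0

# Same PDB with both pods Ready: one voluntary disruption is allowed.
print(disruptions_allowed(expected_pods=2, max_unavailable=1, ready_pods=2))  # 1
```

With a budget of 0, the eviction of the CrashLoopBackOff pod is rejected, which is why the drain waits until drainTimeout.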

himanshu-kun commented 1 year ago

Post grooming discussion

The onus is now on the customer to configure the PDB in a way that also allows CrashLoopBackOff pods to be drained. This PR recently introduced spec.unhealthyPodEvictionPolicy. It is currently in alpha and needs to be enabled via the feature gate PDBUnhealthyPodEvictionPolicy.
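A rough sketch of how spec.unhealthyPodEvictionPolicy changes the eviction decision for a Running pod is given below. It is modeled on the documented meaning of the two policy values, `IfHealthyBudget` (the default) and `AlwaysAllow`, and is not the upstream eviction code. The function `may_evict` and its parameters are hypothetical names used only for illustration, and edge cases such as Pending or terminating pods are left out.

```python
def may_evict(pod_ready: bool, policy: str, current_healthy: int, desired_healthy: int) -> bool:
    """Simplified model of the eviction decision for a Running pod guarded by a PDB."""
    if policy not in ("IfHealthyBudget", "AlwaysAllow"):
        raise ValueError(f"unknown unhealthyPodEvictionPolicy: {policy}")

    if not pod_ready:
        # AlwaysAllow: unhealthy pods can be evicted regardless of the budget.
        if policy == "AlwaysAllow":
            return True
        # IfHealthyBudget: unhealthy pods can be evicted only while the
        # application is not disrupted (enough pods are currently healthy).
        if current_healthy >= desired_healthy:
            return True

    # Otherwise the eviction has to consume budget, so it needs at least
    # one allowed disruption.
    return current_healthy - desired_healthy > 0


# CrashLoopBackOff pod (Running, not Ready) with 0 healthy pods and 1 desired:
print(may_evict(False, "IfHealthyBudget", current_healthy=0, desired_healthy=1))  # False
print(may_evict(False, "AlwaysAllow", current_healthy=0, desired_healthy=1))      # True
```

Under this model, setting the policy to `AlwaysAllow` on the PDB is what lets the drain evict the CrashLoopBackOff pod instead of waiting for drainTimeout.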

We need to update the gardener docs after testing this feature to tell customers how to use it. We also need to update the DOD playbook for operators.

timuthy commented 11 months ago

FYI: https://github.com/gardener/gardener/issues/8821