Closed by ashwani2k 1 year ago
We cannot set the failure policy to `Fail` by default, because an unhealthy Kupid installation (due to, but not limited to, reasons such as scheduling issues for the Kupid pod, OOMKills, and a misconfigured ControllerInstallation) can block any changes/updates to all Deployments, StatefulSets, DaemonSets, etc. in the cluster. Instead, we should work on #55, which aims to limit Kupid's webhook to handling only relevant resources rather than every pod group in the cluster.
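For context, the failure policy in question is the `failurePolicy` field of the webhook registration. A minimal sketch of where it lives in a Kubernetes `MutatingWebhookConfiguration` (the webhook, service, and path names below are placeholders, not Kupid's actual configuration):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: example-kupid-webhook        # placeholder name
webhooks:
  - name: example.kupid.gardener.cloud   # placeholder name
    # Ignore: webhook errors are swallowed and the object is admitted unchanged.
    # Fail: any webhook error rejects the API request, which is why an unhealthy
    # Kupid with failurePolicy: Fail would block updates to all Deployments,
    # StatefulSets, DaemonSets, etc.
    failurePolicy: Ignore
    clientConfig:
      service:
        name: kupid                  # placeholder service name
        namespace: kube-system       # placeholder namespace
        path: /webhook               # placeholder path
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments", "statefulsets", "daemonsets"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
```

With `Ignore`, a broken webhook degrades silently (the scheduling policy is simply not applied), which is exactly the behavior this issue asks to surface through better logging and alerts.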
I would like to reduce the scope of this issue to simply improving logging in Kupid for better debugging. We will track the task of adding support for filtering resources in Kupid in #55.
**What would you like to be added**:
Currently, `Kupid` ignores failures when applying the scheduling policies. Either we should fail when a policy is not applied, so that there are no side effects of a pod being scheduled on workers that the scheduling policies did not define for it, or we should log such errors and raise alerts to bring such pod scheduling to the operator's attention, so they have a possibility to react to the anomaly.

**Why is this needed**:
In a real scenario this can have pros and cons --
Pro: `etcd` pods are still scheduled, even if not on the worker pool the policy describes but on whatever the scheduler prescribes.
Con: `etcd`. This wouldn't have happened had the `etcd` pod been deployed on the intended worker as per the policy definitions.
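To illustrate the con above: a scheduling policy is typically realized as node affinity (or a node selector) that the webhook injects into the pod spec; if the injection fails silently, the block is absent and the default scheduler may place the pod on any worker pool. A minimal sketch of the injected result, assuming a hypothetical label key and pool name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-0                                   # illustrative pod name
spec:
  # Injected by the webhook when the scheduling policy applies successfully.
  # If policy application fails and is ignored, this block is missing and the
  # pod can land on an unintended worker pool.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: worker.gardener.cloud/pool  # hypothetical label key
                operator: In
                values: ["etcd-pool"]            # hypothetical pool name
  containers:
    - name: etcd
      image: registry.example/etcd:v3.5          # illustrative image
```

Failing the admission request (or at least logging and alerting) when this injection cannot be performed is what would make the anomaly visible to the operator.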