gardener / kupid

Inject scheduling criteria into target pods orthogonally by policy definition.
Apache License 2.0
11 stars 19 forks

Handling failures in applying the Scheduling policies. #53

Closed ashwani2k closed 1 year ago

ashwani2k commented 1 year ago

What would you like to be added: Currently, Kupid ignores failures when applying scheduling policies. We should do one of the following: either fail when a policy is not applied, so that a pod is never scheduled onto workers that its scheduling policies did not intend; or log such errors and raise alerts, so that the operator is made aware of the affected pods and can react to the anomaly.
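To make the two options concrete, here is a minimal sketch of the decision, not Kupid's actual code. Every name in it (`apply_policies`, `PolicyApplicationError`, `Result`, the example injectors and the `pool` label) is hypothetical, and a policy is modelled as a plain function that returns a modified pod dict or raises.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("kupid-sketch")  # hypothetical logger name


class PolicyApplicationError(Exception):
    """Raised in strict mode when a scheduling policy cannot be applied."""


@dataclass
class Result:
    pod: dict
    failed_policies: list = field(default_factory=list)


def apply_policies(pod, policies, strict=False):
    """Apply each (name, injector) pair to a shallow copy of the pod.

    strict=True  -> option 1: reject the pod on the first failure.
    strict=False -> option 2: log the failure, record it, and continue.
    """
    result = Result(pod=dict(pod))
    for name, inject in policies:
        try:
            result.pod = inject(result.pod)
        except Exception as exc:
            if strict:
                raise PolicyApplicationError(
                    f"policy {name!r} failed: {exc}"
                ) from exc
            logger.error(
                "policy %r not applied to pod %r: %s",
                name, pod.get("name"), exc,
            )
            result.failed_policies.append(name)
    return result


# Hypothetical injectors used only for illustration.
def add_node_selector(pod):
    return {**pod, "nodeSelector": {"pool": "etcd"}}


def broken_injector(pod):
    raise ValueError("malformed affinity")


policies = [("etcd-pool", add_node_selector), ("bad", broken_injector)]
lenient = apply_policies({"name": "etcd-0"}, policies)
```

In the lenient mode, `lenient.failed_policies` is the list an alerting rule could be built on; in strict mode the same input raises instead of letting the pod through partially configured.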

Why is this needed: In a real scenario, this can have pros and cons.

Pro

Cons

shreyas-s-rao commented 1 year ago

We cannot set the failure policy to Fail by default, because an unhealthy Kupid installation (for reasons including, but not limited to, scheduling issues for the Kupid pod, OOMKills, and a misconfigured controllerinstallation) can block any changes or updates to all deployments, statefulsets, daemonsets, etc. in the cluster. Instead, we should work on #55, which aims to limit Kupid's webhook to handling only relevant resources rather than every pod-group in the cluster.
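As a rough illustration of that trade-off, the sketch below models how an admission request fares when the webhook is down. It is a simplified stand-in for API server behaviour, not Kupid or Kubernetes code: the `admit` function and the `app` label are hypothetical, and using a label selector to scope the webhook is only an assumption about how #55 might be approached.

```python
def admit(obj, webhook_healthy, failure_policy, selector=None):
    """Return True if the object would be admitted.

    failure_policy: "Fail" or "Ignore", as in a webhook configuration.
    selector: optional label dict; when set, only objects carrying all
    of these labels are sent to the webhook at all.
    """
    labels = obj.get("labels", {})
    if selector is not None and not all(
        labels.get(key) == value for key, value in selector.items()
    ):
        return True  # webhook is never called for this object
    if webhook_healthy:
        return True  # webhook answers; mutation assumed to succeed
    # Webhook unreachable: only "Ignore" lets the request through.
    return failure_policy == "Ignore"


unrelated = {"kind": "Deployment", "labels": {"app": "web"}}
targeted = {"kind": "StatefulSet", "labels": {"app": "etcd"}}

# Unscoped webhook, Kupid down: "Fail" blocks even unrelated workloads.
blocked_unscoped = admit(unrelated, False, "Fail")
# Scoped webhook, Kupid down: only the targeted workload is affected.
admitted_scoped = admit(unrelated, False, "Fail", {"app": "etcd"})
targeted_scoped = admit(targeted, False, "Fail", {"app": "etcd"})
```

With the webhook scoped to relevant resources, a stricter failure policy would affect only those resources instead of every pod-group in the cluster.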

I would like to reduce the scope of this issue to simply improving logging in Kupid for better debugging. We will track the task of adding support for filtering resources in Kupid in #55.