foriequal0 / pod-graceful-drain

You don't need `lifecycle: { preStop: { exec: { command: ["sleep", "30"] } } }`
Apache License 2.0
246 stars 16 forks source link

Respect DeleteOptions #18

Open foriequal0 opened 3 years ago

mihasya commented 3 years ago

Hello, @foriequal0 ! We have rolled out this controller in both our staging and prod clusters and, indeed, so far it's working great. I'm scanning the issues list to see what might lie ahead, and am wondering if you could flesh out what gaps are currently known.

mihasya commented 3 years ago

I should add, if the issues are well defined and we can reproduce them locally, we are likely to prioritize contributing to this controller, as it has solved a big problem for us.

foriequal0 commented 3 years ago

Hello, @mihasya ! I'm glad to hear that! I think it is a good-to-have issue, rather than an urgent one. Anyway, let me give you detailed explanations.

Pod can be deleted with these options: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#delete-delete-a-pod Currently, the controller uses Client from sigs.k8s.io/controller-runtime/pkg/client and its Delete() function. It can accept an optional client.DeleteOptions: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/options.go#L182 I think you can experiment with --grace-period flag in kubectl delete and kubectl apply. It seems that --dry-run is usually implemented in client side.

Also policyv1beta1.Eviction can be created with optional DeleteOptions in its fields: https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#eviction-v1beta1-policy

As you can see, these options are about gracePeriodSeconds, propagationPolicy, and preconditions.

Their handling is missing in 2 places I think.

  1. preconditions are not checked when the pod deletion/eviction is intercepted https://github.com/foriequal0/pod-graceful-drain/blob/main/internal/pkg/webhooks/pod_validator.go pod deletion is scheduled and the pod is successfully deleted even if it had faling preconditions.
  2. These options are not passed to the scheduled deletion: https://github.com/foriequal0/pod-graceful-drain/blob/main/internal/pkg/core/pod_mutate.go#L185

Also, it calls Delete() for both deletion/eviction. I assume that it might not respect pod disruption budget (I might be wrong).

I've stopped here since missing features are not critical in use cases that the controller is focused on. But if you think they are critical to your workflow, then feel free to implement it. I'm willing to accept it.