aws-samples / amazon-k8s-node-drainer

Gracefully drain Kubernetes pods from EKS worker nodes during autoscaling scale-in events.

Other

199 stars, 56 forks

Heartbeat on progress #42

Closed JonathanO closed 3 years ago

JonathanO commented 3 years ago

Send heartbeats as long as we're managing to successfully call the k8s api for eviction.

This will prevent AWS from terminating the instance as long as we're still trying to evict pods.

AWS re-invokes the lambda if it times out. I'm not sure how many times, but it's enough to give us more than 30 minutes of blocking termination.
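
For readers who have not seen the patch, the idea described above might look roughly like the sketch below. This is not the code from the PR: `evict_with_heartbeats`, `evict_pod`, and the shape of `hook` are hypothetical names chosen for illustration, and stopping at the first failed eviction call is an assumption about the intended behaviour. The only real API used is `record_lifecycle_action_heartbeat` on a boto3 Auto Scaling client, which is passed in rather than created here.

```python
from typing import Callable, Iterable


def evict_with_heartbeats(
    pods: Iterable[str],
    evict_pod: Callable[[str], bool],
    asg_client,
    hook: dict,
) -> int:
    """Evict pods one at a time, sending a lifecycle heartbeat after each
    eviction call that reaches the Kubernetes API successfully.

    `evict_pod` is a hypothetical callable that returns True when the
    eviction request succeeded. `asg_client` is assumed to behave like
    boto3.client("autoscaling"). `hook` is assumed to carry the fields of
    the lifecycle event detail. Returns the number of heartbeats sent.
    """
    heartbeats = 0
    for pod in pods:
        if not evict_pod(pod):
            # No progress: stop heartbeating so the lifecycle hook's own
            # timeout can expire and termination can proceed.
            break
        # Progress was made, so extend the lifecycle hook timeout.
        asg_client.record_lifecycle_action_heartbeat(
            LifecycleHookName=hook["LifecycleHookName"],
            AutoScalingGroupName=hook["AutoScalingGroupName"],
            LifecycleActionToken=hook["LifecycleActionToken"],
            InstanceId=hook["EC2InstanceId"],
        )
        heartbeats += 1
    return heartbeats
```

In a real Lambda, `asg_client` would be `boto3.client("autoscaling")` and `hook` would come from the `detail` field of the lifecycle event. Passing the client in keeps the function easy to exercise with a stub, which is the kind of happy-path test that would be needed before merging.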

geekifier commented 2 years ago

Why has this not been merged? Seems like a very important improvement: https://github.com/transferwise/amazon-eks-node-drainer/pull/12.

svozza commented 2 years ago

The PR was closed by the author. There are no commits, so there's nothing to merge.

geekifier commented 2 years ago

@svozza I should have clarified, my question was to the author of the PR. It seems those changes are present in their fork, so I am not sure what happened.

svozza commented 2 years ago

Ah, I see! My one issue is that the PR in the forked repo has no tests. I'd like to see at least a happy path test before I merge.

JonathanO commented 2 years ago

We stopped using the lambda (we migrated to aws-node-termination-handler) before I got around to writing any tests. I'd not intended to raise the PR against the upstream project, as it wasn't clear whether it worked as expected. I don't remember if we ever actually used that PR for real. I vaguely recall that there's a limit to how many times the lambda can be re-run, which limited how long an instance could be kept alive for even with heartbeats.

Feel free to adopt my patch if it solves a problem for you.

svozza commented 2 years ago

There's also another project that does node draining as well as other autoscaling-related tasks: https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler.

bwagner5 commented 2 years ago

Just to note, Karpenter does do node cordoning and draining, but if you're using it for spot instances and need the full 2 minutes to drain, then you'd still need something like this tool or https://github.com/aws/aws-node-termination-handler .