Closed JonathanO closed 3 years ago
Why has this not been merged? Seems like a very important improvement: https://github.com/transferwise/amazon-eks-node-drainer/pull/12.
The PR was closed by the author. There are no commits, so there's nothing to merge.
@svozza I should have clarified: my question was to the author of the PR. It seems those changes are present in their fork, so I am not sure what happened.
Ah, I see! My one issue is that the PR in the forked repo has no tests. I'd like to see at least a happy path test before I merge.
We stopped using the lambda (we migrated to aws-node-termination-handler) before I got around to writing any tests. I'd not intended to raise the PR against the upstream project, as it wasn't clear whether it worked as expected. I don't remember if we ever actually used that PR for real. I vaguely recall that there's a limit to how many times the lambda can be re-run, which limited how long an instance could be kept alive for even with heartbeats.
Feel free to adopt my patch if it solves a problem for you.
There's another project that does node draining as well as other autoscaling-related tasks: https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler.
Just to note, Karpenter does do node cordoning and draining, but if you're using it for spot instances and need the full 2 minutes to drain, then you'd still need something like this tool or https://github.com/aws/aws-node-termination-handler.
Send heartbeats as long as we're managing to successfully call the k8s API for eviction.
This will prevent AWS from going ahead with terminating the instance as long as we're still trying to evict pods.
AWS re-invokes the lambda if it times out. I'm not sure how many times, but it's enough to give us more than 30 minutes of blocking termination.
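
For anyone adopting the patch, here is a minimal sketch of the heartbeat idea described above. It is not the code from the PR. The function name `evict_with_heartbeats`, the `evict_pod` callable, and the parameter names are hypothetical and only for illustration. The one real API assumed is `record_lifecycle_action_heartbeat` on the boto3 Auto Scaling client, which is passed in as `asg_client`.

```python
from typing import Callable, Iterable, List


def evict_with_heartbeats(
    asg_client,
    pods: Iterable[str],
    evict_pod: Callable[[str], bool],
    hook_name: str,
    asg_name: str,
    instance_id: str,
) -> List[str]:
    """Try to evict each pod; send a lifecycle heartbeat after each success.

    `evict_pod` is a hypothetical callable that returns True when the
    eviction call to the k8s API succeeded. Returns the pods whose
    eviction call did not succeed, so the caller can retry them.
    """
    failed = []
    for pod in pods:
        if evict_pod(pod):
            # Progress was made, so extend the lifecycle hook timeout.
            asg_client.record_lifecycle_action_heartbeat(
                LifecycleHookName=hook_name,
                AutoScalingGroupName=asg_name,
                InstanceId=instance_id,
            )
        else:
            # No heartbeat when the API call fails, so a stuck drain
            # does not hold the instance indefinitely.
            failed.append(pod)
    return failed
```

In a real lambda, `asg_client` would presumably be `boto3.client("autoscaling")`, and the hook name, group name, and instance ID would come from the lifecycle event. Tying the heartbeat to a successful eviction call means the hook is only extended while the drain is actually progressing.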