aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Drain nodes gracefully during autoscaling terminations #430

Closed: aaronmell closed this issue 4 years ago

aaronmell commented 5 years ago

Tell us about your request When an instance is terminated during an autoscaling event, I would like the aws-node daemon running on that instance to detect the event and gracefully drain the node, without having to rely on an external component such as the Lambda-based drainer described at https://github.com/aws-samples/amazon-k8s-node-drainer

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

I'd like to run EKS without external dependencies, especially when those dependencies failing could cause a cluster to go down.

Are you currently working around this issue? Planning to implement the solution as described in https://github.com/aws-samples/amazon-k8s-node-drainer

Attachments If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)
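To make the request concrete, here is a rough sketch of what on-node draining could look like without an external Lambda: poll instance metadata for the Auto Scaling target lifecycle state and drain when termination is signaled. This is a hypothetical illustration, not an AWS-provided tool; it assumes a lifecycle hook is configured on the ASG (so the instance waits in `Terminating:Wait`), that the `autoscaling/target-lifecycle-state` metadata endpoint is available, and that `kubectl` with credentials allowed to drain is installed on the host.

```shell
#!/usr/bin/env bash
# Sketch: drain this node when its Auto Scaling group marks it for termination.
set -euo pipefail

IMDS="http://169.254.169.254/latest"
NODE_NAME="$(hostname -f)"

while true; do
  # IMDSv2: fetch a short-lived session token for metadata requests.
  TOKEN="$(curl -sf -X PUT "$IMDS/api/token" \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60')"
  # Target lifecycle state becomes "Terminated" while the hook holds the
  # instance in Terminating:Wait; default to InService if the path is absent.
  STATE="$(curl -sf -H "X-aws-ec2-metadata-token: $TOKEN" \
    "$IMDS/meta-data/autoscaling/target-lifecycle-state" || echo InService)"

  if [ "$STATE" = "Terminated" ]; then
    # Cordon the node and evict its pods before the instance shuts down.
    kubectl drain "$NODE_NAME" --ignore-daemonsets --delete-emptydir-data
    break
  fi
  sleep 5
done
```

After the drain completes, the script would still need to complete the lifecycle action (e.g. with `aws autoscaling complete-lifecycle-action`) so the ASG can proceed with the actual termination.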

pawelprazak commented 4 years ago

As a workaround you can have a look at:

Disclaimer: I'm the author of the second one.

max-rocket-internet commented 4 years ago

the aws-node daemon that is running should be able to detect the event

aws-node is the CNI, so it wouldn't be the right place for node draining. But luckily AWS already has a solution that may soon support ASG terminations: https://github.com/aws/aws-node-termination-handler/issues/14

rtripat commented 4 years ago

@max-rocket-internet @aaronmell EKS managed node groups perform a drain when the Auto Scaling group terminates an instance on scale-in or rebalancing. Does that satisfy the requirements? Please re-open if it doesn't.

aaronmell commented 4 years ago

We use a custom AMI based on the AMI AWS provides, because we need to install additional monitoring tools for compliance reasons. That makes managed node groups a non-starter, since they only support the official AMI. If managed node groups allowed a custom AMI instead of the standard one, they would be exactly what we are looking for, and we would probably migrate off our existing solution.

grrywlsn commented 4 years ago

Agreed with @aaronmell - this request was about being able to do this before managed worker nodes were introduced. I'm also not planning to move to managed nodes, so I would still like this feature to be added.

MarcusNoble commented 4 years ago

I also don't think this issue should be closed, as managed nodes aren't an option for many people because they don't support custom AMIs or user data. I did come across this way of handling node draining on termination in Zalando's kubernetes-on-aws: https://github.com/zalando-incubator/kubernetes-on-aws/blob/449f8f3bf5c60e0d319be538460ff91266337abc/cluster/userdata-worker.yaml#L92-L120 It does, however, require a kubeconfig on the worker nodes so they can ask the API server to drain them.
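The Zalando userdata linked above essentially boils down to a unit that runs `kubectl drain` before the instance shuts down. A minimal systemd sketch of that idea follows; the unit name, kubeconfig path, and timeouts are illustrative, and it assumes `kubectl` plus a kubeconfig with permission to drain are present on the node (`%H` is systemd's specifier for the host name):

```ini
# /etc/systemd/system/node-drain.service (illustrative name)
[Unit]
Description=Drain this Kubernetes node before shutdown
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target
# Ordering after network-online means ExecStop runs while the network is still up,
# since units stop in the reverse of their start order.
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/bin/kubectl --kubeconfig /etc/kubernetes/drain-kubeconfig \
  drain %H --ignore-daemonsets --delete-emptydir-data --timeout=120s
TimeoutStopSec=180

[Install]
WantedBy=multi-user.target
```

Because `ExecStop` runs for every shutdown, this also drains on ordinary reboots, not just ASG terminations; that may or may not be desirable depending on the cluster.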

mikestef9 commented 4 years ago

@MarcusNoble I created a separate issue to track this feature request for self managed nodes #783