aws / aws-node-termination-handler

Gracefully handle EC2 instance shutdown within Kubernetes

Add option to remove node after cordon/drain #719

Open — stevehipwell opened this issue 2 years ago

stevehipwell commented 2 years ago

Describe the feature
I'd like the option for NTH v2 to actually remove the node from the cluster (e.g. `kubectl delete node`) once cordon/drain has completed; the lifecycle would still terminate the instance.

Is the feature request related to a problem?
Controllers idiomatically work from caches and respond only to events, so it's important that the node removal is an actual Kubernetes event (a Node deletion) so that other controllers know it has happened.

Describe alternatives you've considered
n/a
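
For illustration, a minimal sketch of what the requested behaviour could look like with client-go; the function and node name below are assumptions for the example, not NTH's actual code:

```go
// Illustrative sketch only: after cordon/drain completes, delete the Node
// object so other controllers see an explicit Kubernetes deletion event.
// The instance itself would still be terminated by the lifecycle.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func deleteNodeAfterDrain(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	return client.CoreV1().Nodes().Delete(ctx, nodeName, metav1.DeleteOptions{})
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// In NTH this name would come from the termination event being handled;
	// the node name here is just a placeholder.
	if err := deleteNodeAfterDrain(context.Background(), client, "ip-10-0-1-23.ec2.internal"); err != nil {
		log.Fatalf("failed to delete node: %v", err)
	}
}
```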

vkruoso commented 1 year ago

That would be great. I have a k3s cluster where machines are managed via a spot fleet, and the only thing missing is the node removal. I was kind of expecting that it would automatically remove the node by default. I'm open to doing a PR if you can point me to a good approach for the implementation.

dcarrion87 commented 1 year ago

> That would be great. I have a k3s cluster where machines are managed via a spot fleet, and the only thing missing is the node removal. I was kind of expecting that it would automatically remove the node by default. I'm open to doing a PR if you can point me to a good approach for the implementation.

@vkruoso I started to set this up and ran into this too. Are you getting around this in a specific way at the moment?

dcarrion87 commented 1 year ago

@stevehipwell also curious whether you solved this via a custom reaper?

stevehipwell commented 1 year ago

@dcarrion87 this is still an outstanding request with no solution.

dcarrion87 commented 1 year ago

@stevehipwell do you manually clean up nodes every now and then? We're thinking of putting in an additional reaper.

stevehipwell commented 1 year ago

@dcarrion87 we don't. If I had the time this would be something I'd like to contribute to NTH.

AFAIK Karpenter removes nodes it manages. So if Karpenter were part of the EKS control plane, or could run on the nodes it was managing, that would be the best solution.

vkruoso commented 1 year ago

> That would be great. I have a k3s cluster where machines are managed via a spot fleet, and the only thing missing is the node removal. I was kind of expecting that it would automatically remove the node by default. I'm open to doing a PR if you can point me to a good approach for the implementation.
>
> @vkruoso I started to set this up and ran into this too. Are you getting around this in a specific way at the moment?

At the moment we remove those nodes manually once in a while.

dcarrion87 commented 1 year ago

Yeah, fair enough. Karpenter won't work for this use case. I'm going to implement a separate reaper alongside NTH using a combination of AWS and Kubernetes API calls, i.e. if a node matches the rules and its instance is terminated, delete it.
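
For reference, here is a rough sketch of one possible shape such a reaper could take (my assumption, not code from NTH): list Nodes, resolve each node's EC2 instance ID from its provider ID, and delete the Node object once the instance is terminated.

```go
// Sketch of a standalone node reaper: delete Node objects whose backing EC2
// instance has been terminated. Error handling and matching rules are
// intentionally minimal.
package main

import (
	"context"
	"log"
	"strings"
	"time"

	awsconfig "github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func sweep(ctx context.Context, k8s kubernetes.Interface, ec2Client *ec2.Client) {
	nodes, err := k8s.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Printf("list nodes: %v", err)
		return
	}
	for _, node := range nodes.Items {
		// Provider IDs look like aws:///us-east-1a/i-0123456789abcdef0.
		parts := strings.Split(node.Spec.ProviderID, "/")
		instanceID := parts[len(parts)-1]
		if !strings.HasPrefix(instanceID, "i-") {
			continue
		}
		out, err := ec2Client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{
			InstanceIds: []string{instanceID},
		})
		if err != nil || len(out.Reservations) == 0 || len(out.Reservations[0].Instances) == 0 {
			continue
		}
		inst := out.Reservations[0].Instances[0]
		if inst.State != nil && inst.State.Name == ec2types.InstanceStateNameTerminated {
			log.Printf("deleting node %s (instance %s is terminated)", node.Name, instanceID)
			if err := k8s.CoreV1().Nodes().Delete(ctx, node.Name, metav1.DeleteOptions{}); err != nil {
				log.Printf("delete node %s: %v", node.Name, err)
			}
		}
	}
}

func main() {
	ctx := context.Background()

	awsCfg, err := awsconfig.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	k8sCfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}

	for {
		sweep(ctx, kubernetes.NewForConfigOrDie(k8sCfg), ec2.NewFromConfig(awsCfg))
		time.Sleep(5 * time.Minute)
	}
}
```

Such a reaper would need IAM permission for ec2:DescribeInstances and RBAC permission to list and delete Nodes.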

vkruoso commented 1 year ago

> Yeah, fair enough. Karpenter won't work for this use case. I'm going to implement a separate reaper alongside NTH using a combination of AWS and Kubernetes API calls, i.e. if a node matches the rules and its instance is terminated, delete it.

Awesome. Please let me know if I can help in any way.

migueleliasweb commented 2 months ago

Just an idea:

Create a daemonset (or place a script on the host) that acts as the health-check target for the EC2 machine. The script checks whether the node has been cordoned; if it has, it reports unhealthy, making the EC2 health checks fail.

This would trigger NTH to drain the node, while AWS itself terminates the EC2 instance in the end.
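
A minimal sketch of that idea, assuming the pod receives its node name via the downward API as NODE_NAME and the EC2/ELB health check is pointed at this endpoint:

```go
// Tiny health-check server for a daemonset: report unhealthy once the node
// it runs on has been cordoned, so the EC2/ELB health check fails.
package main

import (
	"log"
	"net/http"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	nodeName := os.Getenv("NODE_NAME") // injected via the downward API (spec.nodeName)

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		node, err := client.CoreV1().Nodes().Get(r.Context(), nodeName, metav1.GetOptions{})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		// A cordoned node has spec.unschedulable set; report it as unhealthy.
		if node.Spec.Unschedulable {
			http.Error(w, "node is cordoned", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The pod would also need RBAC permission to get its own Node object.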