Open stevehipwell opened 2 years ago
That would be great. I have a k3s cluster where machines are managed via spot fleet and the only thing missing is the node removal. I was kind of expecting that it would remove the node automatically by default. I'm open to doing a PR if you can point me at a good approach for the implementation.
@vkruoso I started to set this up and ran into this too. Are you getting around this in a specific way at the moment?
@stevehipwell also curious whether you solved this via a custom reaper?
@dcarrion87 this is still an outstanding request with no solution.
@stevehipwell do you manually clean up nodes every now and then? We're thinking of putting in an additional reaper.
@dcarrion87 we don't. If I had the time, this would be something I'd like to contribute to NTH.
AFAIK Karpenter removes the nodes it manages, so if Karpenter were part of the EKS control plane, or could run on the nodes it was managing, that would be the best solution.
At the moment we remove those nodes manually once in a while.
Yeh, fair enough. Karpenter won't work for this use case. I'm going to be implementing a separate reaper alongside NTH using a combination of AWS and Kubernetes API calls, i.e. if a node matches the rules and its instance is terminated, then delete the node.
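Roughly what I have in mind, as a minimal sketch only, assuming client-go and aws-sdk-go-v2; the node-matching rule, polling interval, and error handling are placeholders rather than the final implementation:

```go
// Hypothetical reaper sketch: delete Kubernetes nodes whose backing EC2
// instance has been terminated. Rules, interval, and names are illustrative.
package main

import (
	"context"
	"log"
	"strings"
	"time"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	ctx := context.Background()

	k8sCfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(k8sCfg)
	if err != nil {
		log.Fatal(err)
	}

	awsCfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	ec2Client := ec2.NewFromConfig(awsCfg)

	for {
		reapOnce(ctx, clientset, ec2Client)
		time.Sleep(time.Minute) // arbitrary polling interval
	}
}

func reapOnce(ctx context.Context, clientset kubernetes.Interface, ec2Client *ec2.Client) {
	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Printf("list nodes: %v", err)
		return
	}
	for _, node := range nodes.Items {
		// providerID looks like aws:///us-east-1a/i-0123456789abcdef0
		parts := strings.Split(node.Spec.ProviderID, "/")
		instanceID := parts[len(parts)-1]
		if !strings.HasPrefix(instanceID, "i-") {
			continue
		}
		out, err := ec2Client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{
			InstanceIds: []string{instanceID},
		})
		if err != nil {
			// Instances that have been gone for a while come back as
			// InvalidInstanceID.NotFound; a real reaper would treat that
			// as terminated too.
			log.Printf("describe %s: %v", instanceID, err)
			continue
		}
		for _, res := range out.Reservations {
			for _, inst := range res.Instances {
				if inst.State == nil || inst.State.Name != ec2types.InstanceStateNameTerminated {
					continue
				}
				log.Printf("instance %s terminated, deleting node %s", instanceID, node.Name)
				if err := clientset.CoreV1().Nodes().Delete(ctx, node.Name, metav1.DeleteOptions{}); err != nil {
					log.Printf("delete node %s: %v", node.Name, err)
				}
			}
		}
	}
}
```

It would also need IAM permissions for ec2:DescribeInstances and RBAC to list and delete nodes.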
Awesome. Please let me know if I can help in any way.
Just an idea:
Create a DaemonSet (or place a script on the host) that acts as the health check target for the EC2 instance. The script would check whether the node has been cordoned; if it has, it answers unhealthy, making the EC2 health checks fail.
This will trigger NTH to drain the instance, while AWS itself will eventually kill the EC2 instance.
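A rough sketch of the health check side of that idea, assuming a Go DaemonSet pod using client-go, with NODE_NAME injected via the downward API; the port and endpoint name are illustrative:

```go
// Hypothetical cordon-aware health check for the DaemonSet idea above.
// Assumes NODE_NAME is injected via the downward API (spec.nodeName).
package main

import (
	"log"
	"net/http"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	nodeName := os.Getenv("NODE_NAME")

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		node, err := clientset.CoreV1().Nodes().Get(r.Context(), nodeName, metav1.GetOptions{})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		// Cordoning sets spec.unschedulable; answering non-2xx makes the
		// external EC2/ELB health check fail so AWS replaces the instance.
		if node.Spec.Unschedulable {
			http.Error(w, "node is cordoned", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8080", nil)) // port is illustrative
}
```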
Describe the feature
I'd like the option for NTH v2 to actually remove the node from the cluster (e.g. kubectl delete node) when cordon/drain has completed; the lifecycle would still terminate the instance.

Is the feature request related to a problem?
The idiomatic way that controllers work is with caches and only responding to events, so it's important that the node removal be an actual Kubernetes event so that other controllers know that it has happened (see the sketch below).

Describe alternatives you've considered
n/a
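To make the events point concrete, here's a minimal sketch, assuming client-go shared informers; the handler body is illustrative and not a proposal for how NTH should implement this. A controller like this only learns that a node is gone when a real delete event reaches its cache:

```go
// Minimal sketch of a controller that only notices node removal via an
// actual Kubernetes delete event reaching its informer cache.
package main

import (
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		// Fires only when the Node object is actually deleted from the API
		// server; a terminated instance on its own produces no such event.
		DeleteFunc: func(obj interface{}) {
			node, ok := obj.(*corev1.Node)
			if !ok {
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				if node, ok = tombstone.Obj.(*corev1.Node); !ok {
					return
				}
			}
			log.Printf("node %s deleted, cleaning up state tied to it", node.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {}
}
```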