aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Karpenter should support managing non-Karpenter nodes #6601

Open stevehipwell opened 3 months ago

stevehipwell commented 3 months ago

Description

What problem are you trying to solve? When operating Karpenter in an EKS cluster I'd like to use Karpenter's capabilities to manage nodes that Karpenter doesn't itself manage (self-managed ASGs, to be precise). This capability is highly significant at the moment because Karpenter isn't part of the EKS control plane, can't self-manage the nodes it runs on, and running Karpenter on Fargate has too many limitations. I'd like to see the following capabilities supported so that Karpenter can automate node lifecycle management and keep non-Karpenter nodes aligned with Karpenter nodes.

At a high level I'd like to see Karpenter update the AMI ID in ASG launch templates in alignment with a NodeClass, which would mirror the Karpenter behaviour to the ASG. I'd then like to also see Karpenter trigger an ASG instance refresh based on a NodePool disruption configuration. This would result in the node lifecycle for an EKS cluster being consistent instead of requiring a large amount of manual intervention.
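As a rough illustration of the two steps described above (sync the launch template AMI, then refresh instances when disruption is allowed), here is a minimal sketch of the decision logic only. `AsgState`, `plan_actions`, and the action names are hypothetical and are not part of Karpenter; in a real controller the two actions would presumably map to the EC2 `CreateLaunchTemplateVersion` and Auto Scaling `StartInstanceRefresh` API calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsgState:
    """Hypothetical snapshot of a self-managed ASG (not a Karpenter type)."""
    name: str
    launch_template_ami: str   # AMI ID in the launch template's current version
    instance_amis: List[str]   # AMI IDs of the instances currently running


def plan_actions(asg: AsgState, nodeclass_ami: str, disruption_allowed: bool) -> List[str]:
    """Return the ordered actions needed to align the ASG with the NodeClass AMI.

    `disruption_allowed` stands in for whatever the NodePool disruption
    configuration would decide at this moment (an assumption of this sketch).
    """
    actions: List[str] = []
    # Step 1: the launch template should always track the NodeClass AMI.
    if asg.launch_template_ami != nodeclass_ami:
        actions.append("update_launch_template")
    # Step 2: only replace running instances when disruption is permitted.
    stale = any(ami != nodeclass_ami for ami in asg.instance_amis)
    if stale and disruption_allowed:
        actions.append("start_instance_refresh")
    return actions


asg = AsgState("system-nodes", "ami-old", ["ami-old", "ami-old"])
print(plan_actions(asg, "ami-new", disruption_allowed=True))
```

Keeping the launch template update separate from the refresh means new instances pick up the new AMI even while disruption is blocked.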

I don't think there are any new core concepts for Karpenter here, but there would need to be a mapping between ASGs and NodePool resources.

This functionality would still require NTH to properly handle the ASG lifecycle events but it might also be possible to handle these from Karpenter in the future.

How important is this feature to you? Being able to manage the lifecycle of non-Karpenter nodes (or not require them in the first place) is very important as the current EKS architecture makes it impossible to automate this without a custom operator.

njtran commented 2 months ago

Seems interesting. Is this different than this? https://github.com/kubernetes-sigs/karpenter/issues/920

njtran commented 2 months ago

Also I think this would be something that would need to be natively supported in upstream Karpenter, and then in the cloud providers. Can you cut this issue into the upstream one if it's different than https://github.com/kubernetes-sigs/karpenter/issues/920?

stevehipwell commented 2 months ago

@njtran I don't think this is covered in https://github.com/kubernetes-sigs/karpenter/issues/920 and I also don't think it's an upstream issue at this point (maybe some of this might be abstractable in the future). The requirement is specific to AWS where Karpenter can't be run on the control plane and so you're required to have non-Karpenter nodes.

njtran commented 2 months ago

Would the idea be to create NodeClaims in response to nodes elsewhere in the cluster, rather than responding to created NodeClaims and creating nodes?

stevehipwell commented 2 months ago

@njtran I don't think that'd be required here. The important link is between the Karpenter configuration and an AWS ASG, so I think a CRD could handle the relationship between a NodeClass and an ASG via a tag selector.

Karpenter could then modify the ASG launch template when the NodeClass has its AMI changed. The new CRD should include a maintenance schedule (as a cron expression) that Karpenter uses to trigger an ASG instance refresh to update the nodes.
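To make the proposed mapping concrete, here is a minimal sketch of what such a resource might carry, modelled as a plain Python dataclass rather than a real CRD. Every name here (`AsgBinding`, `tag_selector`, `maintenance_cron`, the example tag key) is hypothetical. The cron matcher is deliberately simplified: it accepts only `*` or a single integer per field and requires all five fields to match, which is narrower than standard cron.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class AsgBinding:
    """Hypothetical link between a NodeClass and ASGs chosen by tag."""
    node_class: str
    tag_selector: Dict[str, str]
    maintenance_cron: str  # "minute hour day-of-month month day-of-week"


def selects(binding: AsgBinding, asg_tags: Dict[str, str]) -> bool:
    """True if the ASG carries every tag in the selector.

    An empty selector matches nothing, so a blank binding cannot
    accidentally capture every ASG in the account.
    """
    if not binding.tag_selector:
        return False
    return all(asg_tags.get(k) == v for k, v in binding.tag_selector.items())


def in_schedule(cron: str, now: datetime) -> bool:
    """Simplified cron check: each field is '*' or one integer (assumption)."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    # Cron numbers Sunday as 0; isoweekday() gives Sunday as 7.
    actual = [now.minute, now.hour, now.day, now.month, now.isoweekday() % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


binding = AsgBinding("default", {"example.com/karpenter-managed": "true"}, "0 3 * * 0")
tags = {"example.com/karpenter-managed": "true", "Name": "system-nodes"}
print(selects(binding, tags), in_schedule(binding.maintenance_cron, datetime(2024, 9, 1, 3, 0)))
```

A controller built on this would start an instance refresh only when `selects` is true for the ASG and `in_schedule` is true for the current time.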