Support AWS Warm Pools for karpenter

aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

https://karpenter.sh

Apache License 2.0

Support AWS Warm Pools for karpenter #4354

Open myloginid opened 1 year ago

myloginid commented 1 year ago

Description

What problem are you trying to solve? Improve EKS autoscaling time by using the AWS EC2 Warm Pools feature: pre-provisioned nodes can be stopped when not in use and started faster than new EC2 instances.

How important is this feature to you? We are not using Karpenter today, so this is not that important to us. However, this seems to be a feature that other users would also be able to use.

References

https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html#create-a-warm-pool-console

runningman84 commented 1 year ago

isn't cluster overprovisioning a solution here?

sftim commented 1 year ago

Karpenter doesn't rely on EC2 Autoscaling and is actually provider agnostic in its core. However, it's plausible to have something similar happen. Karpenter would need to launch instances that were ready to register as nodes, suspend those machines, and then defrost them on demand.

It's not a small feature, and it might be easier to write a whole new cluster autoscaler to achieve that (borrowing code and design from existing implementations).
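The launch, suspend, and defrost cycle described above can be modeled roughly as follows. This is a minimal in-memory sketch, not Karpenter code: `WarmPool`, `Instance`, `State`, `prewarm`, and `acquire` are hypothetical names, and `_launch` stands in for a real cloud launch call plus node bootstrap.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import Dict, Tuple


class State(Enum):
    SUSPENDED = "suspended"  # booted once, then stopped or hibernated
    RUNNING = "running"      # available to register as a cluster node


@dataclass
class Instance:
    instance_id: str
    state: State


@dataclass
class WarmPool:
    """Hypothetical in-memory model of a pool of pre-initialized instances."""

    instances: Dict[str, Instance] = field(default_factory=dict)
    _ids: count = field(default_factory=count, repr=False)

    def _launch(self) -> Instance:
        # Assumption: stands in for a real launch call and node bootstrap.
        inst = Instance(f"i-{next(self._ids):04d}", State.RUNNING)
        self.instances[inst.instance_id] = inst
        return inst

    def prewarm(self, n: int) -> None:
        """Launch n instances, let them initialize, then suspend them."""
        for _ in range(n):
            self._launch().state = State.SUSPENDED

    def acquire(self) -> Tuple[Instance, bool]:
        """Return (instance, was_warm).

        Defrost a suspended instance if one exists; otherwise fall back
        to a cold launch.
        """
        for inst in self.instances.values():
            if inst.state is State.SUSPENDED:
                inst.state = State.RUNNING
                return inst, True
        return self._launch(), False
```

With `pool.prewarm(2)`, the first two `acquire()` calls reuse suspended instances and the third falls back to a cold launch. A real implementation would also need to match instance types, zones, and capacity types against pending pods, which this sketch leaves out.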

myloginid commented 1 year ago

> isn't cluster overprovisioning a solution here?

It costs too much.

ellistarn commented 1 year ago

> It's not a small feature, and it might be easier to write a whole new cluster autoscaler to achieve that (borrowing code and design from existing implementations).

I'm not sure I agree with this assertion. It's work, but fits naturally into Karpenter if you introduce a new cloud provider concept for it.

Typically, we push users to understand why their start time is what it is. For Nitro-based EC2 instances, launch times should be ~30 seconds. If they're not, it's worth digging into the initialization logic to understand what the holdup is. Usually there are simpler solutions than boot+suspend.

sftim commented 1 year ago

@ellistarn you're thinking of someone implementing a custom Provisioner plugin that manages a pool of hibernated servers? Or maybe a provisioner backed by setting the desired size for an EC2 autoscaling group?

Both of those are definitely feasible!

ellistarn commented 1 year ago

We could pretty easily build a concept of Machine.hibernated into our existing CP model, and then implement it using EC2 APIs. We definitely wouldn't back it with ASGs.
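As an illustration only, a hibernated flag on a machine object might look like the sketch below. Nothing here is Karpenter's actual cloud provider interface: `Machine`, `FakeCloudProvider`, `create`, and `delete` are hypothetical names, and the EC2 calls are only recorded as strings. On a real EC2 backend, the hibernate step would presumably map to the StopInstances API with its hibernate option (which requires hibernation to be enabled when the instance is launched), and the resume step to StartInstances.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Machine:
    name: str
    instance_type: str
    hibernated: bool = False  # hypothetical field, named after the comment above


class FakeCloudProvider:
    """Hypothetical provider that hibernates on delete and resumes on create."""

    def __init__(self) -> None:
        self.machines: Dict[str, Machine] = {}
        self.calls: List[str] = []  # the backend call each step would make
        self._seq = count()

    def _find_hibernated(self, instance_type: str) -> Optional[Machine]:
        for machine in self.machines.values():
            if machine.hibernated and machine.instance_type == instance_type:
                return machine
        return None

    def create(self, instance_type: str) -> Machine:
        warm = self._find_hibernated(instance_type)
        if warm is not None:
            # Assumption: a real backend would start the stopped instance here.
            self.calls.append(f"start:{warm.name}")
            warm.hibernated = False
            return warm
        machine = Machine(f"machine-{next(self._seq)}", instance_type)
        self.calls.append(f"launch:{machine.name}")
        self.machines[machine.name] = machine
        return machine

    def delete(self, name: str, hibernate: bool = True) -> None:
        machine = self.machines[name]
        if hibernate:
            # Assumption: a real backend would stop the instance with
            # hibernation requested instead of terminating it.
            self.calls.append(f"hibernate:{name}")
            machine.hibernated = True
        else:
            self.calls.append(f"terminate:{name}")
            del self.machines[name]
```

Creating an `m5.large`, deleting it, and creating another `m5.large` records `launch`, `hibernate`, and `start` calls for the same machine, while a request for a different instance type triggers a fresh launch.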

darren-recentive commented 10 months ago

@ellistarn That sounds awesome. Is this feature request being tracked anywhere so that we can follow it?

CarlosDomingues commented 9 months ago

My company uses Karpenter + Knative to dynamically allocate pods. Being able to hibernate instances instead of stopping them would be awesome!

andrewleech commented 6 months ago

I'm looking for something like this. It's even more critical when using Windows nodes, which can easily take 10 minutes to start up, depending on the size of the Docker image being used.

For our use case, it'd be even better if nodes that have been used could be stopped rather than terminated at the end of the keepalive period, then simply restarted rather than built anew from scratch.

riiv-hexagon commented 3 months ago

This would be great

Bryce-Soghigian commented 2 months ago

thomas-beznik commented 2 weeks ago

We would also greatly need this! Is there any ETA for this? Thank you!