Open myloginid opened 1 year ago
isn't cluster overprovisioning a solution here?
Karpenter doesn't rely on EC2 Autoscaling and is actually provider agnostic in its core. However, it's plausible to have something similar happen. Karpenter would need to launch instances that were ready to register as nodes, suspend those machines, and then defrost them on demand.
It's not a small feature, and it might be easier to write a whole new cluster autoscaler to achieve that (borrowing code and design from existing implementations).
> isn't cluster overprovisioning a solution here?
It costs too much.
> It's not a small feature, and it might be easier to write a whole new cluster autoscaler to achieve that (borrowing code and design from existing implementations).
I'm not sure I agree with this assertion. It's work, but fits naturally into Karpenter if you introduce a new cloud provider concept for it.
Typically, we push users to understand why their start time is what it is. For Nitro-based EC2 instances, launch times should be ~30 seconds. If they're not, it's worth digging into the initialization logic to understand what the holdup is. Usually there are simpler solutions than boot+suspend.
@ellistarn you're thinking of someone implementing a custom Provisioner plugin that manages a pool of hibernated servers? Or maybe a provisioner backed by setting the desired size for an EC2 autoscaling group?
Both of those are definitely feasible!
We could pretty easily build a concept of Machine.hibernated into our existing CP model, and then implement it using EC2 APIs. We definitely wouldn't back it with ASGs.
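To make the idea above concrete, here is a minimal sketch of a provider that prefers resuming a hibernated machine over launching a new one. It is not Karpenter code: `MachineState`, `Machine`, and `WarmPoolProvider` are hypothetical names invented for illustration, and the real cloud API calls are left as comments.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MachineState(Enum):
    RUNNING = "running"
    HIBERNATED = "hibernated"


@dataclass
class Machine:
    name: str
    state: MachineState


@dataclass
class WarmPoolProvider:
    """Hypothetical provider that tracks machines and their hibernation state."""

    machines: List[Machine] = field(default_factory=list)
    launched: int = 0  # number of cold launches performed so far

    def _find_hibernated(self) -> Optional[Machine]:
        # Return the first hibernated machine, or None if the pool is empty.
        for machine in self.machines:
            if machine.state is MachineState.HIBERNATED:
                return machine
        return None

    def acquire(self) -> Machine:
        """Resume a hibernated machine if one exists, else launch a new one."""
        machine = self._find_hibernated()
        if machine is not None:
            # Assumption: a real implementation would resume the instance here.
            machine.state = MachineState.RUNNING
            return machine
        # Fallback: cold launch (a real implementation would create an instance).
        self.launched += 1
        machine = Machine(name=f"cold-{self.launched}", state=MachineState.RUNNING)
        self.machines.append(machine)
        return machine

    def hibernate(self, machine: Machine) -> None:
        # Assumption: a real implementation would suspend the instance here.
        machine.state = MachineState.HIBERNATED
```

With one hibernated machine in the pool, the first `acquire()` call resumes it and the second falls back to a cold launch. Whether such a state would live on the machine object or elsewhere in the cloud provider model is an open design question in this thread.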
@ellistarn That sounds awesome. Is this feature request being tracked anywhere so that we can follow it?
My company uses Karpenter + Knative to dynamically allocate pods. Being able to hibernate instances instead of stopping them would be awesome!
I'm looking for something like this too. It's even more critical when using Windows nodes, which can easily take 10 minutes to start up, depending on the size of the Docker image being used.
For our use case it'd be even better if nodes that have been used could be stopped rather than terminated at the end of the keepalive period, then simply restarted rather than built new from scratch.
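The stop-instead-of-terminate behavior requested above could be expressed as a small decision function. This is only a sketch under assumptions: `Action`, `IdleNode`, and `expiry_action` are hypothetical names, and the `max_stopped` cap on how many stopped nodes to retain is an assumption added here, not something proposed in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    STOP = "stop"
    TERMINATE = "terminate"


@dataclass
class IdleNode:
    name: str
    idle_seconds: float  # how long the node has had no workload


def expiry_action(
    node: IdleNode,
    keepalive_seconds: float,
    stopped_count: int,
    max_stopped: int,
) -> Action:
    """Decide what to do with an idle node.

    Nodes still inside the keepalive period are kept running. Expired nodes
    are stopped (so they can be restarted later) while the pool of stopped
    nodes is below max_stopped (hypothetical cap), and terminated otherwise.
    """
    if node.idle_seconds < keepalive_seconds:
        return Action.KEEP
    if stopped_count < max_stopped:
        return Action.STOP
    return Action.TERMINATE
```

A cap of some kind seems worth having because stopped instances still incur storage costs, but the right limit would depend on the workload.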
This would be great
We would also benefit greatly from this! Is there any ETA for it? Thank you!
Description
What problem are you trying to solve? Improve EKS autoscaling time by using the AWS EC2 Warm Pools feature: pre-provisioned nodes can be stopped when not in use and started faster than new EC2 instances, which shortens scale-up time.
How important is this feature to you? We are not using Karpenter today, so this is not as important. However, this seems to be a feature that other users would also be able to utilize.
References