aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Warm Up Nodes Options (Hibernation) #3798

Open abebars opened 1 year ago

abebars commented 1 year ago

Tell us about your request

Allow Karpenter to provision additional nodes in a hibernated state, which would decrease node provisioning time for rapid scaling.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Karpenter is excellent at optimizing cluster capacity; on the other hand, applications that require rapid scaling need to wait until new nodes are provisioned. There is a proposal here to add headroom logic, but that still means running nodes with no workloads that we are being charged for.

Another option is to support hibernation (stopped instances): nodes that are already bootstrapped and ready to join the cluster once needed. This feature is already supported out of the box as a Warm Pool for an Auto Scaling group.
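
For reference, a warm pool attached to an existing Auto Scaling group can be declared in CloudFormation; a minimal sketch (group name and sizes are placeholders, not part of any Karpenter API):

# Hypothetical CloudFormation snippet: keeps a pool of pre-initialized,
# stopped instances attached to an existing Auto Scaling group.
Resources:
  BuildersWarmPool:
    Type: AWS::AutoScaling::WarmPool
    Properties:
      AutoScalingGroupName: my-existing-asg   # placeholder group name
      PoolState: Stopped                      # or Hibernated; instances wait bootstrapped but not running
      MinSize: 2                              # always keep two warm instances
      MaxGroupPreparedCapacity: 5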

Are you currently working around this issue?

Using low-priority pods, which could be less practical from a cost-saving perspective. Similar to https://github.com/aws/karpenter/issues/3240
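
That workaround usually takes the shape of a negative-priority "pause" deployment that reserves headroom which real workloads can preempt; a rough sketch, with placeholder names and sizes:

# Sketch of the low-priority "pause pod" overprovisioning workaround.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning        # placeholder name
value: -10                      # lower than any real workload, so real pods preempt these
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-pause
spec:
  replicas: 2                   # amount of headroom to hold
  selector:
    matchLabels:
      app: overprovisioning-pause
  template:
    metadata:
      labels:
        app: overprovisioning-pause
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"          # each pause pod reserves one CPU of headroom
              memory: 1Gi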

Additional Context

No response

Attachments

No response


jonathan-innis commented 1 year ago

It seems like you may need some combination of aws/karpenter-core#749 with an option to designate that manually provisioned capacity as a "warm pool"?

Do you know what the capacity is going to look like, and you want the warm pool to be right-sized? Or are you just looking to specify some constraints on a manually provisioned warm pool, i.e. being able to manually launch Karpenter capacity as described in aws/karpenter-core#749?

abebars commented 1 year ago

It seems like you may need some combination of aws/karpenter-core#749 with an option to designate that manually provisioned capacity as a "warm pool"?

Do you know what the capacity is going to look like, and you want the warm pool to be right-sized? Or are you just looking to specify some constraints on a manually provisioned warm pool, i.e. being able to manually launch Karpenter capacity as described in aws/karpenter-core#749?

@jonathan-innis I think having a manual node could be helpful to some extent, but it doesn't really align well with the provisioner idea unless it references the provisioner in some way. So if we are doing a manual node, I would expect something like:

apiVersion: karpenter.sh/v1alpha5
kind: NodeGroup
metadata:
  name: default
spec:
  replicas: 2
  provisionerRef:
    name: my-provisioner


However, I am looking for something more like:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ......
  # Resource limits constrain the total size of the cluster.
  # Limits prevent Karpenter from creating new instances once the limit is exceeded.
  limits:
    resources:
      cpu: "1000"
      memory: 1000Gi
  # Buffer is added on top of the required capacity to ensure there is extra room for scaling.
  # Each value can be an absolute amount or a percentage of the total provisioned capacity.
  buffer:
    warm: true # If true, buffer nodes are hibernated; otherwise they are running and in a ready state.
    resources:
      cpu: "10"      # or "10%"
      memory: 100Gi  # or "10%"

jonathan-innis commented 1 year ago

Yeah, I think this is being tracked over here aws/karpenter#3240. Do you mind including your use-case over there? I think this issue looks like a duplicate of the discussion that's occurring over there.

jonathan-innis commented 1 year ago

Closing this as a duplicate of aws/karpenter#3240

a7i commented 1 year ago

@jonathan-innis Why was this closed as a duplicate? This issue is about an option similar to Warm Pools for ASGs, in Karpenter. The issue referenced as a duplicate is about overprovisioning.

andrewleech commented 1 year ago

I agree that this is not a duplicate of https://github.com/aws/karpenter/issues/3240

That one is about keeping extra nodes active all the time, ready to pick up jobs.

This issue is about having some nodes (AWS instances) in a stopped state rather than terminated, such that when a new node is needed the existing machine can be restarted rather than needing to create a new machine from scratch.

I use Karpenter for managing GitLab CI build machines, so when a new build job comes in it starts a new machine to run that job, then shuts the machine down again afterwards. For most of the day there are no machines running, just occasional ones started when a git commit is pushed.

Currently, I have a ~1.5 minute delay on a build job while the machine is created and provisioned, but at least I'm only paying while the job is running.

I'm in the process of getting going with the new Windows support for Windows build jobs - it's looking like up to 20 minutes to provision a Windows machine and pull a (rather large) Docker build image.

With aws/karpenter#3240 I'd basically end up with at least one "warm" machine running 24/7, incurring significant cost.

With the proposal in this issue, I'd have one shut-down machine in AWS ready to restart when a job comes in, which should start up significantly faster but only cost a small storage fee while shut down.
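
(For reference, a scale-to-zero CI provisioner in the v1alpha5 API looks roughly like the sketch below; the names, labels, and TTLs are placeholders, not the actual setup described above. The proposal in this issue would stop these nodes instead of terminating them when they go empty.)

# Rough sketch: a provisioner that scales CI build nodes to zero when idle (v1alpha5 API).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: ci-builders               # placeholder name
spec:
  labels:
    workload: ci
  taints:
    - key: ci-only                # keep non-CI pods off these nodes
      value: "true"
      effect: NoSchedule
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  ttlSecondsAfterEmpty: 60        # today: terminate the node ~1 minute after the last job finishes
  providerRef:
    name: default                 # placeholder AWSNodeTemplate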

FernandoMiguel commented 1 year ago

@andrewleech you can bake EBS snapshots containing the images you most frequently need and attach those to Karpenter nodes, avoiding having to download them on every new node. That should improve your boot time considerably.
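
(A rough sketch of that approach, assuming the v1alpha1 AWSNodeTemplate API; the snapshot ID and discovery tags are placeholders.)

# Sketch: attach a pre-baked EBS snapshot (container images already laid down)
# as the root volume of nodes Karpenter launches, so images don't need pulling on boot.
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: prebaked-images                      # placeholder name
spec:
  subnetSelector:
    karpenter.sh/discovery: my-cluster       # placeholder discovery tag
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        snapshotID: snap-0123456789abcdef0   # placeholder snapshot baked with images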

andrewleech commented 1 year ago

Thanks @FernandoMiguel that's interesting, I didn't realise that was possible.

On Windows I guess almost everything is based on one of two Windows base/core images, so it'd certainly be good to have them preloaded. On Linux we use a range of different things, so I'm not sure what I'd load there; worth thinking about though.

However, on any OS it would mean extra processes to create and maintain those snapshots (security updates, etc.).

It's definitely worth testing at least to see how much time it saves, vs the initial time to just create the machine.

andrewleech commented 9 months ago

I've tested building a custom Windows AMI (using EC2 Image Builder) for my Windows nodes, with a bunch of container images pre-pulled with crictl.

I was also able to enable EC2 Fast Launch on the image.

Using this image is faster with Karpenter, but there's still a ~6 minute startup time.

The pod logs show the pre-pulled images are all being used, so that did help. I was really hoping for a lot faster though.
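
(For reference, pointing Karpenter at a custom AMI like that, in the v1alpha1 AWSNodeTemplate API, looks roughly like the sketch below; the tag values are placeholders, not the actual setup described above.)

# Sketch: select a custom, pre-baked Windows AMI built with Image Builder.
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: windows-builders             # placeholder name
spec:
  amiFamily: Windows2022             # controls bootstrap defaults for the custom AMI
  amiSelector:
    Name: prebaked-windows-builder   # tag on the Image Builder output AMI (placeholder)
  subnetSelector:
    karpenter.sh/discovery: my-cluster
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster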

jonathan-innis commented 9 months ago

Apologies for missing the back-and-forth here and not re-opening this one earlier. You're correct that I misclassified this one at first glance.

The pod logs show the pre-pulled images are all being used, so that did help. I was really hoping for a lot faster though

Would shutdown instances still help here or are there other areas that are bottlenecking that you can see?

Bryce-Soghigian commented 6 months ago

Another data point: the managed Cluster Autoscaler on AKS has a "deallocate" scale-down mode, where rather than deleting VMs we put them in a deallocated state, which is essentially the same as hibernation. Then when you need to scale up, you wake up one of the hibernated instances.

Jack is taking a stab at upstreaming the change here for reference.

Some users who require 1s latency are OK paying for the OS disk, with the tradeoff that the VM will start immediately when they need it.

Would shutdown instances still help here or are there other areas that are bottlenecking that you can see?

I am also curious about the full breakdown of the bottlenecks you are facing. If the bottleneck is image pull, hibernated instances may not save you as much time, and optimizing image pull may make more sense, like you tried, but you can probably go deeper.

Hibernated instances may save you 30-45s, but for some larger container images, such as sagemathinc/cocalc which takes 405.3s to start, the start time can be reduced to 2.9s using things like Artifact Streaming and overlaybd.

Source

Solving things at the node bootstrapping layer addresses just one layer of potential latency. I haven't dived deep on the AWS side, but I imagine similar things are achievable by fully optimizing image pull.

myloginid commented 5 months ago

Given the number of upvotes on this and linked issues, will this feature be made available soon?

jtdoepke commented 4 months ago

Here's a blog post showing how using shutdown instances can decrease boot time: https://depot.dev/blog/faster-ec2-boot-time

I imagine something like that, combined with pre-loading images, could make adding new nodes very fast.