aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Support Maximum Cost Limits #3244

Open sidewinder12s opened 1 year ago

sidewinder12s commented 1 year ago

Tell us about your request

Support a maximum cost for launched nodes as a limit, both per provisioner and in the global config.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Many users would like to control cost or set a cost ceiling on their clusters, especially in development environments.

Are you currently working around this issue?

Users have to manually calculate maximum spend from their provisioners' requirements/limits. With spot, cost is even less controlled.

Additional Context

No response

Attachments

No response

runningman84 commented 1 year ago

Another cool idea would be to specify a limit on the amount of USD per core or per unit of memory.
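
A minimal sketch of what such a per-unit price filter could look like. Everything here is hypothetical: the `InstanceOffering` type, the `within_unit_price` helper, the parameter names, and the prices are made up for illustration and are not part of Karpenter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceOffering:
    """Hypothetical description of one launchable instance type."""
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


def within_unit_price(
    offering: InstanceOffering,
    max_usd_per_vcpu_hour: Optional[float] = None,
    max_usd_per_gib_hour: Optional[float] = None,
) -> bool:
    """Return True if the offering stays under every per-unit price limit that is set."""
    if (max_usd_per_vcpu_hour is not None
            and offering.hourly_usd / offering.vcpus > max_usd_per_vcpu_hour):
        return False
    if (max_usd_per_gib_hour is not None
            and offering.hourly_usd / offering.memory_gib > max_usd_per_gib_hour):
        return False
    return True


# Made-up prices, for illustration only.
offerings = [
    InstanceOffering("type-a", vcpus=2, memory_gib=8.0, hourly_usd=0.10),
    InstanceOffering("type-b", vcpus=4, memory_gib=8.0, hourly_usd=0.34),
    InstanceOffering("type-c", vcpus=8, memory_gib=64.0, hourly_usd=0.50),
]

# Keep only the types that cost at most $0.06 per vCPU-hour.
allowed = [o.name for o in offerings if within_unit_price(o, max_usd_per_vcpu_hour=0.06)]
print(allowed)
```

With spot, the hourly price would have to be refreshed before each check, since it changes over time.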

spring1843 commented 1 year ago

Thank you for your suggestion. There are a few considerations and questions to be asked here.

What do you think should occur when this maximum is reached? Should Karpenter stop launching new nodes? This may not be desirable for users.

Also, it seems that what you are requesting concerns historical cost, that is, how much these instances have cost so far, which is calculated by the minute. That differs from the predictive cost that Karpenter currently uses, so Karpenter would have to separate out the cost of the instances it has launched and exclude other costs.

Finally, how are you dealing with this problem right now? Are you using AWS budget alerts, etc.?

sidewinder12s commented 1 year ago

Yes, I think if someone specifies a maximum cost limit, Karpenter should stop launching nodes until the cost of new nodes comes under the limit. This might be complicated a bit by Spot.

By other costs do you mean EBS, EIP, etc? Or some other cost? I think using a simple calculation of only the instance cost may work for many folks.

Outside Karpenter, we use ASGs and self-managed node groups. Since we set max parameters on the ASGs, calculating maximum cost is somewhat simple. If we had more complicated ASG configs (Mixed Instances + Spot), it might be more complicated to calculate, but you do still have the maximum toggle.

When I thought about this, I found that how open-ended Provisioners are might make the same kind of calculation much more complicated in Karpenter.

My inspiration for this came while deploying Karpenter to some of our purely test EKS clusters: it'd be nice if I could just set a simple global cost cap of, say, $500 or $1,000 a month. Maybe that would need to be translated into a maximum hourly cost for the cluster.
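
A minimal sketch of that translation, and of the "stop launching until under the limit" behavior described above. It assumes only on-demand instance prices count toward the cap and that a month averages 730 hours; `hourly_budget` and `can_launch` are hypothetical helpers, not Karpenter functions, and the prices are made up.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours per year / 12


def hourly_budget(monthly_cap_usd: float) -> float:
    """Translate a monthly cost cap into the equivalent hourly spend ceiling."""
    return monthly_cap_usd / HOURS_PER_MONTH


def can_launch(running_hourly_usd: list[float],
               candidate_hourly_usd: float,
               monthly_cap_usd: float) -> bool:
    """Allow a launch only if the projected hourly spend stays within the cap."""
    projected = sum(running_hourly_usd) + candidate_hourly_usd
    return projected <= hourly_budget(monthly_cap_usd)


# Two nodes already running at a made-up $0.20/hour each, with a $500/month cap.
running = [0.20, 0.20]
print(round(hourly_budget(500), 4))   # hourly ceiling for a $500/month cap
print(can_launch(running, 0.20, 500))  # a third $0.20/hour node
print(can_launch(running, 0.40, 500))  # a $0.40/hour node instead
```

This treats the cap as a ceiling on the cluster's instantaneous run rate, which is simpler than tracking historical spend but does not account for EBS, EIP, or changing spot prices.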

ellistarn commented 1 year ago

"Since we set max parameters on the ASGs, calculating maximum cost is somewhat simple"

Isn't this the same as multiple provisioners, each with a limit?

Not saying that we shouldn't build in a global limit feature.

sidewinder12s commented 1 year ago

Yes, I think that if the provisioners were sufficiently limited in what they can provision, and provisioners supported a simple node count limit, that would accomplish simple cost limiting.

If you have a provisioner that is allowed a wide range of instance types and/or spot, however, those calculations become unwieldy. My initial thought for this came as I was setting up some 'default' provisioners that'd allow:

Guessing the potential cost of that becomes a lot more difficult, though maybe you'd never combine that many constraints into a real provisioner. But at least in a test environment, I thought it'd be nice if I could just say: pick anything, but don't let the cluster cost more than $1-2K a month.
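
To illustrate why the estimate gets unwieldy, here is a rough sketch of a worst-case bound for a provisioner that allows several instance types. It assumes the provisioner carries a total CPU limit and that the bound is simply that limit multiplied by the most expensive per-vCPU price among the allowed types; `worst_case_monthly_usd` is a hypothetical helper and the prices are made up.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours per year / 12


def worst_case_monthly_usd(cpu_limit: int,
                           offerings: dict[str, tuple[int, float]]) -> float:
    """Loose upper bound on monthly spend for a provisioner with a CPU limit.

    offerings maps an instance type name to (vcpus, hourly_usd).
    The bound assumes every vCPU under the limit is bought at the worst
    per-vCPU price among the allowed types.
    """
    worst_per_vcpu = max(price / vcpus for vcpus, price in offerings.values())
    return cpu_limit * worst_per_vcpu * HOURS_PER_MONTH


# Made-up prices: the wider the allowed set, the more the worst case is
# driven by whichever type is most expensive per vCPU.
allowed_types = {
    "small": (2, 0.10),   # $0.05 per vCPU-hour
    "large": (16, 1.20),  # $0.075 per vCPU-hour
}
print(round(worst_case_monthly_usd(32, allowed_types), 2))
```

The bound is loose on purpose: real spend depends on which types are actually chosen, and spot prices would make even this number a moving target.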

gaussye commented 1 year ago

Is there any plan for this feature request? We also want to control the cost for each provisioner, especially for spot instances.

njtran commented 10 months ago

Just putting my thoughts here:

Since NodePools define limits, but pricing is defined by the cloud provider, cost based limits may need to be defined in the EC2NodeClass here. I'm not sure if it's possible for the karpenter-core provisioning controller to be aware of these limits, so it may need to just gate the instance creation on machine launch, which could cause churn.

cloudbow commented 8 months ago

This would definitely be a huge improvement, as we operate a SaaS model and have a control center that configures nodegroups based on customer requirements. If we could set costs directly instead of maintaining complex nodegroup configs, it would be really great.

zaphinath commented 2 months ago

This would be great to help control spot recall. If we could outbid the spot request, we could keep our spot instances and prevent a bounce. Oftentimes it appears that a spot instance gets reclaimed, and then we end up back on the same spot instance type. These waves could be prevented if we could just set a highest bid price.