aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

feat: add cpu sustained clock speed label to instance metadata #7043

Open aidan-canva opened 2 months ago

aidan-canva commented 2 months ago

Description

Some workloads are sensitive to variations in instance CPU clock speed, either preferring a specific threshold or at least needing consistency across replicas. This PR adds the EC2 SustainedClockSpeedInGhz value as a Karpenter label (karpenter.k8s.aws/instance-cpu-sustained-clock-speed-mhz) so that workloads can express their preference.

The upstream value from the EC2 API is in GHz and is represented as a float (e.g. 2.4). nodeSelectors only support ints or strings, and most use cases for this label will want the Gt or Lt operators to set minimum or maximum values. To make the label usable, this implementation converts the GHz value to MHz and represents it as an int.
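
For illustration, the GHz-to-MHz conversion described above could look roughly like the following. This is a minimal sketch and not the code in this PR: the helper name `ghzToMHzLabel` is hypothetical, and rounding before truncating is an assumption made here to avoid floating-point artifacts (for example, a product that lands just below a whole number).

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// ghzToMHzLabel is a hypothetical helper (not taken from the PR).
// It converts a clock speed in GHz, as a float, into a whole number
// of MHz and returns it as a string suitable for a label value.
func ghzToMHzLabel(ghz float64) string {
	// Round first so that float error cannot turn 2400 into 2399.
	mhz := int(math.Round(ghz * 1000))
	return strconv.Itoa(mhz)
}

func main() {
	fmt.Println(ghzToMHzLabel(2.4)) // prints 2400
}
```

An integer label value like this is what allows the Gt and Lt operators to compare it against a threshold.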

How was this change tested?

Does this change impact docs?

Do codegen docs count? website/content/en/preview/reference/instance-types.md has been updated to reflect this new instance attribute.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify[bot] commented 2 months ago

Deploy Preview for karpenter-docs-prod ready!

Name Link
Latest commit e36c1a9db20b28fc61b941cc725f7574293c0f9d
Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/673d3a427ebce90008158dd9
Deploy Preview https://deploy-preview-7043--karpenter-docs-prod.netlify.app

rschalo commented 1 month ago

Thanks for your contribution! Running the test workflows.

coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 11924912359

Totals Coverage Status
Change from base Build 11924046371: 0.01%
Covered Lines: 5689
Relevant Lines: 6899

💛 - Coveralls

njtran commented 1 month ago

@aidan-canva are you able to fix the CI errors?

github-actions[bot] commented 1 month ago

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

aidan-canva commented 3 weeks ago

@njtran apologies for the delay, I've been on vacation for a period. I've just pushed a fix which should hopefully address the CI failures. Can I kick checks off myself?

github-actions[bot] commented 1 week ago

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

anteliano commented 8 hours ago

Without this feature, it is very tricky to avoid slower instance types such as m5a, which runs at 2.5 GHz while the seemingly similar m5 runs at 3.1 GHz. The difference is significant for many workloads.
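
To make the m5a versus m5 case concrete, here is a rough sketch of how a Gt-style minimum on an integer MHz value would separate the two. The clock speeds are the ones quoted in this comment rather than values read from the EC2 API, and the 3000 MHz threshold and the helper names `toMHz` and `satisfiesGt` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Clock speeds in GHz as quoted in the comment above (not fetched from EC2).
var sustainedGHz = map[string]float64{"m5a": 2.5, "m5": 3.1}

// toMHz is a hypothetical helper that converts GHz to whole MHz.
func toMHz(ghz float64) int {
	return int(math.Round(ghz * 1000))
}

// satisfiesGt mimics a Gt requirement: the integer label value must be
// strictly greater than the threshold.
func satisfiesGt(labelMHz, thresholdMHz int) bool {
	return labelMHz > thresholdMHz
}

func main() {
	const minMHz = 3000 // hypothetical workload threshold
	for _, name := range []string{"m5a", "m5"} {
		fmt.Printf("%s: %v\n", name, satisfiesGt(toMHz(sustainedGHz[name]), minMHz))
	}
	// prints:
	// m5a: false
	// m5: true
}
```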

aidan-canva commented 6 hours ago

I'm still motivated to get this PR merged. I believe it was in a mergeable state 3 weeks ago and was just waiting for a repo owner to trigger the CI checks. Since then, some merge conflicts have appeared that need to be resolved (I can do that), but it seems wasteful to do so without an indication that someone can help get this merged.

rschalo commented 3 hours ago

Hi @aidan-canva, apologies for the delay and timing. We're working on the next minor right now, think you could have this ready for review by Thursday morning and targeting merge Thursday EOD? Also, mind removing the instance type generation from the diff for this PR? We can do a fast-follow PR.

aidan-canva commented 2 hours ago

@rschalo - No worries, I appreciate it's a hectic time of year for most of AWS!

I've just merged main, cleaned up some merge conflicts, and validated that this is passing tests via make test. I've also reset website/content/en/preview/reference/instance-types.md back to main to avoid further drift/conflicts.

Should be ready for a CI run now.