aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Allow configurable on-demand fallback #3402

Open cep21 opened 1 year ago

cep21 commented 1 year ago

Tell us about your request

Because not all instance types with N CPUs are equivalent, Karpenter should fall back intelligently to the best instance generation within a family when it has to run on-demand.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Current karpenter docs ask for Provisioners with many different instance types and families. This is great, since it allows pulling instances from a larger spot pool.

However, if Karpenter has to fall back to an on-demand node (because there is no spot capacity), it doesn't make much sense for us to use old-generation instance types. Why bother, when I get better CPU for my money on the latest generation?

Are you currently working around this issue?

We can have multiple provisioners. If this is preferred, then the docs should probably clarify "It is likely not optimal to mix old generations and on-demand". Right now, most people probably follow the docs and combine "spot+on-demand" with "many instance generations" in the same provisioner.

FernandoMiguel commented 1 year ago

even on-demand, karpenter picks the cheaper instance for the resources requested. the way to influence that, as you already know, is using multiple provisioners with different weights.

karpenter will never know what instances are ideal for you. one team could need more memory focused (R family), others faster CPU (C family).

it's up to practitioners to tell karpenter what they need, no?

Timer commented 1 year ago

We just ran into this today so I was happy to see an issue created for it!

In our case, we have a single provisioner and only use on-demand capacity (no spot). It'd be great to somehow be able to tell Karpenter to prefer newer generations, falling back to older only when the instance type is unavailable in the on-demand capacity pool for our AZ(s).

Specifically, we're seeing Karpenter provision c5.12xlarge instead of c6i.12xlarge because they cost the same on-demand:

❯ ec2-instance-selector --vcpus 48 --region us-west-1 --sort-by '.OndemandPricePerHour' -o table-wide -a x86_64 --max-results 100
Instance Type  VCPUs   Mem (GiB)  Hypervisor  Current Gen  Hibernation Support  CPU Arch  Network Performance  ENIs    GPUs    GPU Mem (GiB)  GPU Info   On-Demand Price/Hr  Spot Price/Hr (30d avg)  
-------------  -----   ---------  ----------  -----------  -------------------  --------  -------------------  ----    ----    -------------  --------   ------------------  -----------------------  
c5.12xlarge    48      96         nitro       true         true                 x86_64    12 Gigabit           8       0       0              none       $2.544              $0.74014                 
c6i.12xlarge   48      96         nitro       true         true                 x86_64    18.75 Gigabit        8       0       0              none       $2.544              $0.88263       
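
To illustrate the tie described above: one possible way to break it is to sort on price first and generation second. This is only a sketch, not Karpenter's actual selection logic. The `Offering` record and its `generation` field are hypothetical, and the two entries are the rows from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    # Hypothetical record for illustration; not a Karpenter type.
    name: str
    generation: int
    price_per_hour: float


def rank(offerings):
    """Cheapest first; on equal price, newest generation first."""
    return sorted(offerings, key=lambda o: (o.price_per_hour, -o.generation))


# The two on-demand rows from the table above (us-west-1).
candidates = [
    Offering("c5.12xlarge", 5, 2.544),
    Offering("c6i.12xlarge", 6, 2.544),
]
ranked = rank(candidates)
print([o.name for o in ranked])
```

With a price-only sort the two rows are interchangeable; the secondary key is what puts c6i.12xlarge ahead.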

FernandoMiguel commented 1 year ago

we do this with two providers

  weight: 50
  requirements:
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ${azs}
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ${capacity-type}
    - key: "karpenter.k8s.aws/instance-category"
      operator: NotIn
      values:
        - "a"
        - "t"
    - key: "karpenter.k8s.aws/instance-family"
      operator: NotIn
      values:
        - "z1d"
    - key: "karpenter.k8s.aws/instance-size"
      operator: NotIn
      values:
        - "metal"
    - key: "karpenter.k8s.aws/instance-hypervisor"
      operator: In
      values:
        - "nitro"
    - key: "karpenter.k8s.aws/instance-generation"
      operator: In
      values:
        - "6"
        - "7"
[...]
  requirements:
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ${azs}
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ${capacity-type}
    - key: "karpenter.k8s.aws/instance-category"
      operator: NotIn
      values:
        - "a"
        - "t"
    - key: "karpenter.k8s.aws/instance-family"
      operator: NotIn
      values:
        - "z1d"
    - key: "karpenter.k8s.aws/instance-size"
      operator: NotIn
      values:
        - "metal"
    - key: "karpenter.k8s.aws/instance-hypervisor"
      operator: In
      values:
        - "nitro"
    - key: "karpenter.k8s.aws/instance-generation"
      operator: NotIn
      values:
        - "1"
        - "2"

that way gen 6 and 7 are preferred, and if they are not available, karpenter falls back to a bigger pool
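
The fallback behaviour this setup aims for can be sketched roughly as follows. This is a simplified model, not Karpenter code: the names `preferred` and `fallback`, the dict layout, the fallback weight of 0, and the upper generation bound of 7 are all assumptions made for illustration (the second provisioner's header is elided in the snippet above).

```python
def pick_provisioner(provisioners, available_generations):
    """Return the name of the highest-weight provisioner that can be satisfied.

    provisioners: list of dicts with hypothetical keys "name", "weight" and
    "generations" (the set of instance generations the provisioner allows).
    available_generations: set of generations that currently have capacity.
    """
    for p in sorted(provisioners, key=lambda p: p["weight"], reverse=True):
        if p["generations"] & available_generations:
            return p["name"]
    return None


provisioners = [
    # Mirrors the first block: weight 50, generations 6 and 7 only.
    {"name": "preferred", "weight": 50, "generations": {6, 7}},
    # Mirrors the second block: everything except generations 1 and 2.
    # The weight of 0 and the upper bound of 7 are assumptions.
    {"name": "fallback", "weight": 0, "generations": {3, 4, 5, 6, 7}},
]

print(pick_provisioner(provisioners, {6}))  # generation 6 has capacity
print(pick_provisioner(provisioners, {5}))  # only generation 5 has capacity
```

The higher-weight provisioner is tried first, and the wider pool is only used when nothing in generations 6 or 7 is available.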

cep21 commented 1 year ago

> karpenter picks the cheaper instance for the resources requested

It's a cheaper instance, but a more expensive cluster since a penny saved for an older generation isn't worth the performance hit in almost every case.

> one team could need more memory focused (R family), others faster CPU

Agreed! Karpenter can know the cluster needs more CPUs so pick an instance family with higher CPU/Memory ratio (Same for memory), but it cannot go too deep comparing one instance family with another.

However, if you're inside the C or R family, it's almost never worth picking an older generation.

> we do this with two providers

That's the current workaround, but instance generations go up to 7. Combined with instance types, a preference for larger instances, and capacity-type, the number of provisioners becomes very large. A simple algorithm that will almost always work is some setting that can say "Within an instance family, prefer newer generations".
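
A rough sketch of what "within an instance family, prefer newer generations" could mean in practice. It assumes "family" here means the category letter (C, R, and so on) and that each candidate is already available as a hypothetical `(category, generation, name)` tuple. Nothing below is an existing Karpenter setting.

```python
from collections import defaultdict


def newest_first_by_category(instance_types):
    """Group (category, generation, name) tuples by category, newest generation first."""
    groups = defaultdict(list)
    for category, generation, name in instance_types:
        groups[category].append((generation, name))
    return {
        category: [name for _, name in sorted(members, reverse=True)]
        for category, members in groups.items()
    }


candidates = [
    ("c", 5, "c5.12xlarge"),
    ("c", 6, "c6i.12xlarge"),
    ("r", 5, "r5.12xlarge"),
    ("r", 6, "r6i.12xlarge"),
]
print(newest_first_by_category(candidates))
```

The choice between C and R stays with the user; the ordering only applies inside each category, which is the narrower claim being made here.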

runningman84 commented 1 year ago

Especially for on-demand I would always prefer to have the latest gen if the price is equal to older gen stuff.

FernandoMiguel commented 1 year ago

> Karpenter can know the cluster needs more CPUs so pick an instance family with higher CPU/Memory ratio (Same for memory), but it cannot go too deep comparing one instance family with another. A simple algorithm that will almost always work

these are things humans with knowledge of their services know, but no app knows. karpenter can never decide for you what is best for your services, since all it knows about them is kube resources requests. not performance affinity.

runningman84 commented 1 year ago

Because AWS generations are numbered, it should be easy to sort them and put m6 in front of m5... This might only become a problem when two-digit generations come up, but even then the instance generation is already parsed somehow and available as a label.
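
The two-digit concern can be illustrated with a small sketch. The regular expression is a simplified assumption about the instance type naming shape (category letters, generation digits, optional suffix, then the size), and `m10.large` is a made-up name used only to show the sorting issue.

```python
import re
from typing import Optional

# Assumed shape: category letters, generation digits, optional attribute
# suffix, then ".size" (for example "m6i.large"). Names that do not fit
# this shape are not handled.
_NAME = re.compile(r"^[a-z]+(\d+)[a-z0-9-]*\.[a-z0-9-]+$")


def generation(instance_type: str) -> Optional[int]:
    """Return the generation number as an int, or None if the name does not parse."""
    match = _NAME.match(instance_type)
    return int(match.group(1)) if match else None


# "m10.large" is hypothetical; it stands in for a future two-digit generation.
names = ["m5.large", "m10.large", "m6i.large"]

by_number = sorted(names, key=generation, reverse=True)
by_string = sorted(names, reverse=True)
print(by_number)
print(by_string)
```

Sorting on the parsed integer keeps the hypothetical generation 10 in front, whereas sorting the raw strings would put it last. The sort key assumes every name parses; a `None` result would need handling first.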

sftim commented 1 year ago

Perhaps we can help make a thing (a controller?) that further customizes your provisioner(s), based on what matters for your cluster? Could be something outside of Karpenter, I'm thinking like some contrib code.

stijndehaes commented 11 months ago

Could preferredDuringSchedulingIgnoredDuringExecution work?

By setting a preference for karpenter.k8s.aws/instance-generation=6 we might be able to tell Karpenter to prefer generation 6, but if no generation 6 is available it can fall back on another generation?

For example:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: karpenter.k8s.aws/instance-generation
            operator: In
            values:
            - "6"
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:2.0
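
For reference, a preferred node affinity is a soft constraint: each candidate node is scored by the summed weights of the preference terms it satisfies, and a node with no match is still eligible. The sketch below models that scoring in a simplified form (operator In only, hypothetical node names). Whether Karpenter takes such a preference into account when deciding which instance to launch is exactly the open question here.

```python
GENERATION_LABEL = "karpenter.k8s.aws/instance-generation"


def preference_score(node_labels, preferred_terms):
    """Sum the weights of the preferred terms that the node's labels satisfy.

    preferred_terms: list of (weight, key, allowed_values) tuples, a
    simplified stand-in for matchExpressions with operator In.
    """
    return sum(
        weight
        for weight, key, allowed_values in preferred_terms
        if node_labels.get(key) in allowed_values
    )


def pick_node(nodes, preferred_terms):
    """Return the name of the best-scoring node, or None if there are no nodes."""
    if not nodes:
        return None
    return max(nodes, key=lambda name: preference_score(nodes[name], preferred_terms))


# Mirrors the Pod spec above: weight 1, generation In ["6"].
terms = [(1, GENERATION_LABEL, {"6"})]
nodes = {
    "node-gen5": {GENERATION_LABEL: "5"},
    "node-gen6": {GENERATION_LABEL: "6"},
}
print(pick_node(nodes, terms))
```

If only the generation 5 node exists, it is still chosen, which is the fallback behaviour the question asks about.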