cep21 opened 1 year ago
Even on-demand, Karpenter picks the cheapest instance for the resources requested. The way to influence that, as you already know, is to use multiple provisioners with different weights.
Karpenter will never know which instances are ideal for you: one team could need more memory (R family), another faster CPUs (C family).
It's up to practitioners to tell Karpenter what they need, no?
We just ran into this today so I was happy to see an issue created for it!
In our case, we have a single provisioner and only use on-demand capacity (no spot). It'd be great to somehow be able to tell Karpenter to prefer newer generations, falling back to older only when the instance type is unavailable in the on-demand capacity pool for our AZ(s).
Specifically, we're seeing Karpenter provision `c5.12xlarge` instead of `c6i.12xlarge` because they cost the same on-demand:
```
❯ ec2-instance-selector --vcpus 48 --region us-west-1 --sort-by '.OndemandPricePerHour' -o table-wide -a x86_64 --max-results 100
Instance Type  VCPUs  Mem (GiB)  Hypervisor  Current Gen  Hibernation Support  CPU Arch  Network Performance  ENIs  GPUs  GPU Mem (GiB)  GPU Info  On-Demand Price/Hr  Spot Price/Hr (30d avg)
-------------  -----  ---------  ----------  -----------  -------------------  --------  -------------------  ----  ----  -------------  --------  ------------------  -----------------------
c5.12xlarge    48     96         nitro       true         true                 x86_64    12 Gigabit           8     0     0              none      $2.544              $0.74014
c6i.12xlarge   48     96         nitro       true         true                 x86_64    18.75 Gigabit        8     0     0              none      $2.544              $0.88263
```
We do this with two provisioners:

```yaml
weight: 50
requirements:
  - key: "topology.kubernetes.io/zone"
    operator: In
    values: ${azs}
  - key: "karpenter.sh/capacity-type"
    operator: In
    values: ${capacity-type}
  - key: "karpenter.k8s.aws/instance-category"
    operator: NotIn
    values:
      - "a"
      - "t"
  - key: "karpenter.k8s.aws/instance-family"
    operator: NotIn
    values:
      - "z1d"
  - key: "karpenter.k8s.aws/instance-size"
    operator: NotIn
    values:
      - "metal"
  - key: "karpenter.k8s.aws/instance-hypervisor"
    operator: In
    values:
      - "nitro"
  - key: "karpenter.k8s.aws/instance-generation"
    operator: In
    values:
      - "6"
      - "7"
```

[...]

```yaml
requirements:
  - key: "topology.kubernetes.io/zone"
    operator: In
    values: ${azs}
  - key: "karpenter.sh/capacity-type"
    operator: In
    values: ${capacity-type}
  - key: "karpenter.k8s.aws/instance-category"
    operator: NotIn
    values:
      - "a"
      - "t"
  - key: "karpenter.k8s.aws/instance-family"
    operator: NotIn
    values:
      - "z1d"
  - key: "karpenter.k8s.aws/instance-size"
    operator: NotIn
    values:
      - "metal"
  - key: "karpenter.k8s.aws/instance-hypervisor"
    operator: In
    values:
      - "nitro"
  - key: "karpenter.k8s.aws/instance-generation"
    operator: NotIn
    values:
      - "1"
      - "2"
```
That way gen 6 and 7 are preferred and, if they're not available, Karpenter falls back to a bigger pool.
> karpenter picks the cheaper instance for the resources requested
It's a cheaper instance, but a more expensive cluster, since the penny saved on an older generation isn't worth the performance hit in almost every case.
> one team could need more memory focused (R family), others faster CPU
Agreed! Karpenter can know the cluster needs more CPU and pick an instance family with a higher CPU/memory ratio (same for memory), but it can't go too deep comparing one instance family with another.
However, within the C or R family, it's almost never worth picking an older generation.
> we do this with two providers
That's the current workaround, but instance generations go up to 7. Combined with instance types, a preference for larger instances, and capacity types, the number of provisioners needed grows quickly. A simple algorithm that will almost always work is a knob that says "within an instance family, prefer newer generations".
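That tie-break rule can be sketched in a few lines. This is purely illustrative (not Karpenter's actual selection code, and `InstanceType` here is a made-up type): among candidates that fit the request, pick the cheapest, and resolve price ties toward the newest generation.

```python
# Illustrative sketch, NOT Karpenter's implementation: cheapest first,
# with ties broken by preferring the newest generation.
from dataclasses import dataclass


@dataclass
class InstanceType:          # hypothetical model of a candidate instance type
    name: str
    on_demand_price: float   # USD per hour
    generation: int          # as in the karpenter.k8s.aws/instance-generation label


def pick(candidates: list[InstanceType]) -> InstanceType:
    # Sort key: price ascending, then generation descending, so that
    # equal-price candidates resolve to the newest generation.
    return min(candidates, key=lambda it: (it.on_demand_price, -it.generation))


candidates = [
    InstanceType("c5.12xlarge", 2.544, 5),
    InstanceType("c6i.12xlarge", 2.544, 6),
]
print(pick(candidates).name)  # c6i.12xlarge
```

With the two equal-price instance types from the table above, this resolves to `c6i.12xlarge` instead of `c5.12xlarge`, which is exactly the behavior being requested.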
Especially for on-demand, I would always prefer the latest generation if the price is equal to the older one.
> Karpenter can know the cluster needs more CPUs so pick an instance family with higher CPU/Memory ratio (Same for memory), but it cannot go too deep comparing one instance family with another. A simple algorithm that will almost always work
These are things that humans with knowledge of their services know, but no app knows. Karpenter can never decide for you what's best for your services, since all it knows about them is Kubernetes resource requests, not performance affinity.
Because AWS generations are numbered, it should be easy to sort them and put m6 in front of m5. This might only become a problem when two-digit generations come up, but even then the instance generation is already parsed and available as a label.
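For illustration, here is a minimal sketch of pulling the generation number out of an instance family name and sorting numerically (the family names beyond today's real ones are hypothetical). A numeric sort sidesteps the two-digit concern, where a plain lexical sort would order "m12x" before "m5":

```python
# Hypothetical sketch: extract the generation number from an instance
# family name, then sort numerically, newest first.
import re


def generation(family: str) -> int:
    # "c6i" -> 6, "m5" -> 5, "m12x" -> 12 (two-digit case, hypothetical)
    m = re.search(r"\d+", family)
    return int(m.group()) if m else 0


families = ["m5", "m6i", "m12x"]
# Lexical sort would misplace "m12x"; numeric sort keeps the order correct.
print(sorted(families, key=generation, reverse=True))  # ['m12x', 'm6i', 'm5']
```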
Perhaps we can help build something (a controller?) that further customizes your provisioner(s) based on what matters for your cluster? It could live outside of Karpenter; I'm thinking of something like contrib code.
Could `preferredDuringSchedulingIgnoredDuringExecution` work?
By setting a preference for `karpenter.k8s.aws/instance-generation=6`, we might be able to tell Karpenter to prefer generation 6, but fall back to another generation if no generation 6 is available?
By setting:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: karpenter.k8s.aws/instance-generation
                operator: In
                values:
                  - "6"
  containers:
    - name: with-node-affinity
      image: registry.k8s.io/pause:2.0
```
Tell us about your request
Because not all instance types with N CPUs are equivalent: smart fallback to the best instance generation within a family when having to run on-demand.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Current karpenter docs ask for Provisioners with many different instance types and families. This is great, since it allows pulling instances from a larger spot pool.
However, if Karpenter has to fall back to an on-demand node (because there is no spot capacity), it doesn't make much sense for us to use old-generation instance types. Why bother, when I get better CPU for my money on the latest generation?
Are you currently working around this issue?
We can have multiple provisioners. If that's the preferred approach, then the docs should probably clarify that "it is likely not optimal to mix old generations and on-demand". Right now, most people probably follow the docs and have "spot + on-demand" along with "many instance generations" in the same provisioner.