Description
Observed Behavior:
The NodePool managed by Karpenter is sitting at under 20% utilization in terms of both CPU and memory requests. When I looked into the logs I came across the following lines:
{"level":"INFO","time":"2024-06-27T19:38:01.744Z","logger":"controller","message":"disrupting via drift replace, terminating 1 nodes (33 pods) ip-10-3-157-84.eu-central-1.compute.internal/t3.xlarge/on-demand and replacing with 0 spot and 4 on-demand, from types t3.xlarge","commit":"490ef94","controller":"disruption","command-id":"1a2a64c0-2cef-4649-a0d7-014db1ff2409"}
...
{"level":"INFO","time":"2024-06-27T19:41:48.006Z","logger":"controller","message":"disrupting via drift replace, terminating 1 nodes (29 pods) ip-10-3-98-245.eu-central-1.compute.internal/t3.xlarge/on-demand and replacing with 0 spot and 3 on-demand, from types t3.xlarge","commit":"490ef94","controller":"disruption","command-id":"326df360-67ed-499b-b335-aa9a7bde2c54"}
...
{"level":"INFO","time":"2024-06-27T19:44:49.537Z","logger":"controller","message":"disrupting via drift replace, terminating 1 nodes (28 pods) ip-10-3-235-126.eu-central-1.compute.internal/t3.xlarge/on-demand and replacing with 0 spot and 3 on-demand, from types t3.xlarge","commit":"490ef94","controller":"disruption","command-id":"c86a04a9-c3d5-45b5-bb90-9c565ad7ba62"}
So Karpenter replaced 3 nodes with 10 nodes of the same type after the drift disruption. While I understand that the initial calculation may have been wrong, I'd expect consolidation to occur afterwards; however, for every node Karpenter kept emitting the Unconsolidatable event:
Normal Unconsolidatable 10m (x73 over 20h) karpenter Can't replace with a cheaper node
When I manually drained 3 of the 4 nodes in one zone, everything fit on the one remaining node.
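For reference, the per-node request utilization and the events can be checked with kubectl (replace <node-name> with one of the t3.xlarge nodes); kubectl describe node prints the summed CPU/memory requests under "Allocated resources":

kubectl describe node <node-name> | grep -A 8 "Allocated resources"
kubectl get events -A --field-selector reason=Unconsolidatable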
Expected Behavior:
Karpenter consolidates the underutilized nodes instead of leaving them running and reporting them as Unconsolidatable.
Reproduction Steps (Please include YAML):
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  limits:
    cpu: 1k
    memory: 1000Gi
  template:
    metadata: {}
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t3
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - xlarge
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
      taints:
        - effect: NoSchedule
          key: general
  weight: 100
In this particular case the limits are far too high, as there will never be a need for even 10% of that capacity, but even after fixing the limits the node pool would still be overprovisioned.
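For illustration only (the values below are arbitrary and not taken from my setup), tightening the limits in the NodePool spec would look like this, though as noted it would not fix the overprovisioning by itself:

limits:
  cpu: 32
  memory: 128Gi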
Versions:
Chart Version: 0.37.0
Kubernetes Version (kubectl version): 1.30
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment