aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Disruption via drift results in 4x the nodes being provisioned which aren't consolidated afterwards #6426

Open AleksaC opened 4 days ago

AleksaC commented 4 days ago

Description

Observed Behavior:

The nodepool managed by Karpenter is sitting at <20% utilization in terms of both CPU and memory requests. When I looked into the logs I came across the following lines:

{"level":"INFO","time":"2024-06-27T19:38:01.744Z","logger":"controller","message":"disrupting via drift replace, terminating 1 nodes (33 pods) ip-10-3-157-84.eu-central-1.compute.internal/t3.xlarge/on-demand and replacing with 0 spot and 4 on-demand, from types t3.xlarge","commit":"490ef94","controller":"disruption","command-id":"1a2a64c0-2cef-4649-a0d7-014db1ff2409"}
...
{"level":"INFO","time":"2024-06-27T19:41:48.006Z","logger":"controller","message":"disrupting via drift replace, terminating 1 nodes (29 pods) ip-10-3-98-245.eu-central-1.compute.internal/t3.xlarge/on-demand and replacing with 0 spot and 3 on-demand, from types t3.xlarge","commit":"490ef94","controller":"disruption","command-id":"326df360-67ed-499b-b335-aa9a7bde2c54"}
...
{"level":"INFO","time":"2024-06-27T19:44:49.537Z","logger":"controller","message":"disrupting via drift replace, terminating 1 nodes (28 pods) ip-10-3-235-126.eu-central-1.compute.internal/t3.xlarge/on-demand and replacing with 0 spot and 3 on-demand, from types t3.xlarge","commit":"490ef94","controller":"disruption","command-id":"c86a04a9-c3d5-45b5-bb90-9c565ad7ba62"}

So Karpenter replaced 3 nodes with 10 nodes of the same type after drift disruption. While I understand that the initial calculation may have been wrong, I'd expect consolidation to occur afterwards; instead, for every node Karpenter kept emitting the Unconsolidatable event:

Normal  Unconsolidatable   10m (x73 over 20h)  karpenter  Can't replace with a cheaper node
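The same event shows up for every node in the pool. A quick way to confirm how widespread it is (as long as the events haven't expired yet) is to filter by reason:

# list all Unconsolidatable events across the cluster
kubectl get events -A --field-selector reason=Unconsolidatable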

When I manually drained 3 of the 4 nodes in one zone, everything fit on the one remaining node.
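For reference, the manual drain was nothing more than the standard kubectl drain, roughly along these lines (the node name is a placeholder):

# cordon the node and evict everything except DaemonSet pods
kubectl drain ip-10-3-xxx-xxx.eu-central-1.compute.internal --ignore-daemonsets --delete-emptydir-data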

Expected Behavior:

Drifted nodes are replaced with a comparable amount of capacity, and any excess nodes are consolidated afterwards.

Reproduction Steps (Please include YAML):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  limits:
    cpu: 1k
    memory: 1000Gi
  template:
    metadata: {}
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t3
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - xlarge
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
      taints:
        - effect: NoSchedule
          key: general
  weight: 100

In this particular case the limits are far too high, as there will never be a need for even 10% of that capacity, but even after fixing them the node pool would still be overprovisioned.
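For completeness, lowering the limits is a one-line change on the NodePool (the values below are placeholders, not a recommendation), but as noted it doesn't change the consolidation behaviour:

# lower the NodePool capacity ceiling to something close to actual peak requests
kubectl patch nodepool general --type merge -p '{"spec":{"limits":{"cpu":"100","memory":"100Gi"}}}'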

Versions:


jonathan-innis commented 1 day ago

The nodepool managed by karpenter is sitting at <20% utilization in terms of both CPU and memory requests

Can you share the deployment specs that you are using here?