aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

New nodes are not provisioned for StatefulSet with podAntiAffinity topologyKey: failure-domain.beta.kubernetes.io/zone #6440

Open DanniRiggin opened 3 months ago

DanniRiggin commented 3 months ago

Description

Observed Behavior: When I deploy a simple StatefulSet with podAntiAffinity using the label failure-domain.beta.kubernetes.io/zone as the topologyKey and one of its pods needs a new node, the node does not get provisioned. The karpenter logs suggest it has an issue with this key, as it prints out:

unsatisfiable topology constraint for pod anti-affinity, key=failure-domain.beta.kubernetes.io/zone (counts = us-east-1a: 1 , podDomains = topology.kubernetes.io/zone Exists, nodeDomains = topology.kubernetes.io/zone Exists

Expected Behavior: According to the docs under well-known labels, failure-domain.beta.kubernetes.io/zone should be mapped to the stable equivalent (topology.kubernetes.io/zone), so nodes should continue to be provisioned when necessary for these pods in the StatefulSet.

Reproduction Steps (Please include YAML):

  1. Deploy the following StatefulSet:
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      labels:
        run: curl
      name: curl-1
    spec:
      selector:
        matchLabels:
          run: curl
      replicas: 3
      podManagementPolicy: OrderedReady
      template:
        metadata:
          labels:
            run: curl
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: run
                      operator: In
                      values:
                      - curl
                  topologyKey: failure-domain.beta.kubernetes.io/zone
          containers:
          - args:
            - sleep
            - "3000"
            image: curlimages/curl
            name: curl
            resources:
              requests:
                cpu: 1000m
                memory: 1000Mi
              limits:
                memory: 5000Mi
            securityContext:
              runAsUser: 1001
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          terminationGracePeriodSeconds: 1
    status: {}
  2. If none of the pods need a new node and all of them get scheduled, keep deploying the StatefulSet in different namespaces until a new node has to be provisioned to fit one of the pods.
  3. Once you have a pending pod, note in the karpenter logs or on the pod events that there is an error for unsatisfiable topology constraint for pod anti-affinity, key=failure-domain.beta.kubernetes.io/zone
  4. Repeat the process, but change the topologyKey to topology.kubernetes.io/zone
  5. Once you have a pod that needs a new node, note that karpenter nominates a nodeClaim for that pod and the new node comes up as expected
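
For step 4, the only change is the topologyKey in the anti-affinity term. A minimal sketch of that fragment, assuming the rest of the StatefulSet above stays exactly the same:

```yaml
# Fragment of spec.template.spec from the StatefulSet above.
# Only topologyKey differs from the failing manifest.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: run
              operator: In
              values:
                - curl
        # Stable zone label, replacing failure-domain.beta.kubernetes.io/zone
        topologyKey: topology.kubernetes.io/zone
```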

Versions:

jigisha620 commented 3 months ago

Is there a reason for using the deprecated label?

DanniRiggin commented 3 months ago

We don't have to use the deprecated label, and our issue is fixed by using topology.kubernetes.io/zone. I reported this in case others run into the same problem. It was a pain point while debugging, since the docs say that the deprecated label works but it does not.