Karpenter not honoring/working with topology spread constraints or pod affinity

Description

Observed Behavior:

This will be rather long as I describe the different scenarios we've tested. We have a deployment that, by default, does not specify topologySpreadConstraints or affinity and using the k8s default constraints with no node selectors or tolerations, the three replicas get deployed across our three AZs in us-west-2 to our untainted node pool. We are attempting to provide a dedicated node pool for this deployment and cannot seem to get karpenter to honor or work with different combinations of topologySpreadConstraints and/or affinity configurations. Below are the node pool and and subnet sections of the node class definitions, pod nodeSelector and tolerations, along with the behavior we are seeing with the different scenarios.

Node pool:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "11712266115069733881"
    karpenter.sh/nodepool-hash-version: v2
  creationTimestamp: "2024-08-08T19:31:29Z"
  generation: 2
  name: deployment-node-pool
  resourceVersion: "995009893"
  uid: df0ab507-2874-4483-832a-f1d26b551bc9
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
    expireAfter: Never
  limits:
    memory: 128Gi
  template:
    metadata:
      labels:
        deployment-dedicated: "true"
        node.blah/include-target-group-deployment-http: "true"
        node.blah/include-target-group-deployment-https: "true"
    spec:
      nodeClassRef:
        name: blah-default-node-class
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - m5
      taints:
      - effect: NoSchedule
        key: deployment-dedicated
status:
  resources:
    cpu: "6"
    ephemeral-storage: 1572089868Ki
    memory: 23945624Ki
    pods: "87"

Node class snippet:

...
  subnetSelectorTerms:
  - id: subnet-0c94a1cdcd52dd53f
  - id: subnet-0f1fc6fa5767fdfa9
  - id: subnet-0a1c2721b7e0ff43a
...
status:
...
  subnets:
  - id: subnet-0a1c2721b7e0ff43a
    zone: us-west-2c
    zoneID: usw2-az3
  - id: subnet-0f1fc6fa5767fdfa9
    zone: us-west-2b
    zoneID: usw2-az2
  - id: subnet-0c94a1cdcd52dd53f
    zone: us-west-2a
    zoneID: usw2-az1

The first thing we tried is a topologySpreadConstraints definition as follow (tried with ScheduleAnyway and DoNotSchedule):

  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          product: blah
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          product: blah

This resulted in karpenter spinning up the first node and scheduling all three pods on it. We then attempted the following:

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: "product"
            operator: In
            values:
            - "blah"
        topologyKey: topology.kubernetes.io/zone

  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          product: blah
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          product: blah

which resulted in karpenter not responding at all and the first pod of the deployment never getting scheduled. The following schedules all three pods in the deployment, but does not spread them across AZs (the only change from above is the affinity topology key):

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: "product"
            operator: In
            values:
            - "blah"
        topologyKey: kubernetes.io/hostname

  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          product: blah
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          product: blah

Below are from the pods:

  nodeSelector:
    blah-dedicated: "true"
    node.blah/include-target-group-blah-http: "true"
    node.blah/include-target-group-blah-https: "true"

  tolerations:
  - effect: NoSchedule
    key: deployment-dedicated
    operator: Exists

Expected Behavior: Karpenter would honor the topologySpreadConstraints or affinity settings and spread pods across three nodes in three AZs.

Reproduction Steps (Please include YAML):

Versions:

Chart Version: 0.37.0
Kubernetes Version (kubectl version): v1.28.11-eks-db838b0
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

aws / karpenter-provider-aws

Karpenter not honoring/working with topology spread constraints or pod affinity #6694

Description