aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

during drift, karpenter destroys old node before new node is ready #6591

Closed christophercorn closed 1 week ago

christophercorn commented 1 month ago

Description

Observed Behavior: when changing instance-category, Karpenter will spin up a new node but delete the old node before the new node is ready. At times I've even seen it completely delete the old node before the new node has even begun to spin up.
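
For context, the drift was triggered by editing the NodePool's instance-category requirement, roughly as in the sketch below (the before/after values are illustrative, not the exact ones from my cluster):

# illustrative edit to spec.template.spec.requirements in the NodePool
# before (example value):
#   - key: karpenter.k8s.aws/instance-category
#     operator: In
#     values: ["c"]
# after; nodes that no longer satisfy the requirements are marked drifted:
- key: karpenter.k8s.aws/instance-category
  operator: In
  values:
  - r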

This seems like the same issue as: https://github.com/aws/karpenter-provider-aws/issues/3979

Expected Behavior: the disruption controller documentation states that before spinning down a node, a new node must come up first:

"Pre-spin any replacement nodes needed as calculated in Step (2), and wait for them to become ready. If a replacement node fails to initialize, un-taint the node(s), and restart from Step (1), starting at the first disruption method again."

https://karpenter.sh/v0.32/concepts/disruption/#disruption-controller

Reproduction Steps (Please include YAML):

apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
    creationTimestamp: "2024-07-20T23:44:50Z"
    generation: 4
    labels:
      app.kubernetes.io/instance: karpenter
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: karpenter
      app.kubernetes.io/version: 0.37.0
      helm.sh/chart: karpenter-0.37.0
    name: karpenter
    namespace: karpenter
    resourceVersion: "1994557037"
    uid: 4d2e5f1a-8709-4d24-81a7-7e13192b17f5
  spec:
    progressDeadlineSeconds: 600
    replicas: 2
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app.kubernetes.io/instance: karpenter
        app.kubernetes.io/name: karpenter
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          app.kubernetes.io/instance: karpenter
          app.kubernetes.io/name: karpenter
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: karpenter.sh/nodepool
                  operator: DoesNotExist
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/instance: karpenter
                  app.kubernetes.io/name: karpenter
              topologyKey: kubernetes.io/hostname
        containers:
        - env:
          - name: KUBERNETES_MIN_VERSION
            value: 1.19.0-0
          - name: KARPENTER_SERVICE
            value: karpenter
          - name: LOG_LEVEL
            value: info
          - name: METRICS_PORT
            value: "8000"
          - name: HEALTH_PROBE_PORT
            value: "8081"
          - name: SYSTEM_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: MEMORY_LIMIT
            valueFrom:
              resourceFieldRef:
                containerName: controller
                divisor: "0"
                resource: limits.memory
          - name: FEATURE_GATES
            value: Drift=true,SpotToSpotConsolidation=false
          - name: BATCH_MAX_DURATION
            value: 10s
          - name: BATCH_IDLE_DURATION
            value: 1s
          - name: ASSUME_ROLE_DURATION
            value: 15m
          - name: CLUSTER_NAME
            value: data-staging-naos
          - name: VM_MEMORY_OVERHEAD_PERCENT
            value: "0.075"
          - name: RESERVED_ENIS
            value: "0"
          image: public.ecr.aws/karpenter/controller:0.37.0@sha256:157f478f5db1fe999f5e2d27badcc742bf51cc470508b3cebe78224d0947674f
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          name: controller
          ports:
          - containerPort: 8000
            name: http-metrics
            protocol: TCP
          - containerPort: 8081
            name: http
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: http
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsGroup: 65532
            runAsNonRoot: true
            runAsUser: 65532
            seccompProfile:
              type: RuntimeDefault
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        nodeSelector:
          dedicated: karpenter
          kubernetes.io/os: linux
        priorityClassName: system-cluster-critical
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 65532
        serviceAccount: karpenter
        serviceAccountName: karpenter
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: karpenter
        topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/instance: karpenter
              app.kubernetes.io/name: karpenter
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
  status:
    availableReplicas: 2
    conditions:
    - lastTransitionTime: "2024-07-22T20:17:57Z"
      lastUpdateTime: "2024-07-25T19:49:40Z"
      message: ReplicaSet "karpenter-655879ff67" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    - lastTransitionTime: "2024-07-25T21:31:30Z"
      lastUpdateTime: "2024-07-25T21:31:30Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    observedGeneration: 4
    readyReplicas: 2
    replicas: 2
    updatedReplicas: 2
kind: List
metadata:
  resourceVersion: ""
k get nodepools test-amd64 -o yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "14511052297762961874"
    karpenter.sh/nodepool-hash-version: v2
    kubernetes.io/description: NodePool for amd64 workloads
  creationTimestamp: "2024-07-22T23:31:29Z"
  generation: 13
  name: test-amd64
  resourceVersion: "1994675095"
  uid: 2be6a7df-7d82-4843-811a-19357290be02
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmpty
    expireAfter: 720h
  limits:
    cpu: 1000
    memory: 1000Gi
  template:
    metadata:
      labels:
        dedicated: cctest
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: test-us-west-2c
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - r
      - key: karpenter.k8s.aws/instance-cpu-manufacturer
        operator: In
        values:
        - amd
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "1"
      taints:
      - effect: NoSchedule
        key: dedicated
        value: cctest
status:
  resources:
    cpu: "12"
    ephemeral-storage: 62877636Ki
    memory: 97426728Ki
    pods: "174"
k get ec2nodeclass -o yaml              
apiVersion: v1
items:
- apiVersion: karpenter.k8s.aws/v1beta1
  kind: EC2NodeClass
  metadata:
    annotations:
      karpenter.k8s.aws/ec2nodeclass-hash: "15611794696223056619"
      karpenter.k8s.aws/ec2nodeclass-hash-version: v2
      kubernetes.io/description: General purpose EC2NodeClass for running Amazon Linux
        2 nodes
    creationTimestamp: "2024-07-22T23:31:29Z"
    finalizers:
    - karpenter.k8s.aws/termination
    generation: 2
    name: test-us-west-2c
    resourceVersion: "1994426266"
    uid: 9f3a2d34-0d31-4c4b-a352-916d2115f8df
  spec:
    amiFamily: AL2
    metadataOptions:
      httpEndpoint: enabled
      httpProtocolIPv6: disabled
      httpPutResponseHopLimit: 2
      httpTokens: required
    role: nodes.cluster-api-provider-aws.sigs.k8s.io
    securityGroupSelectorTerms:
    - tags:
        Name: eks-cluster-sg-data-staging-naos-1621224044
    - tags:
        Name: data-staging-naos-allow-vpn-2021080923252716850000000e
    - tags:
        Name: data-staging-naos-node-eks-additional
    subnetSelectorTerms:
    - tags:
        Name: data-staging-naos-private-us-west-2c
  status:
    amis:
    - id: ami-00b978331c1a6a827
      name: amazon-eks-node-1.27-v20240703
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: DoesNotExist
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: DoesNotExist
    - id: ami-07665cc91afa1b56c
      name: amazon-eks-arm64-node-1.27-v20240703
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: DoesNotExist
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: DoesNotExist
    - id: ami-0f1afab026acbacc0
      name: amazon-eks-gpu-node-1.27-v20240703
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-gpu-count
        operator: Exists
    - id: ami-0f1afab026acbacc0
      name: amazon-eks-gpu-node-1.27-v20240703
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-accelerator-count
        operator: Exists
    conditions:
    - lastTransitionTime: "2024-07-25T19:49:32Z"
      message: ""
      reason: Ready
      status: "True"
      type: Ready
    instanceProfile: data-staging-naos_4537089810712467039
    securityGroups:
    - id: sg-04d2ae5ae0a2e0d7d
      name: eks-cluster-sg-data-staging-naos-1621224044
    - id: sg-0566b2f5cc12187d2
      name: data-staging-naos-node-eks-additional
    subnets:
    - id: subnet-041ae50bda723e476
      zone: us-west-2c
      zoneID: usw2-az3
kind: List
metadata:
  resourceVersion: ""

Karpenter-related log entries:

karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:23:05.682Z","logger":"controller.disruption","message":"disrupting via drift replace, terminating 1 nodes (1 pods) ip-10-129-90-75.us-west-2.compute.internal/c6a.large/on-demand and replacing with on-demand node from types m5a.large, m6a.large, m5ad.large, r5a.large, r6a.large and 55 other(s)","commit":"6b868db","command-id":"02393ae8-52ee-4bf0-9ce3-ce081a934367"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:23:05.710Z","logger":"controller.disruption","message":"created nodeclaim","commit":"6b868db","nodepool":"test-amd64","nodeclaim":"test-amd64-mkvrb","requests":{"cpu":"486m","ephemeral-storage":"1304Mi","memory":"756Mi","pods":"10"},"instance-types":"m5a.12xlarge, m5a.16xlarge, m5a.24xlarge, m5a.2xlarge, m5a.4xlarge and 55 other(s)"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:23:08.330Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"6b868db","nodeclaim":"test-amd64-mkvrb","provider-id":"aws:///us-west-2c/i-0bdfa9539de03d1e2","instance-type":"m5a.large","zone":"us-west-2c","capacity-type":"on-demand","allocatable":{"cpu":"1930m","ephemeral-storage":"17Gi","memory":"6903Mi","pods":"29","vpc.amazonaws.com/pod-eni":"9"}}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:23:39.735Z","logger":"controller.nodeclaim.lifecycle","message":"registered nodeclaim","commit":"6b868db","nodeclaim":"test-amd64-mkvrb","provider-id":"aws:///us-west-2c/i-0bdfa9539de03d1e2","node":"ip-10-129-89-193.us-west-2.compute.internal"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:21.340Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"6b868db","nodeclaim":"test-amd64-mkvrb","provider-id":"aws:///us-west-2c/i-0bdfa9539de03d1e2","node":"ip-10-129-89-193.us-west-2.compute.internal","allocatable":{"cpu":"1930m","ephemeral-storage":"18242267924","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"7179184Ki","pods":"29"}}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:21.987Z","logger":"controller.disruption.queue","message":"command succeeded","commit":"6b868db","command-id":"02393ae8-52ee-4bf0-9ce3-ce081a934367"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:22.014Z","logger":"controller.node.termination","message":"tainted node","commit":"6b868db","node":"ip-10-129-90-75.us-west-2.compute.internal"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:31.072Z","logger":"controller.node.termination","message":"deleted node","commit":"6b868db","node":"ip-10-129-90-75.us-west-2.compute.internal"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:31.438Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"6b868db","nodeclaim":"test-amd64-568w6","node":"ip-10-129-90-75.us-west-2.compute.internal","provider-id":"aws:///us-west-2c/i-062a83255a79deefe"}

Versions:

engedaam commented 1 month ago
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:21.340Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"6b868db","nodeclaim":"test-amd64-mkvrb","provider-id":"aws:///us-west-2c/i-0bdfa9539de03d1e2","node":"ip-10-129-89-193.us-west-2.compute.internal","allocatable":{"cpu":"1930m","ephemeral-storage":"18242267924","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"7179184Ki","pods":"29"}}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:21.987Z","logger":"controller.disruption.queue","message":"command succeeded","commit":"6b868db","command-id":"02393ae8-52ee-4bf0-9ce3-ce081a934367"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:22.014Z","logger":"controller.node.termination","message":"tainted node","commit":"6b868db","node":"ip-10-129-90-75.us-west-2.compute.internal"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:31.072Z","logger":"controller.node.termination","message":"deleted node","commit":"6b868db","node":"ip-10-129-90-75.us-west-2.compute.internal"}
karpenter-6b785d6cb7-qvlmx controller {"level":"INFO","time":"2024-07-24T00:24:31.438Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"6b868db","nodeclaim":"test-amd64-568w6","node":"ip-10-129-90-75.us-west-2.compute.internal","provider-id":"aws:///us-west-2c/i-062a83255a79deefe"}

From the logs you have provided, it seems that Karpenter waited until the new node was ready and initialized before terminating the old one. Could you share any Kubernetes events showing pods being prevented from running on the new node?
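
(For reference, something like the following should surface those events; the node name is taken from the logs above and the rest is a sketch:)

k get events -A --field-selector involvedObject.kind=Pod,reason=FailedScheduling --sort-by=.lastTimestamp
k get events -A --field-selector involvedObject.name=ip-10-129-89-193.us-west-2.compute.internal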

github-actions[bot] commented 4 weeks ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.