aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

NodeClaims stranded even after NodePool deletion #6905

Open k24dizzle opened 2 months ago

k24dizzle commented 2 months ago

Description

Observed Behavior:

Screenshot 2024-08-30 at 7 15 41 PM

Expected Behavior:

Reproduction Steps (Please include YAML):

Versions:

engedaam commented 2 months ago

Are the underlying nodes deleted?

k24dizzle commented 2 months ago

Are the underlying nodes deleted?

omni eks-node-viewer --resources cpu --extra-labels karpenter.sh/nodepool --node-sort karpenter.sh/nodepool

Screenshot 2024-09-02 at 11 06 04 AM

The nodes still exist, but stuck in Deleting.

engedaam commented 2 months ago

Are there any pods that may be stuck deleting on those nodes?

hamishforbes commented 2 months ago

I'm running into this same issue upgrading to Karpenter 1.0. I haven't deleted my NodePool, but it has been changed to the v1 custom resource.

So it looks like Karpenter is struggling because the owner reference on the NodeClaim is for the v1beta1 version of the NodePool?

> k get nodeclaims -o custom-columns='APIVER:.apiVersion,NAME:.metadata.name,OWNER_API_VER:.metadata.ownerReferences[0].apiVersion,OWNERKIND:.metadata.ownerReferences[0].kind'
APIVER            NAME            OWNER_API_VER          OWNERKIND
karpenter.sh/v1   default-6nd8t   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-7bxsr   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-9chj5   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-ct54l   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-d6xv7   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-d98d8   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-d9kpt   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-gpjpg   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-j5fdd   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-j6wxw   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-jr5qf   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-l2fml   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-lk8bd   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-m2r7j   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-m5vdt   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-m7mtb   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-mk6r6   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-mnndr   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-mz62l   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-pkdl4   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-ptfm8   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-s9df7   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-ttzk8   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   default-wfpbj   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-wwr7z   karpenter.sh/v1        NodePool
karpenter.sh/v1   default-xphxg   karpenter.sh/v1beta1   NodePool

I've got empty nodes that should be terminated dangling around, with the EC2 instances still running.

Non-terminated Pods:          (5 in total)
  Namespace                   Name                                         CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                         ------------  ----------  ---------------  -------------  ---
  kube-system                 aws-node-termination-handler-zcnfs           10m (0%)      100m (2%)   64Mi (0%)        64Mi (0%)      3d15h
  kube-system                 aws-node-zlkl9                               50m (1%)      0 (0%)      0 (0%)           0 (0%)         3d15h
  kube-system                 istio-cni-node-jljp6                         50m (1%)      200m (5%)   200Mi (2%)       400Mi (5%)     3d15h
  kube-system                 kube-proxy-pfzxq                             100m (2%)     0 (0%)      0 (0%)           0 (0%)         3d15h
  prometheus                  prometheus-prometheus-node-exporter-mtfdf    50m (1%)      150m (3%)   32Mi (0%)        32Mi (0%)      3d15h

Events:
  Type     Reason                 Age                    From       Message
  ----     ------                 ----                   ----       -------
  Normal   Unconsolidatable       30m (x321 over 3d15h)  karpenter  SpotToSpotConsolidation is disabled, can't replace a spot node with a spot node
  Normal   DisruptionBlocked      26m                    karpenter  Cannot disrupt Node: not all pods would schedule, knative-eventing/kafka-broker-receiver-746fb7d66f-679g6 => would schedule against uninitialized nodeclaim/default-mk6r6
  Normal   DisruptionTerminating  22m                    karpenter  Disrupting Node: Underutilized/Delete
  Warning  FailedDraining         22m                    karpenter  Failed to drain node, 15 pods are waiting to be evicted
  Normal   DisruptionBlocked      14m (x5 over 22m)      karpenter  Cannot disrupt Node: state node is marked for deletion
  Normal   DisruptionBlocked      2m27s (x6 over 12m)    karpenter  Cannot disrupt Node: state node is marked for deletion
  Normal   DisruptionBlocked      92s                    karpenter  Cannot disrupt Node: state node is marked for deletion
  Normal   DisruptionBlocked      69s                    karpenter  Cannot disrupt Node: state node is marked for deletion

The major problem this is causing me is that these dangling nodes often still have PVs mounted, which is preventing those pods from being re-scheduled on new nodes (the old multi-attach EBS controller problem that I'm upgrading to 1.0 to try to fix...).

Manually terminating the EC2 instance does eventually cause everything to clean up, as does manually updating the apiVersion of the NodePool in the owner reference on the NodeClaim.
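
Concretely, that workaround looks something like this (a minimal sketch; the NodeClaim name is just one from the listing above, and it assumes the v1beta1 NodePool is the first owner reference):

# Check which apiVersion the owner reference currently points at
kubectl get nodeclaim default-6nd8t \
  -o jsonpath='{.metadata.ownerReferences[0].apiVersion}{"\n"}'

# Re-point the owner reference at the v1 NodePool API
kubectl patch nodeclaim default-6nd8t --type='json' \
  -p='[{"op":"replace","path":"/metadata/ownerReferences/0/apiVersion","value":"karpenter.sh/v1"}]'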

engedaam commented 2 months ago

@hamishforbes are you seeing the empty nodes being deleted in the logs by Karpenter? It also seems like the nodes are not fully empty, no?

  Warning  FailedDraining  22m  karpenter  Failed to drain node, 15 pods are waiting to be evicted

hamishforbes commented 2 months ago

No, that's the problem. Any nodeclaim where the ownerRef is for the old v1beta1 nodepool does not get deleted. Nodes provisioned after the 1.0 upgrade with an ownerRef for the v1 nodepool are fine.

Yes, it says that 20 minutes ago there were 15 pods draining, but as you can see there are only DaemonSet pods left on that node now. If I fix the NodeClaim ownerRef, Karpenter immediately terminates the EC2 instance and cleans up.

k24dizzle commented 2 months ago

Are there any pods that may be stuck deleting on those nodes?

No, just some running daemonsets.

Screenshot 2024-09-02 at 10 48 14 PM

I'm experiencing this in clusters where only Karpenter 1.0 exists, so I don't think it's related to upgrading:

% kubectl get nodeclaims -o custom-columns='APIVER:.apiVersion,NAME:.metadata.name,OWNER_API_VER:.metadata.ownerReferences[0].apiVersion,OWNERKIND:.metadata.ownerReferences[0].kind'
APIVER            NAME          OWNER_API_VER     OWNERKIND
karpenter.sh/v1   infra-4swmq   karpenter.sh/v1   NodePool
karpenter.sh/v1   infra-gp9d4   karpenter.sh/v1   NodePool
karpenter.sh/v1   infra-vxcpn   karpenter.sh/v1   NodePool

k24dizzle commented 2 months ago

I think it has something to do with the karpenter.sh/termination finalizer on the NodeClaim. Is there a way I can manually trigger the finalizer to run again (even after the NodePool is deleted)? It isn't clear to me what the finalizer is doing, or whether it is rerunning and retrying. I don't see any signal in the logs that it is continuing to run, which makes me think it is stuck.
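
A rough way to check, assuming Karpenter runs as a Deployment named karpenter in the kube-system namespace (adjust to your install), using one of the NodeClaim names from the listing above:

# See which finalizers are still present on the stuck NodeClaim
kubectl get nodeclaim infra-4swmq -o jsonpath='{.metadata.finalizers}{"\n"}'

# Look at its status conditions and events
kubectl describe nodeclaim infra-4swmq

# Check whether the termination controller is still reconciling it
kubectl logs -n kube-system deploy/karpenter | grep infra-4swmq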

engedaam commented 2 months ago

This issue seems like a duplicate of https://github.com/kubernetes-sigs/karpenter/issues/1578. Could we track the investigation on that issue? It would make it easier to keep track.

k24dizzle commented 2 months ago

I think it's slightly different. I've experienced this issue:

engedaam commented 2 months ago

@k24dizzle What version of karpenter are you running?

k24dizzle commented 2 months ago

v1.0.0

aquam8 commented 2 months ago

I have the same problem as the author.

I was running v0.37.2 with webhook.enabled. I upgraded to 1.0.1 (for the CRDs and the app) and updated the IAM policy, but kept my manifests for the NodePool and EC2NodeClass as v1beta1. All was well at that stage.

The next step was updating my manifests for the NodePool and EC2NodeClass to reference the v1 CRD so I could leverage the new budget reasons. But as soon as I updated the manifests to v1 and applied the changes, I ran into issues.

Karpenter logs:

{"level":"ERROR","time":"2024-09-09T06:49:17.087Z","logger":"controller","message":"failed listing instance types for mixed-1","commit":"62a726c","controller":"disruption","namespace":"","name":"","reconcileID":"583102f1-95d3-48f1-99b0-0a76cb430d69","error":"resolving node class, ec2nodeclasses.karpenter.k8s.aws \"mx51-eks\" is terminating, treating as not found
"}
{"level":"ERROR","time":"2024-09-09T06:49:18.369Z","logger":"controller","message":"nodePool not ready","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"9bceb5be-935b-4d2d-b587-8154a3b8e17e","NodePool":{"name":"mixed-1"}}
{"level":"INFO","time":"2024-09-09T06:49:18.369Z","logger":"controller","message":"no nodepools found","commit":"62a726c","controller":"provisioner","namespace":"","name":"","reconcileID":"9bceb5be-935b-4d2d-b587-8154a3b8e17e"}

The nodepool fails with Failed resolving NodeClass.

The nodeclass fails with WaitingOnNodeClaimTermination - Waiting on NodeClaim termination for mixed-1-q25pf, mixed-1-dv8cp

k get nodeclaims -o custom-columns='APIVER:.apiVersion,NAME:.metadata.name,OWNER_API_VER:.metadata.ownerReferences[0].apiVersion,OWNERKIND:.metadata.ownerReferences[0].kind'
APIVER            NAME            OWNER_API_VER          OWNERKIND
karpenter.sh/v1   mixed-1-dv8cp   karpenter.sh/v1beta1   NodePool
karpenter.sh/v1   mixed-1-q25pf   karpenter.sh/v1beta1   NodePool

Recovery from there is painful and hit-and-miss: I can't get a new node to register until I kill all existing nodes or remove the karpenter.sh/termination finalizer on the NodeClaims/Nodes. Sometimes I have had to re-add the EC2NodeClass for everything to get going again. But of course this is highly disruptive and not suitable for a PROD upgrade.
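
For reference, the finalizer removal looks roughly like this (last resort only; the NodeClaim name is one from my listing above, and the backing EC2 instance may be left running and need to be terminated by hand afterwards):

# Clear the finalizers on a stuck NodeClaim so the object can be deleted
kubectl patch nodeclaim mixed-1-dv8cp --type=merge -p '{"metadata":{"finalizers":null}}'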

The way I update/apply the manifests is through Terraform (IaC), like this:

resource "kubectl_manifest" "karpenter_node_pool_ondemand_1" {
  yaml_body          = <<-YAML
    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: ondemand-1
    spec:
      # ...
  YAML
}

where the NodePool apiVersion is changed from karpenter.sh/v1beta1 to karpenter.sh/v1, and the same for the EC2NodeClass. No other changes. I can try to split the changes so that I only do it for the NodePool or the EC2NodeClass, not both, if you think that would help with troubleshooting.

I'd appreciate any assistance on how to address this last leg of the upgrade.

hontarenko commented 1 week ago

Any updates?

sergii-auctane commented 1 day ago

This issue seems like a duplicate of kubernetes-sigs/karpenter#1578. Could we track the investigation on that issue? It would make it easier to keep track.

It's nothing like that issue. I'm using version 1.0.6 and have tons of empty nodes stuck in:

  Warning  FailedDraining     7m30s (x4116 over 6d3h)  karpenter  Failed to drain node, 9 pods are waiting to be evicted
  Normal   DisruptionBlocked  2m16s (x7379 over 10d)   karpenter  Cannot disrupt Node: state node is marked for deletion

And those 9 pods are DaemonSet pods. I observe this issue with non-default node pools only: I run consolidation nightly, and the default NodePool consolidates, but another one does not.