aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

NodeClaim stuck in 'Unknown' (Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim) #7435

Closed: keoren3 closed this issue 2 days ago

keoren3 commented 3 days ago

Description

Observed Behavior: Karpenter launches a new EC2 instance, but it never joins the EKS cluster. Instead the NodeClaim is stuck in status 'Unknown' with the message: "Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim"

Expected Behavior: The new node registers with the EKS cluster.

Reproduction Steps (Please include YAML): EC2NodeClass + NodePool

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: karpenter
  namespace: kube-system
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: karpenter
      expireAfter: 720h  # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: karpenter
  namespace: kube-system
spec:
  amiFamily: AL2  # Amazon Linux 2
  role: "KarpenterNodeRole-<cluster>"  # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        Environment: test
        Tier: public
  securityGroupSelectorTerms:
    - tags:
        "aws:eks:cluster-name": <cluster>
  amiSelectorTerms:
    - id: ami-00710ab8f493e2428
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        encrypted: false
        deleteOnTermination: true
        snapshotID: snap-0ec4fd6705eea533e
  tags:
    env: test
    Name: balance-test-karpenter
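
For completeness, a minimal way to apply and sanity-check these two resources (a sketch; it assumes both manifests are saved together as karpenter.yaml, which is just an example file name):

# Apply the NodePool and EC2NodeClass (both are cluster-scoped in Karpenter v1)
kubectl apply -f karpenter.yaml

# Confirm they were accepted and inspect how the EC2NodeClass resolved AMIs, subnets and security groups
kubectl get nodepools,ec2nodeclasses
kubectl describe ec2nodeclass karpenter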

Extra info: I'm trying to replace my Cluster Autoscaler with Karpenter. I gave the Karpenter nodes the exact same:

  1. IAM role.
  2. Security group.
  3. EBS volume (based on the same snapshot).
  4. AMI.

I've added the required role mapping to the aws-auth ConfigMap:

- groups:
  - system:nodes
  - system:bootstrappers
  rolearn: arn:aws:iam::<account_id>:role/KarpenterNodeRole-<cluster>
  username: system:node:{{EC2PrivateDNSName}}
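
A quick way to verify the mapping actually landed (a sketch; it assumes eksctl is configured for this cluster, though reading the ConfigMap directly works just as well):

# Inspect the rendered aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml

# Or list the identity mappings via eksctl
eksctl get iamidentitymapping --cluster <cluster>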

I've logged into the new EKS worker node and ran journalctl -u kubelet, but no entries appeared there at all.
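
Since the kubelet journal is empty, the next thing worth checking on the instance is whether the bootstrap/user-data ever ran (a sketch for the AL2 AMI family; log paths can differ on other AMIs):

# Did cloud-init / the EKS bootstrap script run at all?
sudo tail -n 50 /var/log/cloud-init-output.log
sudo journalctl -u cloud-init --no-pager | tail -n 50

# Is the kubelet unit present and enabled?
systemctl status kubelet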

I tried changing the role's name, adding permissions to the role, and adding rules to the security group. Nothing helped; the nodes simply refuse to connect.

Karpenter logs:

{"level":"INFO","time":"2024-11-25T15:53:54.818Z","logger":"controller","message":"found provisionable pod(s)","commit":"a2875e3","controller":"provisioner","namespace":"","name":"","reconcileID":"6703d662-e9b1-4f99-9c55-e72f0aaa6b7e","Pods":"over-provisioning/over-provisioning-6d568b6cf8-7tqjr, over-provisioning/over-provisioning-6d568b6cf8-8p8bf","duration":"181.987056ms"}
{"level":"INFO","time":"2024-11-25T15:53:54.818Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)","commit":"a2875e3","controller":"provisioner","namespace":"","name":"","reconcileID":"6703d662-e9b1-4f99-9c55-e72f0aaa6b7e","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2024-11-25T15:53:54.819Z","logger":"controller","message":"computed 1 unready node(s) will fit 1 pod(s)","commit":"a2875e3","controller":"provisioner","namespace":"","name":"","reconcileID":"6703d662-e9b1-4f99-9c55-e72f0aaa6b7e"}
{"level":"INFO","time":"2024-11-25T15:53:54.842Z","logger":"controller","message":"created nodeclaim","commit":"a2875e3","controller":"provisioner","namespace":"","name":"","reconcileID":"6703d662-e9b1-4f99-9c55-e72f0aaa6b7e","NodePool":{"name":"karpenter"},"NodeClaim":{"name":"karpenter-wxbnk"},"requests":{"cpu":"1780m","memory":"2418Mi","pods":"6"},"instance-types":"c4.large, c5.large, c5.xlarge, c5a.large, c5a.xlarge and 55 other(s)"}
{"level":"INFO","time":"2024-11-25T15:53:58.329Z","logger":"controller","message":"launched nodeclaim","commit":"a2875e3","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-wxbnk"},"namespace":"","name":"karpenter-wxbnk","reconcileID":"01c8fd77-f4cb-4572-a600-333737c2caeb","provider-id":"aws:///us-east-2b/i-052e9c4c5d91c8767","instance-type":"c7i-flex.large","zone":"us-east-2b","capacity-type":"spot","allocatable":{"cpu":"1930m","ephemeral-storage":"89Gi","memory":"3114Mi","pods":"29"}}

(No errors AFAIK)
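
From the cluster side, the NodeClaim's status conditions are usually the most informative place to see where registration stalls (a sketch; the NodeClaim name is taken from the log above):

# List NodeClaims and their current state
kubectl get nodeclaims -o wide

# The conditions show whether Launched / Registered / Initialized succeeded
kubectl describe nodeclaim karpenter-wxbnk

# Check whether the instance ever showed up as a Node object
kubectl get nodes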

The Cluster Autoscaler still works, though: I raise the deployment replicas to 1, and everything comes up as expected.

Is this a bug, or am I missing something?

I've looked at all the other topics about this; the solutions are all along the lines of "Oh, I missed some tag." I've checked all the tags again and again, and that's not the issue here.
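
For the tag-related suggestions, one way to confirm that the selector terms in the EC2NodeClass match real resources (a sketch; it assumes the AWS CLI is configured for the same account and region):

# Subnets matched by the subnetSelectorTerms tags
aws ec2 describe-subnets \
  --filters "Name=tag:Environment,Values=test" "Name=tag:Tier,Values=public" \
  --query "Subnets[].SubnetId"

# Security groups matched by the securityGroupSelectorTerms tag
aws ec2 describe-security-groups \
  --filters "Name=tag:aws:eks:cluster-name,Values=<cluster>" \
  --query "SecurityGroups[].GroupId"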

Any help would be great.

Versions:

keoren3 commented 2 days ago

I was able to fix this. Here's what I did:

  1. I created a different node group (and didn't reuse the old one that my Cluster Autoscaler used).
  2. I updated the AMI. Instead of the ID my ASG used, I now select the latest EKS-optimized image by name:
    amiSelectorTerms:
      - name: amazon-eks-node-1.31-*
    # instead of the pinned id: ami-00710ab8f493e2428
  3. Removed the old snapshot and just let Karpenter create a fresh gp3 volume, and removed the EBS encryption setting.
  4. Updated aws-auth:
    1. Removed the 'KarpenterControllerRole-' IAM identity mapping (it isn't required there; I don't know why I added it in the first place).
    2. Deleted and re-added the 'KarpenterNodeRole-' IAM identity mapping. I think I was missing the line 'username: system:node:{{EC2PrivateDNSName}}' (see the eksctl sketch at the end of this comment).

TBH, I'm not sure which change actually made it work; it might have been more than one thing.
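
For item 4, a rough sketch of the delete-and-re-add, assuming eksctl is used to manage aws-auth (editing the ConfigMap by hand works too); the ARN placeholders match the ones earlier in this issue:

# Remove the existing mapping for the node role
eksctl delete iamidentitymapping --cluster <cluster> \
  --arn arn:aws:iam::<account_id>:role/KarpenterNodeRole-<cluster>

# Re-add it with the node groups and the username line
eksctl create iamidentitymapping --cluster <cluster> \
  --arn arn:aws:iam::<account_id>:role/KarpenterNodeRole-<cluster> \
  --group system:bootstrappers --group system:nodes \
  --username "system:node:{{EC2PrivateDNSName}}"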