Closed: midestefanis closed this issue 2 weeks ago
I have exactly the same bug
Could you please share your nodeclass and nodepool configurations, as well as any other steps to reproduce?
I have the same bug.
Seems the same for me too.
@rschalo We've also experienced this issue. After upgrading to 1.0.1/1.0.2 (and patching the CRDs to enable conversion webhooks), everything is fine. But once we apply a new nodeclass/nodepool (or even update an existing one), it stops working and every nodeclaim shows a status of Unknown with that error. We've even tried applying both v1beta1 and v1 manifests, to no avail. We had to downgrade back to v0.33.0.
More information: We see the EC2 instances spinning up, and their system logs show that kubelet has started, but they can't join the cluster (we couldn't find a reason for that).
We started suspecting there's something wrong with the AmiSelectorTerms, but we couldn't figure it out. We tried using id (for an AL2 AMI), and then switched to alias (with AL2023), but it made no difference.
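For reference, the two selector styles mentioned above look roughly like this in a v1 EC2NodeClass (the resource name and AMI ID below are placeholders, not values from this thread):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # Option 1: pin a specific AMI by ID (placeholder value)
  amiSelectorTerms:
    - id: ami-0123456789abcdef0
  # Option 2: track the latest AL2023 AMI via an alias instead
  # amiSelectorTerms:
  #   - alias: al2023@latest
```

Either form should be accepted by the v1 API; if nodes still fail to join with both, the AMI selection is probably not the root cause.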
Can someone provide the kubelet logs from a node that fails to register?
One thing to note (not sure if this is the issue): on 0.37+ versions of Karpenter there is a new readiness check on the EC2NodeClass CRD. Was this updated?
Karpenter now adds a readiness status condition to the EC2NodeClass. Make sure to upgrade your Custom Resource Definitions before proceeding with the upgrade. Failure to do so will result in Karpenter being unable to provision new nodes.
https://karpenter.sh/v1.0/upgrading/upgrade-guide/#upgrading-to-0370
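For anyone verifying that the CRD upgrade actually took, the v1.0.x migration expects a conversion stanza like the following under each Karpenter CRD's spec. This is a sketch based on the upgrade guide; the service name, namespace, and port depend on your install (shown here assuming a kube-system deployment):

```yaml
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: karpenter
          namespace: kube-system
          port: 8443
      conversionReviewVersions:
        - v1beta1
        - v1
```

If this stanza is missing or points at the wrong service, conversion between v1beta1 and v1 objects fails and Karpenter cannot provision new nodes.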
I see this when I am trying to migrate from Cluster Autoscaler (almost a fresh installation).
Event Message for Node:
Events: Type Reason Age From Message
Normal DisruptionBlocked 4m32s (x211 over 7h4m) karpenter Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim
The below is the log from the karpenter:
{"level":"ERROR","time":"2024-09-27T20:19:23.141Z","logger":"webhook.ConversionWebhook","message":"Reconcile error","commit":"688ea21","knative.dev/traceid":"b844441e-37e2-4c12-bdd3-8b3395383977","knative.dev/key":"nodeclaims.karpenter.sh","duration":"167.207687ms","error":"failed to update webhook: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io \"nodeclaims.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
I see that the Karpenter pods are up and running without any issue. I tried to patch and update the CRDs, but no luck.
In our case the problem was with missing toleration for taint as described here: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/2158
UPDATE: First of all, this message is not an error. It is part of the lifecycle and every nodeclaim will have this event until the node it is assigned to is registered to the cluster.
We figured out what went wrong for us: Our cluster is still using aws-auth ConfigMap and we missed updating the role name there. This is why Karpenter was able to create the EC2 instance, but the instance wasn't able to join the cluster. We renamed the role as part of some other task and forgot about it when we upgraded Karpenter.
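For clusters still using the aws-auth ConfigMap, the node role mapping must match the instance role Karpenter launches nodes with. A minimal sketch of the relevant entry (the account ID and role name are placeholders; use your actual Karpenter node role):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/KarpenterNodeRole-my-cluster
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```

If the rolearn here doesn't match the role on the launched instances, the symptom is exactly what we saw: EC2 instances come up, kubelet starts, but nodes never register with the cluster.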
@midestefanis can you confirm if the issue that @roi-zentner ran into is relevant to your issue? If not then are you able to share other info about how to reproduce the issue you're seeing?
I have the right aws-auth and still get: Normal DisruptionBlocked 2m37s (x716 over 23h) karpenter Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim. And this node is not even managed by Karpenter.
I have the same issue.
same issue, I get this error on nodes NOT managed by karpenter
Hi All, I've attempted to reproduce with a fresh install of v1.0.2 on K8s 1.30 and am not encountering this issue. For people that do see this, could you please share the Karpenter logs, nodepool, and ec2nodeclass definitions used that resulted in this behavior? Without reproduction it will be hard to determine if this is a bug or part of the normal lifecycle of nodeclaims. In my quick test, I set consolidateAfter for a nodepool to 20s and saw:
Normal DisruptionBlocked 62s karpenter Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim
on the nodeclaim while it was waiting for the node to spin up. Are node objects and instances being created for these nodeclaims that have this?
Also, we used to emit events for non-managed nodes but that was addressed in https://github.com/kubernetes-sigs/karpenter/pull/1644 which has been merged to main.
I think the above log is a red herring for this issue. I agree we should change our event messages here to be more descriptive of what's actually happening, rather than describing internal Karpenter state.
https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/state/statenode.go#L177-L185
I just ran into this. In my case the node claim would appear and the instance would be provisioned but remain in unknown status, never getting the node info or joining the cluster. The issue for me was the tag I picked to use on the subnetSelectorTerms. I used kubernetes.io/cluster/clustername: shared and as soon as I changed that selector to a different tag the node joins the cluster.
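In case it helps others hitting the same thing, switching away from the shared cluster tag to a dedicated discovery tag looks roughly like this (the tag value my-cluster is a placeholder; karpenter.sh/discovery is just the convention from the getting-started docs, any unambiguous tag works):

```yaml
spec:
  # kubernetes.io/cluster/<name>: shared can match unintended subnets;
  # a dedicated tag applied only to the subnets Karpenter should use
  # is more predictable:
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

The key point is that the selector should resolve only to subnets from which nodes can actually reach and join the cluster.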
In my case the node comes up and joins the cluster but the nodeclaim remains in unknown status. This happens quite often (not always, some nodes work fine). Relates to #6803.
I'm using loki to store the kubernetes event log, with these queries:
{app="eventrouter"} | json | line_format "{{.event_reason}}: {{.event_message}} ({{.event_metadata_name}})" |= "mynodepool-wz4vx" | verb = "ADDED"
{app="karpenter"}|="mynodepool-wz4vx"
I get:
2024-10-24 15:10:10.332 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx, node/i-0573b18a64d7a4ea5.eu-west-1.compute.internal (overprovisioning-755d56c54-4bkvw.18016592298fad4f)
2024-10-24 15:01:02.188 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx, node/i-0573b18a64d7a4ea5.eu-west-1.compute.internal (overprovisioning-755d56c54-578ct.18016512893be628)
2024-10-24 14:41:43.075 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx, node/i-0573b18a64d7a4ea5.eu-west-1.compute.internal (overprovisioning-755d56c54-4fp2p.18016404a8f8b6db)
2024-10-24 14:25:39.477 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx, node/i-0573b18a64d7a4ea5.eu-west-1.compute.internal (overprovisioning-755d56c54-669n7.180163244e3ed23d)
2024-10-24 13:57:25.965 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx, node/i-0573b18a64d7a4ea5.eu-west-1.compute.internal (overprovisioning-755d56c54-znbkz.1801619a01386d52)
2024-10-24 13:42:55.374 DisruptionBlocked: Cannot disrupt NodeClaim: state node isn't initialized (mynodepool-wz4vx.180160cf4dc0fc94)
2024-10-24 13:42:44.714 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx, node/i-0573b18a64d7a4ea5.eu-west-1.compute.internal (overprovisioning-755d56c54-vgw7j.180160ccd2866116)
2024-10-24 13:41:03.322 Registered: Status condition transitioned, Type: Registered, Status: Unknown -> True, Reason: Registered (mynodepool-wz4vx.180160b536a368e0)
2024-10-24 13:41:03.236 {"level":"INFO","time":"2024-10-24T11:41:03.235Z","logger":"controller","message":"registered nodeclaim","commit":"6174c75","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"mynodepool-wz4vx"},"namespace":"","name":"mynodepool-wz4vx","reconcileID":"8c5b8d81-72b3-4bf1-88d1-bf21d6d7ae68","provider-id":"aws:///eu-west-1c/i-0573b18a64d7a4ea5","Node":{"name":"i-0573b18a64d7a4ea5.eu-west-1.compute.internal"}}
2024-10-24 13:40:46.880 DisruptionBlocked: Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim (mynodepool-wz4vx.180160b162ed10a5)
2024-10-24 13:40:37.654 Launched: Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched (mynodepool-wz4vx.180160af3c5d8d0d)
2024-10-24 13:40:37.568 {"level":"INFO","time":"2024-10-24T11:40:37.568Z","logger":"controller","message":"launched nodeclaim","commit":"6174c75","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"mynodepool-wz4vx"},"namespace":"","name":"mynodepool-wz4vx","reconcileID":"8caaaf02-d0d1-4601-aef9-dbcaaf0dd99b","provider-id":"aws:///eu-west-1c/i-0573b18a64d7a4ea5","instance-type":"g5.2xlarge","zone":"eu-west-1c","capacity-type":"on-demand","allocatable":{"cpu":"7910m","ephemeral-storage":"403926258176","memory":"29317Mi","nvidia.com/gpu":"1","pods":"58","vpc.amazonaws.com/pod-eni":"17"}}
2024-10-24 13:40:35.873 Nominated: Pod should schedule on: nodeclaim/mynodepool-wz4vx (overprovisioning-755d56c54-vgw7j.180160aed18aa393)
2024-10-24 13:40:35.843 {"level":"INFO","time":"2024-10-24T11:40:35.843Z","logger":"controller","message":"created nodeclaim","commit":"6174c75","controller":"provisioner","namespace":"","name":"","reconcileID":"ad1171e5-67d1-4e54-929d-f9aa6ec5795d","NodePool":{"name":"mynodepool"},"NodeClaim":{"name":"mynodepool-wz4vx"},"requests":{"cpu":"210m","memory":"20720Mi","nvidia.com/gpu":"1","pods":"9"},"instance-types":"g5.2xlarge"}
Note that I had issues with the conversion webhook being broken, so I removed it from the CRDs, but now I get:
{"level":"ERROR","time":"2024-10-24T11:56:13.458Z","logger":"webhook.ConversionWebhook","message":"Reconcile error","commit":"6174c75","knative.dev/traceid":"44702ea0-bdce-48bc-93ab-8d2d8b1a4d02","knative.dev/key":"nodepools.karpenter.sh","duration":"260.002µs","error":"custom resource \"nodepools.karpenter.sh\" isn't configured for webhook conversion"}
EDIT: oh no, I might have found the issue in the nodeclaim:
status:
  conditions:
    - lastTransitionTime: "2024-10-24T11:41:18Z"
      message: Resource "nvidia.com/gpu" was requested but not registered
      reason: ResourceNotRegistered
      status: Unknown
      type: Initialized
    - lastTransitionTime: "2024-10-24T11:41:03Z"
      message: Initialized=Unknown
      reason: UnhealthyDependents
      status: Unknown
      type: Ready
I'm using Bottlerocket OS 1.25.0 (aws-k8s-1.31-nvidia)
Hi All,
This log line is part of the normal lifecycle of nodeclaim disruption, we've adjusted the log line to be more clear in https://github.com/kubernetes-sigs/karpenter/pull/1644 and https://github.com/kubernetes-sigs/karpenter/pull/1766. If there is other behavior that is being observed that may be incorrect then please open a new issue.
I have the same issue: the node is created but not joining the cluster.
Message: Cannot disrupt Node: state node doesn't contain both a node and a nodeclaim
I'm having the same issue: the node is created but not joining the cluster, forever stuck in Unknown state. Karpenter helm chart: 1.0.7. Kubernetes version: v1.31.0-eks-a737599. Logs:
{"level":"INFO","time":"2024-11-04T15:13:30.311Z","logger":"controller","message":"Starting workers","commit":"901a5dc","controller":"nodepool.readiness","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"INFO","time":"2024-11-04T15:13:30.311Z","logger":"controller","message":"Starting workers","commit":"901a5dc","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"INFO","time":"2024-11-04T15:13:30.311Z","logger":"controller","message":"Starting workers","commit":"901a5dc","controller":"status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","worker count":10}
{"level":"INFO","time":"2024-11-04T15:13:30.311Z","logger":"controller","message":"Starting workers","commit":"901a5dc","controller":"nodeclass.hash","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","worker count":10}
{"level":"INFO","time":"2024-11-04T15:13:30.610Z","logger":"controller","message":"discovered ssm parameter","commit":"901a5dc","controller":"nodeclass.status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"95358f1e-4cad-4b64-bf6f-c89472c89cb7","parameter":"/aws/service/eks/optimized-ami/1.31/amazon-linux-2023/arm64/standard/recommended/image_id","value":"ami-0ae1e07e02f98b306"}
{"level":"INFO","time":"2024-11-04T15:13:30.636Z","logger":"controller","message":"discovered ssm parameter","commit":"901a5dc","controller":"nodeclass.status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"95358f1e-4cad-4b64-bf6f-c89472c89cb7","parameter":"/aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/image_id","value":"ami-0be82d98bb3e7f36c"}
{"level":"INFO","time":"2024-11-04T15:13:30.670Z","logger":"controller","message":"discovered ssm parameter","commit":"901a5dc","controller":"nodeclass.status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"95358f1e-4cad-4b64-bf6f-c89472c89cb7","parameter":"/aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/nvidia/recommended/image_id","value":"ami-0ed2f679097182d7a"}
{"level":"INFO","time":"2024-11-04T15:13:30.694Z","logger":"controller","message":"discovered ssm parameter","commit":"901a5dc","controller":"nodeclass.status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"95358f1e-4cad-4b64-bf6f-c89472c89cb7","parameter":"/aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/neuron/recommended/image_id","value":"ami-0c42cc0277c8e37ac"}
{"level":"INFO","time":"2024-11-04T15:16:39.902Z","logger":"controller","message":"found provisionable pod(s)","commit":"901a5dc","controller":"provisioner","namespace":"","name":"","reconcileID":"b8f66e83-5fe0-403e-8a92-8ec5c7d2117d","Pods":"worker-port-scan-stage2/worker-port-scan-stage2-679564f457-5pqch","duration":"46.212393ms"}
{"level":"INFO","time":"2024-11-04T15:16:39.902Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)","commit":"901a5dc","controller":"provisioner","namespace":"","name":"","reconcileID":"b8f66e83-5fe0-403e-8a92-8ec5c7d2117d","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2024-11-04T15:16:39.917Z","logger":"controller","message":"created nodeclaim","commit":"901a5dc","controller":"provisioner","namespace":"","name":"","reconcileID":"b8f66e83-5fe0-403e-8a92-8ec5c7d2117d","NodePool":{"name":"spot"},"NodeClaim":{"name":"spot-5q86d"},"requests":{"cpu":"680m","memory":"785Mi","pods":"8"},"instance-types":"c5.xlarge, c5a.2xlarge, c6a.2xlarge, c6i.xlarge, c7i-flex.xlarge and 7 other(s)"}
{"level":"INFO","time":"2024-11-04T15:16:43.397Z","logger":"controller","message":"launched nodeclaim","commit":"901a5dc","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-5q86d"},"namespace":"","name":"spot-5q86d","reconcileID":"70b8ee7c-e93d-4d6f-989f-844ee57c3c73","provider-id":"aws:///ap-southeast-1c/i-0665d2caaa8e38f76","instance-type":"c5.xlarge","zone":"ap-southeast-1c","capacity-type":"spot","allocatable":{"cpu":"3920m","ephemeral-storage":"35Gi","memory":"6584Mi","pods":"58","vpc.amazonaws.com/pod-eni":"18"}}
{"level":"INFO","time":"2024-11-04T15:17:03.759Z","logger":"controller","message":"deleted nodeclaim","commit":"901a5dc","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-kngcp"},"namespace":"","name":"spot-kngcp","reconcileID":"08090e8d-9157-4e71-a7a9-77a950fe68e1","Node":{"name":""},"provider-id":"aws:///ap-southeast-1b/i-0865258af1c615c6f"}
{"level":"INFO","time":"2024-11-04T15:17:44.539Z","logger":"controller","message":"deleted nodeclaim","commit":"901a5dc","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-fkcd6"},"namespace":"","name":"spot-fkcd6","reconcileID":"8f517525-5166-4d6b-8fe4-cf9858a0dc6b","Node":{"name":""},"provider-id":"aws:///ap-southeast-1c/i-031a02558cc13b16a"}
{"level":"INFO","time":"2024-11-04T15:31:49.255Z","logger":"controller","message":"found provisionable pod(s)","commit":"901a5dc","controller":"provisioner","namespace":"","name":"","reconcileID":"08e5d957-23e7-4de8-b2cf-2108b51ac98b","Pods":"microservice-public-web-manage/microservice-public-web-manage-5549b5b56f-xrk9w, microservice-admin-dashboard/microservice-admin-dashboard-cbf6cb68d-h6zhm, worker-cyberbay-scan-stage-complete/worker-cyberbay-scan-stage-complete-7746f99f85-smk8v, worker-auto-unlock-bug-report/worker-auto-unlock-bug-report-c4477f77d-bbc4c, keda/keda-metrics-apiserver-c5b6b66c-lsgsx and 2 other(s)","duration":"26.295328ms"}
{"level":"INFO","time":"2024-11-04T15:31:49.255Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)","commit":"901a5dc","controller":"provisioner","namespace":"","name":"","reconcileID":"08e5d957-23e7-4de8-b2cf-2108b51ac98b","nodeclaims":1,"pods":7}
{"level":"INFO","time":"2024-11-04T15:31:49.268Z","logger":"controller","message":"created nodeclaim","commit":"901a5dc","controller":"provisioner","namespace":"","name":"","reconcileID":"08e5d957-23e7-4de8-b2cf-2108b51ac98b","NodePool":{"name":"spot"},"NodeClaim":{"name":"spot-2nm9f"},"requests":{"cpu":"1780m","memory":"1981Mi","pods":"14"},"instance-types":"c5.xlarge, c5a.2xlarge, c6a.2xlarge, c6i.xlarge, c7i-flex.xlarge and 6 other(s)"}
{"level":"INFO","time":"2024-11-04T15:31:52.384Z","logger":"controller","message":"launched nodeclaim","commit":"901a5dc","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-2nm9f"},"namespace":"","name":"spot-2nm9f","reconcileID":"797858ad-e717-4641-9673-ae97537038fa","provider-id":"aws:///ap-southeast-1c/i-0bc5178ae30d55bca","instance-type":"c5.xlarge","zone":"ap-southeast-1c","capacity-type":"spot","allocatable":{"cpu":"3920m","ephemeral-storage":"35Gi","memory":"6584Mi","pods":"58","vpc.amazonaws.com/pod-eni":"18"}}
{"level":"INFO","time":"2024-11-04T15:33:02.965Z","logger":"controller","message":"deleted nodeclaim","commit":"901a5dc","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-5q86d"},"namespace":"","name":"spot-5q86d","reconcileID":"5ef88cce-50cd-4d4c-8014-ed005543fbba","Node":{"name":""},"provider-id":"aws:///ap-southeast-1c/i-0665d2caaa8e38f76"}
Nodeclaim
kg nodeclaim
NAME TYPE CAPACITY ZONE NODE READY AGE
spot-2nm9f c5.xlarge spot ap-southeast-1c Unknown 3m14s
k describe nodeclaim spot-2nm9f
Name: spot-2nm9f
Namespace:
Labels: karpenter.k8s.aws/instance-category=c
karpenter.k8s.aws/instance-cpu=4
karpenter.k8s.aws/instance-cpu-manufacturer=intel
karpenter.k8s.aws/instance-ebs-bandwidth=4750
karpenter.k8s.aws/instance-encryption-in-transit-supported=false
karpenter.k8s.aws/instance-family=c5
karpenter.k8s.aws/instance-generation=5
karpenter.k8s.aws/instance-hypervisor=nitro
karpenter.k8s.aws/instance-memory=8192
karpenter.k8s.aws/instance-network-bandwidth=1250
karpenter.k8s.aws/instance-size=xlarge
karpenter.sh/capacity-type=spot
karpenter.sh/nodepool=spot
kubernetes.io/arch=amd64
kubernetes.io/os=linux
node.kubernetes.io/instance-type=c5.xlarge
topology.k8s.aws/zone-id=apse1-az3
topology.kubernetes.io/region=ap-southeast-1
topology.kubernetes.io/zone=ap-southeast-1c
Annotations: compatibility.karpenter.k8s.aws/kubelet-drift-hash: 15379597991425564585
karpenter.k8s.aws/ec2nodeclass-hash: 17935570713262261599
karpenter.k8s.aws/ec2nodeclass-hash-version: v3
karpenter.sh/nodepool-hash: 6821555240594823858
karpenter.sh/nodepool-hash-version: v3
karpenter.sh/stored-version-migrated: true
API Version: karpenter.sh/v1
Kind: NodeClaim
Metadata:
Creation Timestamp: 2024-11-04T15:31:49Z
Finalizers:
karpenter.sh/termination
Generate Name: spot-
Generation: 1
Owner References:
API Version: karpenter.sh/v1
Block Owner Deletion: true
Kind: NodePool
Name: spot
UID: f7479647-2be2-4c33-88eb-0261218ad48f
Resource Version: 60672567
UID: 84b5c89d-9192-475a-96b3-3a1931748888
Spec:
Expire After: 720h
Node Class Ref:
Group: karpenter.k8s.aws
Kind: EC2NodeClass
Name: default
Requirements:
Key: kubernetes.io/os
Operator: In
Values:
linux
Key: node.kubernetes.io/instance-type
Operator: In
Values:
c5.xlarge
c5a.2xlarge
c6a.2xlarge
c6i.xlarge
c7i-flex.xlarge
m5.xlarge
m6a.xlarge
r5.xlarge
r6i.xlarge
t2.2xlarge
t3.2xlarge
Key: karpenter.sh/nodepool
Operator: In
Values:
spot
Key: karpenter.sh/capacity-type
Operator: In
Values:
spot
Key: kubernetes.io/arch
Operator: In
Values:
amd64
Resources:
Requests:
Cpu: 1780m
Memory: 1981Mi
Pods: 14
Status:
Allocatable:
Cpu: 3920m
Ephemeral - Storage: 35Gi
Memory: 6584Mi
Pods: 58
vpc.amazonaws.com/pod-eni: 18
Capacity:
Cpu: 4
Ephemeral - Storage: 40Gi
Memory: 7577Mi
Pods: 58
vpc.amazonaws.com/pod-eni: 18
Conditions:
Last Transition Time: 2024-11-04T15:31:52Z
Message: Node not registered with cluster
Reason: NodeNotFound
Status: Unknown
Type: Initialized
Last Transition Time: 2024-11-04T15:31:52Z
Message:
Reason: Launched
Status: True
Type: Launched
Last Transition Time: 2024-11-04T15:31:52Z
Message: Initialized=Unknown, Registered=Unknown
Reason: UnhealthyDependents
Status: Unknown
Type: Ready
Last Transition Time: 2024-11-04T15:31:52Z
Message: Node not registered with cluster
Reason: NodeNotFound
Status: Unknown
Type: Registered
Image ID: ami-0be82d98bb3e7f36c
Provider ID: aws:///ap-southeast-1c/i-0bc5178ae30d55bca
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Launched 3m28s karpenter Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched
Normal DisruptionBlocked 86s (x2 over 3m26s) karpenter Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim
I fixed it using Pod Identity: I assigned the Karpenter role to the karpenter service account, and that resolved it.
Is there any update on this issue?
@bshre12 have you assigned a role to your karpenter service account?
Description
Observed Behavior:
Karpenter is not spinning up nodes
Expected Behavior:
New nodes
Reproduction Steps (Please include YAML):
Versions: 1.0.2
Kubernetes Version (kubectl version): 1.30
Karpenter logs:
Node Claims are showing this: