An update: my EC2NodeClass is not Ready, and that appears to be the root of the issue, but I am unsure why.
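For context, the status transitions below come from describing the node class; a minimal check, assuming it is named `default`:

```sh
# Show the EC2NodeClass status conditions and the events quoted below
kubectl describe ec2nodeclass default
```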
```
Type    Reason                Age  From       Message
----    ------                ---- ----       -------
Normal  AMIsReady             38s  karpenter  Status condition transitioned, Type: AMIsReady, Status: True -> Unknown, Reason: AwaitingReconciliation, Message: object is awaiting reconciliation
Normal  Ready                 38s  karpenter  Status condition transitioned, Type: Ready, Status: False -> Unknown, Reason: UnhealthyDependents, Message: InstanceProfileReady=Unknown, SecurityGroupsReady=Unknown, SubnetsReady=Unknown, AMIsReady=Unknown
Normal  SecurityGroupsReady   38s  karpenter  Status condition transitioned, Type: SecurityGroupsReady, Status: False -> Unknown, Reason: AwaitingReconciliation, Message: object is awaiting reconciliation
Normal  SubnetsReady          38s  karpenter  Status condition transitioned, Type: SubnetsReady, Status: True -> Unknown, Reason: AwaitingReconciliation, Message: object is awaiting reconciliation
Normal  AMIsReady             38s  karpenter  Status condition transitioned, Type: AMIsReady, Status: Unknown -> True, Reason: AMIsReady
Normal  InstanceProfileReady  38s  karpenter  Status condition transitioned, Type: InstanceProfileReady, Status: Unknown -> True, Reason: InstanceProfileReady
Normal  Ready                 38s  karpenter  Status condition transitioned, Type: Ready, Status: Unknown -> False, Reason: UnhealthyDependents, Message: SecurityGroupsReady=False
Normal  SecurityGroupsReady   38s  karpenter  Status condition transitioned, Type: SecurityGroupsReady, Status: Unknown -> False, Reason: SecurityGroupsNotFound, Message: SecurityGroupSelector did not match any SecurityGroups
Normal  SubnetsReady          38s  karpenter  Status condition transitioned, Type: SubnetsReady, Status: Unknown -> True, Reason: SubnetsReady
```
This is your issue:

```
Normal  SecurityGroupsReady  38s  karpenter  Status condition transitioned, Type: SecurityGroupsReady, Status: Unknown -> False, Reason: SecurityGroupsNotFound, Message: SecurityGroupSelector did not match any SecurityGroups
```

I can see you're looking for this SG:

```yaml
securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: us-east-2-newcluster-prod1
```
Did you add these tags to a SG in the same VPC?
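A quick way to verify is to query for the tag directly; a sketch using the selector value from above (add `--region` as needed):

```sh
# An empty result means the selector matches no security groups in this account/region
aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=us-east-2-newcluster-prod1" \
  --query 'SecurityGroups[].{Id:GroupId,Vpc:VpcId}'
```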
I'm not sure how I missed that, but you are 100% correct. Thanks!!
I have a similar case; however, my EC2NodeClass has SubnetsReady=True, SecurityGroupsReady=True, and InstanceProfileReady=True, while AMIsReady is stuck in AwaitingReconciliation.
I tried various amiFamily options with AL2 and AL2023, and even tried specifying the image ID directly, e.g.:

```yaml
amiFamily: AL2023
amiSelectorTerms:
  - alias: al2023@v20241121
```

I also tried just the alias with no amiFamily:

```yaml
amiSelectorTerms:
  - alias: al2023@latest
```
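For anyone debugging the same hang: once reconciliation succeeds, the resolved images are published on the node class status, so a quick check (node class name is a placeholder) is:

```sh
# status.amis stays empty while AMI resolution is pending or failing
kubectl get ec2nodeclass <name> -o jsonpath='{.status.amis}'
```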
Any thoughts on what could be the issue? Also, once created, I can't delete an EC2NodeClass even when nothing references it.
My versions are Karpenter 1.0.8 (Helm chart) on EKS 1.31.
@irfn You can't delete the ec2nc because you probably have NodeClaims; once you delete all the NodeClaims, you'll be able to remove the ec2nc.
Note that you can't delete any NodeClaims while the Karpenter deployment is not up.
Maybe that reconciliation error is because the same ec2nc is trying to use two different AMIs?
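A minimal sketch of that cleanup, using the NodeClaim resource Karpenter registers:

```sh
# List NodeClaims that may still pin the EC2NodeClass via finalizers/references
kubectl get nodeclaims
# With the Karpenter deployment running, delete them before removing the ec2nc
kubectl delete nodeclaims --all
```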
Try deleting all the NodeClaims and the ec2nc; once they're gone, re-create it with AL2 (that worked for me with the exact same settings). This is my ec2nc YAML (created using Terraform):
```yaml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: karpenter
  namespace: kube-system
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        Environment: test
        Tier: private
  securityGroupSelectorTerms:
    - tags:
        "aws:eks:cluster-name": ${CLUSTER_NAME}
  amiSelectorTerms:
    - name: amazon-eks-node-1.31-*
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        encrypted: false
        deleteOnTermination: true
  tags:
    env: test
    Name: ${CLUSTER_NAME}-karpenter
```
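After applying, readiness can be checked directly; a small sketch assuming the manifest above is saved as ec2nodeclass.yaml:

```sh
kubectl apply -f ec2nodeclass.yaml
# Blocks until the Ready condition turns True, or times out
kubectl wait --for=condition=Ready ec2nodeclass/karpenter --timeout=2m
```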
@keoren3 Like I mentioned, there are no NodeClaims or any other references to this. I also tried the name-pattern amiSelectorTerms as well as direct AMI ID references.
Here is one example of the EC2NodeClasses that I tried with a similar name pattern; the image IDs themselves are verified via `aws ec2 describe-images --image-ids ami-00516539f0211c275`, etc. (a name-pattern check is sketched after the manifest below).
```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: karpenter-nc2-al2023
spec:
  instanceProfile: test-karpenter-node-instance-profile
  amiFamily: AL2023
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-eks
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-eks
  amiSelectorTerms:
    - name: amazon-eks-node-al2023-x86_64-standard-1.31-*
    - name: amazon-eks-node-al2023-arm64-standard-1.31-*
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 30Gi
        volumeType: gp3
        encrypted: false
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: optional
```
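To sanity-check the name patterns themselves, the same API can be filtered by name instead of image ID; a sketch (owner and region assumptions noted in the comment):

```sh
# EKS-optimized AMIs are published under the "amazon" owner; run this in the cluster's region
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=amazon-eks-node-al2023-x86_64-standard-1.31-*" \
  --query 'Images[].{Name:Name,Id:ImageId}' \
  --output table
```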
> Try deleting all the nodeclaims and the ec2nc

Unable to do this, as I don't have any NodeClaims and cannot delete the EC2NodeClasses.
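When a delete hangs like this, it is usually a finalizer waiting on the controller; a sketch for inspecting (and, only as a last resort, clearing) it, using the node class name from the example above:

```sh
# Show which finalizers are blocking deletion
kubectl get ec2nodeclass karpenter-nc2-al2023 -o jsonpath='{.metadata.finalizers}'
# Last resort only: strip the finalizers so the object can be garbage-collected
kubectl patch ec2nodeclass karpenter-nc2-al2023 --type merge -p '{"metadata":{"finalizers":null}}'
```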
This is fixed; it was an issue in my Terraform code, which was missing the image-describe permission (ec2:DescribeImages) on the IAM role.
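One way to confirm a permission gap like that, sketched with a hypothetical controller role ARN:

```sh
# evalDecision should come back "allowed" for AMI resolution to work
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/karpenter-controller \
  --action-names ec2:DescribeImages
```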
Description
Observed Behavior: I have a new EKS 1.31 cluster with Karpenter 1.0.5 installed via the Terraform eks-blueprints module. I have similar clusters up and running successfully that were installed the same way, but they are on k8s 1.29 and Karpenter 1.0.1. New nodes are not being created when scale-up is needed. Below is the log.
Expected Behavior: New nodes would be spun up.
Reproduction Steps (Please include YAML):

aws-auth

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  namespace: karpenter
  finalizers:
```

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
    app.kubernetes.io/instance: karpenter-spiral-siaas-prod1-blue
    app.kubernetes.io/managed-by: Helm
  name: default
  namespace: karpenter
spec:
  disruption:
    budgets:
```
Versions:

- Chart Version: 1.0.5
- Kubernetes Version (`kubectl version`): 1.31