Izvi-digibank closed this issue 2 years ago
The error you posted generally indicates an issue with Internet connectivity of the Karpenter controller pod. This could be due to the node's connectivity, the pod's connectivity (the IP assigned via the CNI), or DNS (node or pod level).
Can you double check your node's connectivity by curl'ing something from the instance? If that works, you'll want to check the pod's network configuration and try to access the network from the pod itself.
If the node is unable to access Internet resources, you can check the subnet's route table for a proper NAT GW or Internet GW setup. Another thing to check is the outbound security group rules on the node.
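If curl is not available in the controller image, the same check can be roughed out in Python. This is a minimal sketch, not part of Karpenter: `can_connect` is a hypothetical helper that only tests whether a TCP connection opens, which is the step that timed out in the reported error. It does not validate TLS or credentials.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name (DNS) and then opens the socket,
        # so a failure at either stage ends up in the except branch.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS errors (socket.gaierror), refused connections and timeouts.
        return False
```

Running `can_connect("sts.eu-west-1.amazonaws.com")` on the node and then inside the pod shows which layer loses connectivity: if it returns False on the node, look at route tables and security groups; if only the pod fails, look at the CNI or DNS.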
RequestError: send request failed\ncaused by: Post \"https://sts.eu-west-1.amazonaws.com/\": dial tcp: i/o timeout"}
@bwagner5 I guess you're right. I have now moved the Karpenter controller and webhook to run on a node with the exact same configuration as the node in clusterA. The network issues seem to be gone, but I now get another error:
2022-02-10T21:35:41.273Z ERROR controller.controller.provisioning Reconciler error {"commit": "2346ed5", "reconciler group": "karpenter.sh", "reconciler kind": "Provisioner", "name": "workflows-provisioner", "namespace": "", "error": "fetching instance types using ec2.DescribeInstanceTypes, WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 4e07a374-081d-4466-8b67-6421e6a3022e"}
This is weird because the Node IAM role has AssumeRoleWithWebIdentity and it is also defined in the trust policy (you can see I pasted it above). All other roles and the cm are configured correctly, as explained in the question.
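One way to narrow down an AccessDenied on sts:AssumeRoleWithWebIdentity is to look at the claims in the service account token the pod actually presents and compare them with the role's trust policy. The sketch below is only an illustration: `jwt_claims` is a hypothetical helper, it decodes the payload without verifying the signature, and it assumes the token is a standard three-segment JWT.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT into a dict (signature is NOT verified)."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside the controller pod, the token can be read from the file named by the AWS_WEB_IDENTITY_TOKEN_FILE environment variable. The `iss` claim should correspond to the OIDC provider in the statement's Federated principal, and the `sub` claim should equal the value in the `:sub` condition (for example system:serviceaccount:karpenter:karpenter).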
@Izvi-digibank is this still an issue? Were you able to figure out what was going on here?
@suket22 Yes, needed to separate KarpenterController into two statements.
@Izvi-digibank Can you post the output here for KarpenterController?
My controller is functioning fine now, hence I closed the issue. All I had to do, as I wrote above, was to separate the KarpenterController into two different statements (I did it through the AWS console, but you can do it with Terraform if you use it).
I had the same issue. I also resolved it by separating the Trust relationships statement. Thank you @Izvi-digibank
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/CLUSTER1-RANDOM-NUMBER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-west-1.amazonaws.com/id/CLUSTER1-RANDOM-NUMBER:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/CLUSTER2-RANDOM-NUMBER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-west-1.amazonaws.com/id/CLUSTER2-RANDOM-NUMBER:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
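For anyone managing several clusters with one role, a trust policy of this shape (one statement per cluster's OIDC provider) can be generated rather than edited by hand. The following is a rough sketch under assumptions: `build_trust_policy` and its parameters are hypothetical names, and the thread does not show the combined policy that failed, so this only produces the separated form that reportedly worked.

```python
import json


def build_trust_policy(account_id, oidc_providers,
                       namespace="karpenter", service_account="karpenter"):
    """Build an IAM trust policy with one statement per OIDC provider.

    `oidc_providers` holds provider paths without the scheme, e.g.
    "oidc.eks.eu-west-1.amazonaws.com/id/CLUSTER1-RANDOM-NUMBER".
    """
    statements = []
    for provider in oidc_providers:
        statements.append({
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # The condition key must name the same provider as the principal.
                    f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                }
            },
        })
    return {"Version": "2012-10-17", "Statement": statements}
```

Calling `json.dumps(build_trust_policy("123456789", [...]), indent=2)` with the two provider paths from this thread yields a document with the same structure as the policy shown here.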
I am using Karpenter for two clusters under the same AWS account. The same roles are used for both clusters, and the provisioners are the same (private subnet). aws-auth is configured with the KarpenternodeRole-cluster. Cluster A works perfectly, but in cluster B I get the following error:
2022-02-10T10:33:35.421Z ERROR controller.controller.provisioning Reconciler error {"commit": "2346ed5", "reconciler group": "karpenter.sh", "reconciler kind": "Provisioner", "name": "workflows-provisioner", "namespace": "", "error": "fetching instance types using ec2.DescribeInstanceTypes, WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.eu-west-1.amazonaws.com/\": dial tcp: i/o timeout"}
Here are some details:
KarpenterController role:
KarpenterController trust relationships:
KarpenterNodeInstanceProfile-clusterB has the following policies:
KarpenterNodeInstanceProfile-clusterB trust relationships:
When trying to retrieve the node ID in clusterB I get null: