eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

"Attribute 'Arn' does not exist" error on the controlPlane resource during the stack creation #7806

Open laurentdroin opened 5 months ago

laurentdroin commented 5 months ago

Attempting to create an EKS cluster with eksctl 0.179.0 fails.

It fails during the stack creation:

2024-06-03 10:11:28 [✖]  unexpected status "CREATE_FAILED" while waiting for CloudFormation stack "eksctl-ldr-test-eksctl-june-3-cluster"
2024-06-03 10:11:28 [✖]  unexpected status "CREATE_FAILED" while waiting for CloudFormation stack "eksctl-ldr-test-eksctl-june-3-cluster"
2024-06-03 10:11:28 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2024-06-03 10:11:28 [✖]  AWS::CloudFormation::Stack/eksctl-ldr-test-eksctl-june-3-cluster: CREATE_FAILED – "The following resource(s) failed to create: [ControlPlane]. "
2024-06-03 10:11:28 [✖]  AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Attribute 'Arn' does not exist"
2024-06-03 10:11:28 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2024-06-03 10:11:28 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=ldr-test-eksctl-june-3'
2024-06-03 10:11:28 [✖]  ResourceNotReady: failed waiting for successful resource state

This seems to be a timing issue: if I run the eksctl command with the '--cfn-disable-rollback' flag to prevent the stack rollback, I can see that even though the stack creation failed, the role "ServiceRole" was created properly and its ARN is available. In fact, if I "retry" the stack, it completes and the cluster is created fine.
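For anyone comparing logs across the reports in this thread, here is a minimal sketch that pulls the failed resources and their reasons out of eksctl output like the log above. The regular expression is only inferred from the log lines quoted in this issue, and the helper name `failed_resources` is hypothetical; neither is part of eksctl.

```python
import re

# Pattern inferred from the log lines quoted in this issue (an assumption,
# not an official eksctl format), e.g.:
#   ... AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Attribute 'Arn' does not exist"
EVENT_RE = re.compile(
    r'(?P<type>AWS::[A-Za-z0-9]+::[A-Za-z0-9]+)/(?P<name>[^:\s]+):\s+'
    r'(?P<status>[A-Z_]+)(?:\s+[–-]\s+"(?P<reason>.*)")?\s*$'
)


def failed_resources(log_text):
    """Return (resource type, logical name, reason) for every *_FAILED event."""
    failures = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match and match.group("status").endswith("_FAILED"):
            failures.append(
                (match.group("type"), match.group("name"), match.group("reason") or "")
            )
    return failures


if __name__ == "__main__":
    sample = (
        '2024-06-03 10:11:28 [✖]  AWS::EKS::Cluster/ControlPlane: '
        'CREATE_FAILED – "Attribute \'Arn\' does not exist"\n'
        '2024-10-09 09:19:43 [!]  AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS\n'
    )
    for resource_type, name, reason in failed_resources(sample):
        print(f"{resource_type} {name}: {reason}")
```

Lines that only mention a status in free text (such as the "unexpected status" lines) are skipped because they do not contain a `Type/LogicalName:` prefix.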

cPu1 commented 5 months ago

@laurentdroin, can you share your ClusterConfig file or CLI args used to create the cluster? Are you able to consistently reproduce this?

laurentdroin commented 5 months ago

@cPu1 - Yes, I can reproduce consistently. Here is the command I used yesterday (I obfuscated the IDs of the private subnet and my email address):

eksctl create cluster --name ldr-test-eksctl-june-3 --region us-east-1 --version 1.29 --vpc-private-subnets subnet-XXX,subnet-XXX --tags Owner=me@address.com,blah=bleh --node-type t3.xlarge -N 3 --node-volume-size 20 --instance-prefix ldr-eksctl-june-3-

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

laurentdroin commented 4 months ago

Commenting in order to remove the stale label.

edmonl commented 3 months ago

I got this too with eksctl 0.188. Not sure if it's relevant, but I had updated my IAM permissions to the latest Minimum Permissions from the doc site before trying eksctl 0.188.

cPu1 commented 2 months ago

> I got this too with eksctl 0.188. Not sure if it's relevant, that I updated IAM permissions to the latest Minimum Permissions on the doc site before trying eksctl 0.188.

@edmonl, can you share your ClusterConfig file and the output from running the command?

DanielColor commented 1 month ago

I got the same issue with version 0.191. I am running eksctl on an EC2 instance (Amazon Linux 2023) with an IAM role attached.

IAM policies

AmazonEC2ContainerRegistryFullAccess
AmazonEKSClusterPolicy
AmazonEKSFargatePodExecutionRolePolicy
AmazonEKSServicePolicy
AmazonEKSVPCResourceController
AmazonEKSWorkerNodePolicy

Inline Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "eks:*",
                "cloudformation:*",
                "iam:CreateRole",
                "iam:AttachRolePolicy",
                "iam:PassRole",
                "iam:TagRole",
                "iam:DetachRolePolicy",
                "iam:DeleteRole"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
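As a rough way to check which actions an inline policy like the one above covers, here is a minimal sketch using only the Python standard library. The helper name `action_allowed` is hypothetical, and `iam:GetRole` is used purely as an example of an action that is not listed; this thread does not establish which actions eksctl actually requires. The check is deliberately simplified and ignores Deny statements, Resource scoping, NotAction, and Conditions.

```python
from fnmatch import fnmatchcase

# The inline policy quoted above, as a Python dict.
INLINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "eks:*",
                "cloudformation:*",
                "iam:CreateRole",
                "iam:AttachRolePolicy",
                "iam:PassRole",
                "iam:TagRole",
                "iam:DetachRolePolicy",
                "iam:DeleteRole",
            ],
            "Resource": ["*"],
        }
    ],
}


def action_allowed(policy, action):
    """Return True if any Allow statement lists a pattern matching `action`.

    Simplified sketch: not a full IAM policy evaluation.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive and support * and ? wildcards.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("eks:CreateCluster", "iam:PassRole", "iam:GetRole"):
        print(candidate, action_allowed(INLINE_POLICY, candidate))
```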

My ClusterConfig YAML

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: daniel-test
  region: ap-northeast-1
  version: "1.30"  

vpc:
  subnets:
    private:
      ap-northeast-1a:
        id: "subnet-xx"
      ap-northeast-1c:
        id: "subnet-xx"
      ap-northeast-1d:
        id: "subnet-xx"
    public:
      ap-northeast-1a:
        id: "subnet-xx"
      ap-northeast-1c:
        id: "subnet-xx"
      ap-northeast-1d:
        id: "subnet-xx"

fargateProfiles:
  - name: daniel-test-fargate
    selectors:
      - namespace: daniel-test 

I executed the command eksctl create cluster -f ./cluste.yaml and got this error:

2024-10-09 09:19:13 [ℹ]  building cluster stack "eksctl-daniel-test-cluster"
2024-10-09 09:19:13 [ℹ]  deploying stack "eksctl-daniel-test-cluster"
2024-10-09 09:19:43 [ℹ]  waiting for CloudFormation stack "eksctl-daniel-test-cluster"
2024-10-09 09:19:43 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-daniel-test-cluster"
2024-10-09 09:19:43 [✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-daniel-test-cluster"
2024-10-09 09:19:43 [ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
2024-10-09 09:19:43 [!]  AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: DELETE_IN_PROGRESS
2024-10-09 09:19:43 [!]  AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: DELETE_IN_PROGRESS
2024-10-09 09:19:43 [!]  AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS
2024-10-09 09:19:43 [!]  AWS::EC2::SecurityGroupIngress/IngressInterNodeGroupSG: DELETE_IN_PROGRESS
2024-10-09 09:19:43 [!]  AWS::IAM::Role/FargatePodExecutionRole: DELETE_IN_PROGRESS
2024-10-09 09:19:43 [✖]  AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Attribute 'Arn' does not exist"
2024-10-09 09:19:43 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2024-10-09 09:19:43 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=ap-northeast-1 --name=daniel-test'
2024-10-09 09:19:43 [✖]  ResourceNotReady: failed waiting for successful resource state

If I add the --cfn-disable-rollback flag and then retry the CloudFormation stack, it works.

Am I missing something?

Solution

Add the following permissions to the role's trust relationships:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::${account_id}:role/daniel-test",
                "Service": [
                    "ec2.amazonaws.com",
                    "eks-fargate-pods.amazonaws.com",
                    "eks.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

After that, recreating the cluster works. Could the console output show more detail in the error message? Thanks.
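To check an existing role against the trust policy above before creating a cluster, something like the following sketch might help. It only inspects a trust policy document that has already been loaded as a dict; the helper names `trusted_services` and `missing_services` are hypothetical, and the list of required services is taken from the trust policy in this comment, not from eksctl documentation.

```python
# Service principals listed in the trust policy from this comment.
REQUIRED_SERVICES = [
    "ec2.amazonaws.com",
    "eks-fargate-pods.amazonaws.com",
    "eks.amazonaws.com",
]


def _as_list(value):
    """IAM JSON allows a single string where a list is expected."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def trusted_services(trust_policy):
    """Collect service principals that an Allow statement lets call sts:AssumeRole."""
    services = set()
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, dict):  # a bare "*" principal has no Service key
            services.update(_as_list(principal.get("Service", [])))
    return services


def missing_services(trust_policy, required=REQUIRED_SERVICES):
    """Return the required service principals absent from the trust policy."""
    return sorted(set(required) - trusted_services(trust_policy))


if __name__ == "__main__":
    # Example: a role that trusts only EC2.
    ec2_only = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    print(missing_services(ec2_only))
```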
