Closed: morancj closed this issue 3 years ago.
> I note Creating and managing clusters advises use of the --zones flag

There should be an equivalent top-level `availabilityZones: []` field, and/or `nodeGroup.availabilityZones: []` / `managedNodeGroup.availabilityZones: []` fields in the config file. Could you try those?
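For reference, a minimal sketch of the per-nodegroup variant (the AZ values, cluster name, and nodegroup sizing here are only placeholders; `managedNodeGroups` entries take the same field):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster   # placeholder name
  region: us-east-1
nodeGroups:
  - name: ng-1
    instanceType: t3a.large
    desiredCapacity: 3
    # pin this nodegroup to AZs known to have capacity (placeholder values)
    availabilityZones:
      - us-east-1a
      - us-east-1d
```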
Thanks. I found that, and setting the AZs there worked around this issue. I thought I'd added it to my massive missive above; apparently not, apologies! Here it is (appended .txt for GitHub): cluster.yaml.txt
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
availabilityZones:
- us-east-1a
- us-east-1d
metadata:
  name: test-cluster
  region: us-east-1
nodeGroups:
- name: ng-1
  instanceType: t3a.large
  desiredCapacity: 3
- name: ng-2
  instanceType: t3a.large
  desiredCapacity: 2
```
okay so just to tldr, the problem is that even if you specify the AZs in the config (knowing which ones have space), eksctl will still create things wherever it wants?
The "problem" is that unless told otherwise, eksctl can select an AZ which is unable to fulfil the user's request.
IMO, either eksctl should check in advance if the AZ can fulfil the request, or, if that's not possible, avoid the us-east-1e AZ as it is often unable to provision these resources.
🤔 I don't think we want eksctl to have opinions like that. Once we start checking for capacity in one thing, we open ourselves up to making decisions on capacity of everything else. Not to mention the whole mess of API calls that would require. Eksctl is designed on the premise that 'you know what you have, you tell us what to use'.
If https://github.com/weaveworks/eksctl/issues/118#issuecomment-406597480 were implemented, the UX would be much improved.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
I received this error:
```
2021-10-26 18:53:38 [✖] AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Cannot create cluster 'dev' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: [ . . . REDACTED . . . ]; Proxy: null)"
2021-10-26 18:53:38 [!] 1 error(s) occurred and cluster hasn't been created
```
I used the existing VPC example as a base (see https://github.com/weaveworks/eksctl/blob/main/examples/04-existing-vpc.yaml).

When I added the top-level availabilityZones, it gave me this error:

```
Error: vpc.subnets and availabilityZones cannot be set at the same time
```
@oxr463 thank you for asking. That error is intended. The `availabilityZones` setting is for ensuring new VPC resources are created where you want them to be. If you already have a VPC and are therefore setting `subnets` which have already been created in AZs, then there is nothing for the `availabilityZones` setting to do. eksctl therefore errors rather than quietly ignoring that config.
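To illustrate the distinction, here is a rough sketch of the existing-VPC case (the subnet IDs are hypothetical): the AZs are implied by the pre-existing subnets, and no top-level `availabilityZones` is set.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: dev
  region: us-east-1
vpc:
  subnets:
    # AZs come from the existing subnets, so no top-level availabilityZones is needed
    private:
      us-east-1a: { id: subnet-0aaaaaaaaaaaaaaaa }  # hypothetical IDs
      us-east-1d: { id: subnet-0bbbbbbbbbbbbbbbb }
    public:
      us-east-1a: { id: subnet-0cccccccccccccccc }
      us-east-1d: { id: subnet-0dddddddddddddddd }
```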
Similar to #905? Is there any workaround for this? I am just looking for a reliable, scriptable way to create a cluster in a given region, using whatever AZs are on offer. Note that I am using a YAML config.
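One possible scripted approach (just the standard AWS CLI plus a config file, not an eksctl feature; the cluster name, nodegroup, and instance type below are placeholders): look up the region's available AZs, drop us-east-1e, and splice the first two into the config before creating the cluster.

```sh
#!/usr/bin/env bash
set -euo pipefail

REGION=us-east-1

# List available AZs in the region, excluding us-east-1e
mapfile -t AZS < <(aws ec2 describe-availability-zones \
  --region "$REGION" \
  --filters Name=state,Values=available \
  --query 'AvailabilityZones[?ZoneName!=`us-east-1e`].ZoneName' \
  --output text | tr '\t' '\n')

# Write a ClusterConfig pinned to the first two usable AZs
cat > cluster.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster
  region: $REGION
availabilityZones:
- ${AZS[0]}
- ${AZS[1]}
nodeGroups:
- name: ng-1
  instanceType: t3a.large
  desiredCapacity: 3
EOF

eksctl create cluster -f cluster.yaml
```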
cc @Himangini and the rest of the team since I am no longer working on this project
I just ran into this issue today. I had to delete the CFN stack separately
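For anyone else cleaning up after this failure, a rough sketch (the cluster and stack names come from the logs below; adjust region/name as needed, and note `eksctl delete cluster` is usually enough, with the raw CloudFormation call as a fallback):

```sh
# What the failure log itself suggests
eksctl delete cluster --region=us-east-1 --name=test-cluster

# If the half-created stack is left behind, delete it directly
aws cloudformation delete-stack \
  --region us-east-1 \
  --stack-name eksctl-test-cluster-cluster
aws cloudformation wait stack-delete-complete \
  --region us-east-1 \
  --stack-name eksctl-test-cluster-cluster
```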
What were you trying to accomplish?

Create a new simple cluster in us-east-1.
What happened?

`eksctl create` fails with EC2 `Resource creation cancelled` & EKS with:

```
Cannot create cluster 'test-cluster' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f
```

and then instructs me to delete the cluster.
I note Creating and managing clusters advises use of the `--zones` flag, but this doesn't work with `-f`.

Other related information
I've been advised to use these resource quota limits:
Many regions do not have the ability to request increases for some services, such as EC2 Classic Elastic IPs. Per AWS' docs, Using Service Quotas request templates (source).

There is also a limit of one request template, meaning the maximum number of regions for which one can request these 5 limits be increased is two, unless one submits further individual limit increase requests.

eu-west-1 is the only European region which allows requesting a limit increase for EC2 Classic Elastic IPs, further limiting region choice.
These all compound the effect of this issue. There are 5 closed issues around this AZ. Skipping us-east-1e (or better, handling insufficient capacity for any AZ) would be helpful.
How to reproduce it?

Start with a basic `cluster.yaml` like this:
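(The original attachment isn't inlined here; judging by the nodegroups in the dry-run output below, it is presumably the workaround config shown earlier minus the availabilityZones block, roughly:)

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster
  region: us-east-1
nodeGroups:
- name: ng-1
  instanceType: t3a.large
  desiredCapacity: 3
- name: ng-2
  instanceType: t3a.large
  desiredCapacity: 2
```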
eksctl will use random AZs. If it chooses us-east-1e, creation will fail. In that case, using `eksctl create cluster --dry-run -f cluster.yaml > test-cluster.yaml` will generate a ClusterConfig like this:

test-cluster.yaml:
```yaml
apiVersion: eksctl.io/v1alpha5
availabilityZones:
- us-east-1e
- us-east-1c
iam:
  vpcResourceControllerPolicy: true
  withOIDC: false
kind: ClusterConfig
metadata:
  name: test-cluster
  region: us-east-1
  version: "1.19"
nodeGroups:
- amiFamily: AmazonLinux2
  desiredCapacity: 3
  disableIMDSv1: false
  disablePodIMDS: false
  iam:
    withAddonPolicies:
      albIngress: false
      appMesh: null
      appMeshPreview: null
      autoScaler: false
      certManager: false
      cloudWatch: false
      ebs: false
      efs: false
      externalDNS: false
      fsx: false
      imageBuilder: false
      xRay: false
  instanceSelector: {}
  instanceType: t3a.large
  labels:
    alpha.eksctl.io/cluster-name: test-cluster
    alpha.eksctl.io/nodegroup-name: ng-1
  name: ng-1
  privateNetworking: false
  securityGroups:
    withLocal: true
    withShared: true
  ssh:
    allow: false
  volumeIOPS: 3000
  volumeSize: 80
  volumeThroughput: 125
  volumeType: gp3
- amiFamily: AmazonLinux2
  desiredCapacity: 2
  disableIMDSv1: false
  disablePodIMDS: false
  iam:
    withAddonPolicies:
      albIngress: false
      appMesh: null
      appMeshPreview: null
      autoScaler: false
      certManager: false
      cloudWatch: false
      ebs: false
      efs: false
      externalDNS: false
      fsx: false
      imageBuilder: false
      xRay: false
  instanceSelector: {}
  instanceType: t3a.large
  labels:
    alpha.eksctl.io/cluster-name: test-cluster
    alpha.eksctl.io/nodegroup-name: ng-2
  name: ng-2
  privateNetworking: false
  securityGroups:
    withLocal: true
    withShared: true
  ssh:
    allow: false
  volumeIOPS: 3000
  volumeSize: 80
  volumeThroughput: 125
  volumeType: gp3
privateCluster:
  enabled: false
vpc:
  autoAllocateIPv6: false
  cidr: 192.168.0.0/16
  clusterEndpoints:
    privateAccess: false
    publicAccess: true
  manageSharedNodeSecurityGroupRules: true
  nat:
    gateway: Single
```

If I modify `test-cluster.yaml` to replace `us-east-1e`, `eksctl create cluster -f test-cluster.yaml` succeeds. I realise this might not happen every time. :slightly_smiling_face:
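(A sketch of that edit; `us-east-1b` is just a stand-in for any of the AZs the error message suggests.)

```sh
# Swap the problematic AZ for one the error message lists, then retry
sed -i 's/us-east-1e/us-east-1b/' test-cluster.yaml
eksctl create cluster -f test-cluster.yaml
```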
Logs

Failure logs
```shell
➜ eksctl create cluster -f cluster.yaml
2021-06-07 14:57:11 [ℹ] eksctl version 0.51.0
2021-06-07 14:57:11 [ℹ] using region us-east-1
2021-06-07 14:57:12 [ℹ] setting availability zones to [us-east-1e us-east-1d]
2021-06-07 14:57:12 [ℹ] subnets for us-east-1e - public:192.168.0.0/19 private:192.168.64.0/19
2021-06-07 14:57:12 [ℹ] subnets for us-east-1d - public:192.168.32.0/19 private:192.168.96.0/19
2021-06-07 14:57:12 [ℹ] nodegroup "ng-1" will use "ami-0ef0c69399dbb5f3f" [AmazonLinux2/1.19]
2021-06-07 14:57:12 [ℹ] nodegroup "ng-2" will use "ami-0ef0c69399dbb5f3f" [AmazonLinux2/1.19]
2021-06-07 14:57:12 [ℹ] using Kubernetes version 1.19
2021-06-07 14:57:12 [ℹ] creating EKS cluster "test-cluster" in "us-east-1" region with un-managed nodes
2021-06-07 14:57:12 [ℹ] 2 nodegroups (ng-1, ng-2) were included (based on the include/exclude rules)
2021-06-07 14:57:12 [ℹ] will create a CloudFormation stack for cluster itself and 2 nodegroup stack(s)
2021-06-07 14:57:12 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
2021-06-07 14:57:12 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=test-cluster'
2021-06-07 14:57:12 [ℹ] CloudWatch logging will not be enabled for cluster "test-cluster" in "us-east-1"
2021-06-07 14:57:12 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-1 --cluster=test-cluster'
2021-06-07 14:57:12 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "test-cluster" in "us-east-1"
2021-06-07 14:57:12 [ℹ] 2 sequential tasks: { create cluster control plane "test-cluster", 3 sequential sub-tasks: { wait for control plane to become ready, create addons, 2 parallel sub-tasks: { create nodegroup "ng-1", create nodegroup "ng-2" } } }
2021-06-07 14:57:12 [ℹ] building cluster stack "eksctl-test-cluster-cluster"
2021-06-07 14:57:13 [ℹ] deploying stack "eksctl-test-cluster-cluster"
2021-06-07 14:57:43 [ℹ] waiting for CloudFormation stack "eksctl-test-cluster-cluster"
2021-06-07 14:58:14 [ℹ] waiting for CloudFormation stack "eksctl-test-cluster-cluster"
2021-06-07 14:59:14 [ℹ] waiting for CloudFormation stack "eksctl-test-cluster-cluster"
2021-06-07 14:59:14 [✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-test-cluster-cluster"
2021-06-07 14:59:14 [ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
2021-06-07 14:59:15 [!] AWS::EC2::Subnet/SubnetPublicUSEAST1D: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::RouteTable/PrivateRouteTableUSEAST1D: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::Subnet/SubnetPrivateUSEAST1D: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::RouteTable/PublicRouteTable: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::VPCGatewayAttachment/VPCGatewayAttachment: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::Subnet/SubnetPrivateUSEAST1E: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::RouteTable/PrivateRouteTableUSEAST1E: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::Route/PublicSubnetRoute: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1D: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::NatGateway/NATGateway: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1E: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1E: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1D: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::IAM::Policy/PolicyELBPermissions: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::IAM::Policy/PolicyCloudWatchMetrics: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [!] AWS::EC2::SecurityGroupIngress/IngressInterNodeGroupSG: DELETE_IN_PROGRESS
2021-06-07 14:59:15 [✖] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1E: CREATE_FAILED – "Resource creation cancelled"
2021-06-07 14:59:15 [✖] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1D: CREATE_FAILED – "Resource creation cancelled"
2021-06-07 14:59:15 [✖] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1E: CREATE_FAILED – "Resource creation cancelled"
2021-06-07 14:59:15 [✖] AWS::EC2::NatGateway/NATGateway: CREATE_FAILED – "Resource creation cancelled"
2021-06-07 14:59:15 [✖] AWS::EC2::Route/PublicSubnetRoute: CREATE_FAILED – "Resource creation cancelled"
2021-06-07 14:59:15 [✖] AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1D: CREATE_FAILED – "Resource creation cancelled"
2021-06-07 14:59:15 [✖] AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Cannot create cluster 'test-cluster' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: 1215c702-8018-4c6a-b922-b2dafe7249d4; Proxy: null)"
2021-06-07 14:59:15 [!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2021-06-07 14:59:15 [ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=test-cluster'
2021-06-07 14:59:15 [✖] ResourceNotReady: failed waiting for successful resource state
Error: failed to create cluster "test-cluster"
```

Anything else we need to know?
What OS are you using?

Ubuntu 20.04.1 LTS:

/etc/os-release
```
➜ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS (fossa-charmander X68)"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
```

Are you using a downloaded binary or did you compile eksctl?
What type of AWS credentials are you using (i.e. default/named profile, MFA)? - please don't include actual credentials though!
Versions