eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

Avoid starting in bad availability zone for us-east-1 #905

Closed StevenACoffman closed 3 years ago

StevenACoffman commented 5 years ago

Before creating a feature request, please search existing feature requests to see if you find a similar one. If there is a similar feature request please up-vote it and/or add your comments to it instead

Why do you want this feature? Better user experience.

What feature/behavior/change do you want? It is possible to programmatically discover the availability zone in us-east-1 where EKS will fail to launch nodes. It would be good to avoid these potential failures automatically, or at least to warn people that what they are trying to do will fail. You can list which instance types are available in each AZ, and the odd one out becomes very obvious.

export TYPE=m5.large
aws ec2 describe-reserved-instances-offerings --filters 'Name=scope,Values=Availability Zone' --no-include-marketplace --instance-type $TYPE | jq -r '.ReservedInstancesOfferings[].AvailabilityZone' | sort | uniq
aws ec2 describe-availability-zones --query 'AvailabilityZones[*].{ZoneName: ZoneName, ZoneId: ZoneId}' --region us-east-1
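Combining the output of those two commands, the odd one out falls out of a simple set difference. A minimal sketch in Python, with illustrative sample data standing in for the live API responses (the actual zone lists vary per account and over time):

```python
# Sketch: diff the zones the region advertises against the zones that have
# reserved-instance offerings for the instance type. The sample data below
# is illustrative, not live API output.

def unsupported_zones(all_zones, zones_with_offering):
    """Zones that exist in the region but have no offering for the type."""
    return sorted(set(all_zones) - set(zones_with_offering))

# e.g. describe-availability-zones lists six AZs in us-east-1, but
# describe-reserved-instances-offerings for m5.large omits us-east-1e:
all_zones = ["us-east-1a", "us-east-1b", "us-east-1c",
             "us-east-1d", "us-east-1e", "us-east-1f"]
offered = ["us-east-1a", "us-east-1b", "us-east-1c",
           "us-east-1d", "us-east-1f"]

print(unsupported_zones(all_zones, offered))  # ['us-east-1e']
```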
errordeveloper commented 5 years ago

@StevenACoffman we've been trying to explore this as part of #118, and #252 is currently the standing issue for this.

errordeveloper commented 5 years ago

It should be relatively easy to add the logic here:

https://github.com/weaveworks/eksctl/blob/4a44c8a46362cb8d3c662711f8b29d84fb97a18a/pkg/az/az.go#L129-L149

StevenACoffman commented 5 years ago

So the zoneId (unlike the zone name) is consistent across accounts, so just avoid use1-az3?
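As a sketch of that idea only (the zone-name-to-ID mapping below is made up, since the real assignment varies per account, and hard-coding a zone ID is fragile):

```python
# Sketch: exclude a problematic zone by its account-stable zone ID rather
# than its per-account zone name. BAD_ZONE_IDS is illustrative, not a
# recommendation to hard-code.

BAD_ZONE_IDS = {"use1-az3"}

def usable_zone_names(zones):
    """zones: list of {"ZoneName": ..., "ZoneId": ...} dicts, in the shape
    returned by `aws ec2 describe-availability-zones` above."""
    return [z["ZoneName"] for z in zones if z["ZoneId"] not in BAD_ZONE_IDS]

# Example mapping (the name-to-ID assignment differs between accounts):
zones = [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az6"},
    {"ZoneName": "us-east-1e", "ZoneId": "use1-az3"},
]
print(usable_zone_names(zones))  # ['us-east-1a']
```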

errordeveloper commented 5 years ago

I would rather avoid hard-coding that in any case, but let's try to test it and see what works.

mrichman commented 5 years ago

Is there any further interest in this proposal? I've found that using --zones=us-east-1a,us-east-1b,us-east-1d just happens to be the magic incantation of AZs that works reliably for me. However, something dynamic with retry logic would be ideal.

ptran32 commented 4 years ago

Hi @mrichman, did you find a way to set --zones in the YAML file? It looks like the option is not allowed there.

I added a "zones" field to the YAML file below, but I got this error: json: unknown field "zones".

https://github.com/weaveworks/eksctl/blob/master/examples/01-simple-cluster.yaml

metadata:
  name: cluster-1
  region: us-east-1
  zones: ["us-east-1a, us-east-1b"]

Thank you for your help.

rdubya16 commented 4 years ago

Would be nice to see this retry feature. I hit this problem once a week spinning up sandbox clusters for testing. It's quite annoying because you have to wait for the stack to roll back, then go manually delete the rolled-back stack.

deliahu commented 4 years ago

If I understand correctly, @StevenACoffman's trick (i.e. aws ec2 describe-reserved-instances-offerings) addresses a different problem: not being able to create or expand ASGs in an existing cluster. It would not necessarily resolve the issue @rdubya16 raises about not being able to create the cluster in the first place (which I am also seeing). Here is my error:

AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Cannot create cluster 'my-cluster' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: 9783591e-a9f4-4511-b142-fcd8ba0f08a7)"

(I did not specify availability zones when calling eksctl create cluster)

Is there currently a workaround for the cluster creation issue?

cloudkarpe commented 4 years ago

Hi,

I am still facing similar issues.

cloud_user:~/eks $  eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.11.1"}

cloud_user:~/eks $ cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: basic-cluster
  region: us-east-1
nodeGroups:
  - name: ng-1
    instanceType: t2.micro
    desiredCapacity: 2
  - name: ng-2
    instanceType: t2.micro
    desiredCapacity: 2

cloud_user:~/eks $ eksctl create cluster -f cluster.yaml
[ℹ]  eksctl version 0.11.1
[ℹ]  using region us-east-1
[ℹ]  setting availability zones to [us-east-1b us-east-1e]
[ℹ]  subnets for us-east-1b - public:192.168.0.0/19 private:192.168.64.0/19
[ℹ]  subnets for us-east-1e - public:192.168.32.0/19 private:192.168.96.0/19
[ℹ]  nodegroup "ng-1" will use "ami-0392bafc801b7520f" [AmazonLinux2/1.14]
[ℹ]  nodegroup "ng-2" will use "ami-0392bafc801b7520f" [AmazonLinux2/1.14]
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "basic-cluster" in "us-east-1" region with un-managed nodes
[ℹ]  2 nodegroups (ng-1, ng-2) were included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for cluster itself and 2 nodegroup stack(s)
[ℹ]  will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=basic-cluster'
[ℹ]  CloudWatch logging will not be enabled for cluster "basic-cluster" in "us-east-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=us-east-1 --cluster=basic-cluster'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "basic-cluster" in "us-east-1"
[ℹ]  2 sequential tasks: { create cluster control plane "basic-cluster", 2 parallel sub-tasks: { create nodegroup "ng-1", create nodegroup "ng-2" } }
[ℹ]  building cluster stack "eksctl-basic-cluster-cluster"
[ℹ]  deploying stack "eksctl-basic-cluster-cluster"
[✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-basic-cluster-cluster"
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1E: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::NatGateway/NATGateway: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1B: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1B: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1E: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Cannot create cluster 'basic-cluster' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: be715977-a421-4ad6-9dba-b4e907ca1ce8)"
[ℹ]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ]  to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=basic-cluster'
[✖]  waiting for CloudFormation stack "eksctl-basic-cluster-cluster": ResourceNotReady: failed waiting for successful resource state
[✖]  failed to create cluster "basic-cluster"
cloud_user:~/eks $

From CloudFormation for "eksctl-basic-cluster-cluster"

2019-12-19 18:12:08 UTC+0800    ControlPlane    CREATE_FAILED    Cannot create cluster 'basic-cluster' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: be715977-a421-4ad6-9dba-b4e907ca1ce8)
icarvalho-tmg commented 4 years ago

Hi,

I am facing similar issues. Trying to create a cluster at region us-east-1 but CloudFormation is rolling back due to: "us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster."

$ aws ec2 describe-reserved-instances-offerings --filters 'Name=scope,Values=Availability Zone' --no-include-marketplace --instance-type m5.large | jq -r '.ReservedInstancesOfferings[].AvailabilityZone' | sort | uniq
us-east-1a
us-east-1b
us-east-1c
us-east-1d
us-east-1f

$ eksctl version
0.15.0

....
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "minimum-cluster" in "us-east-1" region with un-managed nodes
[ℹ]  1 nodegroup (ng-1) was included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
[ℹ]  will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=minimum-cluster'
[ℹ]  CloudWatch logging will not be enabled for cluster "minimum-cluster" in "us-east-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=us-east-1 --cluster=minimum-cluster'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "minimum-cluster" in "us-east-1"
[ℹ]  2 sequential tasks: { create cluster control plane "minimum-cluster", create nodegroup "ng-1" }
[ℹ]  building cluster stack "eksctl-minimum-cluster-cluster"
[ℹ]  deploying stack "eksctl-minimum-cluster-cluster"
[✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-minimum-cluster-cluster"
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1F: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::NatGateway/NATGateway: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPublicUSEAST1E: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1F: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EC2::SubnetRouteTableAssociation/RouteTableAssociationPrivateUSEAST1E: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Cannot create cluster 'minimum-cluster' because us-east-1e, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f (Service: AmazonEKS; Status Code: 400; Error Code: UnsupportedAvailabilityZoneException; Request ID: e941b719-a19d-4861-b39e-e80dbb40d593)"
[ℹ]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ]  to cleanup resources, run 'eksctl delete cluster --region=us-east-1 --name=minimum-cluster'
[✖]  waiting for CloudFormation stack "eksctl-minimum-cluster-cluster": ResourceNotReady: failed waiting for successful resource state
michaelbeaumont commented 4 years ago

For those who are stuck with setting the AZ in the config, it belongs under the nodegroup:

https://github.com/weaveworks/eksctl/blob/8da038f95ef3b2bd82a96a69933e59a706ada47c/examples/05-advanced-nodegroups.yaml#L59
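For those following along, a minimal config sketch of that placement (field name per the linked example; the AZ choices here are illustrative, and I believe cluster-level AZs can also be pinned with a top-level availabilityZones field):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cluster-1
  region: us-east-1
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    # per-nodegroup AZ pinning goes here, not under metadata:
    availabilityZones: ["us-east-1a", "us-east-1b"]
```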

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.