coiled / feedback

A place to provide Coiled feedback

Errors configuring coiled in existing AWS account (us-west-2) #146

Closed scottyhq closed 2 years ago

scottyhq commented 3 years ago

I'm trying to follow these docs to spin up coiled clusters under an existing AWS account: https://docs.coiled.io/user_guide/backends_aws.html#using-your-own-aws-account

I've created a user with sufficient permissions, as documented here: https://github.com/uwhackweek/coiled-deploy

Then under https://cloud.coiled.io/scottyhq/account/update-backend-options I select AWS --> Managed container backend (ECS) --> region='us-west-2', 'Launch in my AWS account' (providing the access keys of my 'coiled' IAM user) --> AWS ECR.

After my first attempt to set this up I get the following error:

Creating IAM Role scottyhq ...

Error configuring backend: An error occurred (NoSuchEntity) when calling the AttachRolePolicy operation: Policy arn:aws:iam::769926636128:policy/dev-cluster-role-s3-read-policy does not exist or is not attachable.

We couldn't find VPC named dev-us-west-2-vpc. Not doing anything ...

A second attempt gets further but still fails (abbreviated log below; I can provide the full one via email).

Creating IAM Role scottyhq ...
Creating default VPC ...
Creating Default VPC infrastructure ...
Finding next available CIDR ...
CIDR found: 10.22.0.0/16 ...
Creating VPC dev-us-west-2-vpc ...
.
.
.
Creating Public Subnet in us-west-2c ...
Created Public Subnet subnet-098a49cf3607ab232 ...
Allocating an Elastic IP for the VPC ...
Error configuring backend: An error occurred (AddressLimitExceeded) when calling the AllocateAddress operation: The maximum number of addresses has been reached.
.
.
.
Deleting VPC ...
Finished deleting VPC

I also notice that dev-us-west-2-vpc is still in my AWS console, so it seems not everything was actually deleted. It would be nice if there were an easy way to ensure that resources deployed by Coiled are removed, perhaps an 'uninstall backend resources' button or a CloudFormation template.

FabioRosado commented 3 years ago

Hello @scottyhq, thank you for reporting this to us. We have a fix for the NoSuchEntity error, and it will go out with the next deployment to production, which should happen next week. Currently, the workaround is to attempt the setup a second time, as you did, and it should work.

Looking at the second attempt logs, it seems your AWS account didn't have enough Elastic IP addresses. We have updated the docs and will deploy these updates next week as well. I wanted to give you some information about the Elastic IP addresses and how many we create.

We create one Elastic IP address per availability zone. Can you confirm if you can create at least 4 more Elastic IP addresses? If not, are you able to request more? Alternatively, you could try our VM backend, which will use a single Elastic IP address.

I would also like to thank you for your suggestion. So far, most of our resources are tagged with the tag owner: coiled and it should be straightforward for us to implement a way to allow users to remove all the resources we create. I will pass the suggestion to the team.

scottyhq commented 3 years ago

> We create one Elastic IP address per availability zone. Can you confirm if you can create at least 4 more Elastic IP addresses? If not, are you able to request more? Alternatively, you could try our VM backend, which will use a single Elastic IP address.

I see. According to the AWS docs, accounts have 5 Elastic IPs per region by default. My account was already using 3, so deployment failed on the last availability zone.
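For anyone hitting the same limit, here is a rough sketch of the arithmetic, using only the numbers mentioned in this thread (a default quota of 5 Elastic IPs per region, 3 already in use, one address needed per availability zone, 4 in total). The function names are made up for illustration, and `count_for_region` assumes boto3 is installed and credentials are configured; it is not something Coiled provides.

```python
def elastic_ip_shortfall(quota, in_use, zones):
    """Return how many Elastic IPs are missing if one is needed per zone."""
    available = max(quota - in_use, 0)
    return max(zones - available, 0)


def count_for_region(region):
    """Return (allocated Elastic IPs, availability zones) for a region.

    Hypothetical helper: needs boto3 and AWS credentials, so boto3 is
    imported lazily and the arithmetic above works without either.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    in_use = len(ec2.describe_addresses()["Addresses"])
    zones = len(
        ec2.describe_availability_zones(
            Filters=[{"Name": "zone-type", "Values": ["availability-zone"]}]
        )["AvailabilityZones"]
    )
    return in_use, zones


# Numbers from this thread: quota 5, 3 in use, 4 addresses needed.
print(elastic_ip_shortfall(quota=5, in_use=3, zones=4))  # 2
```

A result above zero suggests either releasing unused addresses or requesting a quota increase before retrying. The quota of 5 is the default quoted above; an account with a raised limit would pass its own value.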

> Alternatively, you could try our VM backend, which will use a single Elastic IP address.

I don't see any docs describing the key differences between ECS versus VM backend. Would be great to document this (elastic IPs as well as other considerations).

> most of our resources are tagged with the tag owner: coiled and it should be straightforward for us to implement a way to allow users to remove all the resources we create. I will pass the suggestion to the team.

If you have a recommended approach for a temporary workaround, that would be great. It looks like some CLI commands based on https://docs.aws.amazon.com/cli/latest/reference/resource-groups/index.html might do the trick.
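As a possible starting point, the sketch below lists resources carrying the `owner: coiled` tag through the AWS Resource Groups Tagging API in boto3 rather than the CLI. It only lists ARNs and deletes nothing. Since only "most" resources are said to be tagged, the list may be incomplete. The function names `list_tagged_arns` and `main` are hypothetical, and the tag key and value are taken from the comment quoted above.

```python
def list_tagged_arns(client, key="owner", value="coiled"):
    """Collect the ARNs of all resources that carry the given tag.

    `client` is expected to be a boto3 "resourcegroupstaggingapi" client
    (or any object exposing the same paginator interface).
    """
    paginator = client.get_paginator("get_resources")
    arns = []
    for page in paginator.paginate(TagFilters=[{"Key": key, "Values": [value]}]):
        for mapping in page["ResourceTagMappingList"]:
            arns.append(mapping["ResourceARN"])
    return arns


def main(region="us-west-2"):
    """Print tagged ARNs for one region; needs boto3 and AWS credentials."""
    import boto3

    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    for arn in list_tagged_arns(client):
        print(arn)
```

Calling `main()` with credentials for the account Coiled deployed into prints one ARN per line. The tagging API is regional, so the call would need repeating for each region in use, and the printed list should be reviewed by hand before removing anything.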

FabioRosado commented 3 years ago

Hello @scottyhq, apologies for the delay in replying to you. We have released some new features recently, and I've also added a note to the docs about Elastic IP addresses, as you suggested.

Moving forward we will use the AWS VM backend as the default backend for new users. Can I check with you if you were able to launch a cluster using this backend?

ntabris commented 2 years ago

Closing stale issue, we no longer support ECS (and have made lots of other changes since May 2021).