coiled / feedback

A place to provide Coiled feedback

Trouble setting up Coiled permissions in ESIP AWS #162

Closed phobson closed 1 year ago

phobson commented 2 years ago

via slack

Description

@rsignell-usgs is trying to set up customer-hosted Coiled in ESIP AWS (https://aws.amazon.com/blogs/publicsector/earth-science-information-partners-promoting-innovation-for-earth-science-data/). He was previously using Coiled-hosted.

He followed our guide (https://docs.coiled.io/user_guide/aws_configure.html) and got this error:

ServerError: Unable to access base docker image '077742499581.dkr.ecr.us-west-2.amazonaws.com/prod/rsignell-odc:4e854839-76e3-43bf-9ab0-cbb40ac15c7e'. Ensure you have a properly configured Container Registry, or get in touch if you need help or think this is a bug.
time="2022-06-13T21:13:04Z" level=fatal msg="Error parsing image name \"docker://077742499581.dkr.ecr.us-west-2.amazonaws.com/prod/rsignell-odc:4e854839-76e3-43bf-9ab0-cbb40ac15c7e\": Error reading manifest 4e854839-76e3-43bf-9ab0-cbb40ac15c7e in 077742499581.dkr.ecr.us-west-2.amazonaws.com/prod/rsignell-odc: denied: User: arn:aws:iam::097532040392:user/coiled is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:us-west-2:077742499581:repository/prod/rsignell-odc because no resource-based policy allows the ecr:BatchGetImage action"

He reports that when he logged into his Coiled account, it appeared everything had completed successfully, but he didn't see anything about setting up ECR. He did remove the setup policy and kept only the "ongoing" policy, as directed.

Any thoughts about what might be happening here?

hayesgb commented 2 years ago

cc: @dchudz

dchudz commented 2 years ago

First (most urgent) question: Is he currently blocked on using Coiled at all?

I assume he can go back to Coiled-hosted while we work this out, but wanted to be sure.

ntabris commented 2 years ago

> I assume he can go back to Coiled-hosted while we work this out, but wanted to be sure.

How? My impression is that once you switch off Coiled-hosted there's no way to switch back... is that wrong?

ntabris commented 2 years ago

Okay, so this is a slightly painful part of switching account backends.

He created a software environment in Coiled hosted. After switching the AWS credentials, he no longer has credentials which can access that software environment in our AWS account.

Note the different account IDs here:

arn:aws:iam::097532040392:user/coiled is not authorized to perform: ecr:BatchGetImage on resource: arn:aws:ecr:us-west-2:077742499581:repository/prod/rsignell-odc
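A quick way to spot this kind of mismatch is to pull the account IDs out of the ARNs in the error. This is only a sketch: the regular expression and the `account_ids` helper are my own and not part of Coiled or any AWS SDK.

```python
import re

# Matches ARNs of the form arn:aws:<service>:<region>:<12-digit account>:<resource>.
# The region may be empty (as it is for IAM ARNs).
ARN_PATTERN = re.compile(r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:(\d{12}):\S+")


def account_ids(message: str) -> list[str]:
    """Return the account ID of every ARN found in the message, in order."""
    return [match.group(1) for match in ARN_PATTERN.finditer(message)]


error = (
    "arn:aws:iam::097532040392:user/coiled is not authorized to perform: "
    "ecr:BatchGetImage on resource: "
    "arn:aws:ecr:us-west-2:077742499581:repository/prod/rsignell-odc"
)

caller_account, resource_account = account_ids(error)
# The IAM user lives in one account, the ECR repository in another.
print(caller_account, resource_account, caller_account == resource_account)
```

When the two IDs differ, the request is cross-account, so the repository's own resource policy would have to grant access, which is what the "no resource-based policy allows" part of the message is pointing at.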

@rsignell-usgs how much trouble is it to just call create_software_environment again to re-build this software environment in your AWS account?

dchudz commented 2 years ago

> How? My impression is that once you switch off Coiled-hosted there's no way to switch back... is that wrong?

It's gone from the UI but he could do it with coiled.set_backend_options. Or we can set it as admins. So if unblocking the switch isn't fast let's do one of those quick so that @rsignell-usgs isn't totally blocked on using Coiled.

gjoseph92 commented 2 years ago

This seems like the typical public-coiled-hosted-software-environment issue. If rebuilding the software environment with v2 is possible, that seems like the best way forward. It's just unfortunate there isn't a clearer error message from Coiled telling you to do this.

rsignell-usgs commented 2 years ago

It's no problem at all to re-run create_software_environment. It's just that I totally forgot that was how I created the environment! :)

I just tried it however, and got the same error:

 raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the CreateRepository operation: User: arn:aws:iam::097532040392:user/coiled is not authorized to perform: ecr:CreateRepository on resource: arn:aws:ecr:us-west-2:097532040392:repository/prod/rsignell-odc with an explicit deny in a service control policy

Do I need a new token?

ntabris commented 2 years ago

The explicit deny in a service control policy part of that probably means that the AWS account you're using explicitly doesn't allow you to make a repository in ECR for that account.
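The two errors in this thread end differently, and the ending hints at where the fix has to happen. A rough sketch of telling them apart follows; the `classify_denial` helper and its labels are hypothetical, and the phrase list only covers wordings I have seen in AWS access-denied messages, so treat it as incomplete.

```python
def classify_denial(message: str) -> str:
    """Guess which policy layer blocked a request from the error wording."""
    text = message.lower()
    if "explicit deny in a service control policy" in text:
        # Organization-level SCP: IAM permissions cannot override this.
        return "scp-explicit-deny"
    if "explicit deny" in text:
        return "explicit-deny"
    if "no resource-based policy allows" in text:
        # Typically a cross-account request the resource owner has not granted.
        return "missing-resource-policy"
    if "no identity-based policy allows" in text:
        return "missing-identity-policy"
    return "unknown"


first_error = (
    "is not authorized to perform: ecr:BatchGetImage on resource: ... "
    "because no resource-based policy allows the ecr:BatchGetImage action"
)
second_error = (
    "is not authorized to perform: ecr:CreateRepository on resource: ... "
    "with an explicit deny in a service control policy"
)

print(classify_denial(first_error))   # missing-resource-policy
print(classify_denial(second_error))  # scp-explicit-deny
```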

I don't know how locked down your AWS account is, so this may or may not be something you can ask for IAM permissions to do. Alternately, you could store the images somewhere other than ECR in that account.

Ask for more permissions in the AWS account

Specifically, this is the list of permissions we need related to ECR:

"ecr:BatchCheckLayerAvailability",
"ecr:BatchGetImage",
"ecr:CompleteLayerUpload",
"ecr:CreateRepository",
"ecr:DescribeImages",
"ecr:DescribeRepositories",
"ecr:GetAuthorizationToken",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:InitiateLayerUpload",
"ecr:ListImages",
"ecr:PutImage",
"ecr:UploadLayerPart"

(from the ongoing policy at https://docs.coiled.io/user_guide/aws_reference.html#iam-policies)

and they'd need to be scoped to apply to at least

arn:aws:ecr:us-west-2:097532040392:repository/prod/rsignell-odc.
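For whoever manages the account, a policy along these lines might be a starting point. It is a sketch, not the policy from our docs: the `Sid` values and the `build_ecr_policy` helper are made up for illustration. One thing I am confident of is that `ecr:GetAuthorizationToken` does not support resource-level scoping, so it sits in its own statement with `"Resource": "*"`.

```python
import json

# The repository-scoped ECR actions from the list above.
ECR_REPOSITORY_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:CompleteLayerUpload",
    "ecr:CreateRepository",
    "ecr:DescribeImages",
    "ecr:DescribeRepositories",
    "ecr:GetDownloadUrlForLayer",
    "ecr:GetRepositoryPolicy",
    "ecr:InitiateLayerUpload",
    "ecr:ListImages",
    "ecr:PutImage",
    "ecr:UploadLayerPart",
]


def build_ecr_policy(repository_arn: str) -> dict:
    """Build an IAM policy document scoped to a single ECR repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CoiledEcrRepository",  # hypothetical name
                "Effect": "Allow",
                "Action": ECR_REPOSITORY_ACTIONS,
                "Resource": repository_arn,
            },
            {
                "Sid": "CoiledEcrLogin",  # hypothetical name
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",  # this action cannot be scoped to a repository
            },
        ],
    }


policy = build_ecr_policy(
    "arn:aws:ecr:us-west-2:097532040392:repository/prod/rsignell-odc"
)
print(json.dumps(policy, indent=2))
```

Note that an identity policy like this only helps if nothing above it issues an explicit deny; it would not override a service control policy.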

Use another container registry

When you set up your AWS account with Coiled, one step of the process asks you for the container registry to use:

[screenshot of the container registry step]

Instead of using the default, ECR in the same AWS account, you can enter a different registry. For instance, if you have access to (or can create) a DockerHub registry, you can give us a token to access that.

I was hoping there would be a page in our docs that went into more detail about this, but I don't see one, so let us know if you have questions about any of this.

rsignell-usgs commented 2 years ago

Okay, I logged into my coiled account (rsignell) and went to "cloud provider configuration", then switched my AWS key and secret key.

I chose "automatically create VPC" even though I had previously answered this the same way, so wasn't sure if now an additional VPC would be created or whether it would use the existing VPC that Coiled already created.

I previously tried "use existing VPC" and was planning to specify the coiled one, but couldn't figure out how to answer the subnet and other questions.

After saying to create new VPC, I chose ECR and then hit submit. It came back very quickly and said "successfully configured backend" (like suspiciously quickly, in 1s).

I then tried creating my environment, but no go -- I got the same permission problem.

See this notebook: https://nbviewer.org/gist/1821962b2a2f3ddbb642e8b37e35e2f6

phobson commented 2 years ago

Thanks for the info, Rich. We'll get to the bottom of this

ntabris commented 2 years ago

Right, I think you'll either need additional permissions enabled in that AWS account (which I suspect might not be an option, since if whoever manages this account was okay with you having those permissions they probably wouldn't have blocked them in the first place), or you'll need to configure Coiled to use something other than the default ECR registry.

I think @phobson is going to follow up in case a call would help.

rsignell-usgs commented 2 years ago

@ntabris , the credentials I switched to (or tried to switch to -- not sure it "took" because of the immediate declaration of "successfully configured") have AWS admin privileges. So I shouldn't get any more permission problems once it's properly configured.

ntabris commented 2 years ago

I think there are organization-level Service Control Policies (relevant AWS doc) which are taking precedence over your IAM permissions and are prohibiting the creation of ECR repositories. SCPs apply even to admin users. I think you'd either need the SCP changed, or need to use something other than ECR (e.g., use GitHub, or maybe you could use ECR in an AWS account that's outside the scope of the relevant SCP).
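To make the precedence concrete, here is a toy model of the evaluation order. It is heavily simplified and is my own sketch rather than AWS code: it ignores resources, conditions, permission boundaries, and resource-based policies, and the statement format is invented for the example.

```python
from fnmatch import fnmatchcase


def matches(statement: dict, action: str) -> bool:
    """True if any action pattern in the statement covers the action."""
    return any(fnmatchcase(action, pattern) for pattern in statement["actions"])


def is_allowed(action: str, identity_policy: list, scp: list) -> bool:
    """An explicit deny anywhere wins; otherwise both layers must allow."""
    for statement in identity_policy + scp:
        if statement["effect"] == "Deny" and matches(statement, action):
            return False
    allowed_by_identity = any(
        s["effect"] == "Allow" and matches(s, action) for s in identity_policy
    )
    allowed_by_scp = any(
        s["effect"] == "Allow" and matches(s, action) for s in scp
    )
    return allowed_by_identity and allowed_by_scp


# An admin user: allowed to do everything by their IAM policy.
admin_policy = [{"effect": "Allow", "actions": ["*"]}]

# A hypothetical SCP: allow everything, except creating ECR repositories.
org_scp = [
    {"effect": "Allow", "actions": ["*"]},
    {"effect": "Deny", "actions": ["ecr:CreateRepository"]},
]

print(is_allowed("ecr:CreateRepository", admin_policy, org_scp))  # False
print(is_allowed("ecr:BatchGetImage", admin_policy, org_scp))     # True
```

So admin privileges in the account do not help here: the deny is applied before the IAM allow is even considered.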

rsignell-usgs commented 2 years ago

I'm testing Coiled out on ESIP's AWS account, not one of the USGS AWS accounts, so there should NOT be Service Control Policies in place.

But in the interest of getting this going, I switched to DockerHub and that seems to be working fine: [screenshot 2022-06-17_10-44-56]

P.S. Wouldn't it be faster to use mamba over conda for these builds?

P.P.S. I just realized my environment pinned coiled-runtime=0.0.3. Should I rebuild with coiled-runtime to get 0.0.4, or should I not bother?

ntabris commented 2 years ago

Hm, it's possible there's something else going on other than Service Control Policies. Were you still getting the error that says

An error occurred (AccessDeniedException) when calling the CreateRepository operation: User: arn:aws:iam::097532040392:user/coiled is not authorized to perform: ecr:CreateRepository on resource: arn:aws:ecr:us-west-2:097532040392:repository/prod/rsignell-odc with an explicit deny in a service control policy

?

Regardless, glad you got something working with DockerHub.

rsignell-usgs commented 2 years ago

@ntabris - I was wrong -- you were right! There is an SCP in place to force tagging using the IAMUsername and Projectname tags.

ntabris commented 2 years ago

Oh, it sure would be nice if the SCP error message were more explicit about that (but I can understand why AWS wouldn't say this in the error).

I gather that means you should be able to create ECR repo so long as it's tagged appropriately?

We don't currently support custom tagging. We're adding custom tagging for EC2 instances (mostly people want this for cost tracking), but hadn't yet considered the need for this on other resources like ECR repos. That makes sense, and I'll log this as an issue internally.
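For reference, "tagged appropriately" at the AWS API level would look roughly like the sketch below. The tag keys come from the SCP described above; the tag values, the helper names, and the region default are placeholders, and I'm assuming the SCP only checks that these two tags are present on the request. This goes through boto3 directly, not through Coiled.

```python
def tagged_repository_request(name: str, iam_username: str, project_name: str) -> dict:
    """Build keyword arguments for ECR's CreateRepository call with both tags."""
    return {
        "repositoryName": name,
        "tags": [
            {"Key": "IAMUsername", "Value": iam_username},
            {"Key": "Projectname", "Value": project_name},
        ],
    }


def create_tagged_repository(request: dict, region: str = "us-west-2") -> dict:
    """Send the request to ECR; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("ecr", region_name=region)
    return client.create_repository(**request)


# Placeholder values; substitute whatever the account's tagging rules expect.
request = tagged_repository_request("prod/rsignell-odc", "rsignell", "coiled-testing")
print(request)
```

Whether Coiled would then pick up a repository created this way is a separate question that this sketch doesn't answer.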

rsignell-usgs commented 2 years ago

Not sure whether to close this -- I do have it working with Dockerhub...

ntabris commented 2 years ago

Glad you got this working with dockerhub!

How much of a concern is it that you can't currently use ECR in your AWS account? Is it a "don't care", a "nice to have", or an "I can get a lot more people to use Coiled if you add the feature that would enable creating ECR registries in the ESIP AWS account"? Asking so I know how much to push this to get prioritized.

rsignell-usgs commented 2 years ago

It's probably not that important, but it would feel cleaner to be using AWS -- seems messy to also have to create an account on DockerHub.

shughes-uk commented 1 year ago

We have a new build backend that does not use ECR/Docker containers, so fixing this is no longer required.