aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(core): docker login to deployment account ECR occurs before asset is built #25894

Open blimmer opened 1 year ago

blimmer commented 1 year ago

Describe the bug

Given a simple Dockerfile that pulls from a private ECR repository in the same account you're deploying to:

ARG AWS_ACCOUNT_NUMBER
ARG AWS_REGION
ARG REPO
ARG TAG

FROM ${AWS_ACCOUNT_NUMBER}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO}:${TAG}

With a DockerImageAsset:

import * as cdk from 'aws-cdk-lib';
import { DockerImageAsset } from 'aws-cdk-lib/aws-ecr-assets';
import { Construct } from 'constructs';
import { join } from 'path';

export class CdkBugReportsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new DockerImageAsset(this, 'DockerImageAsset', {
      directory: join(__dirname, '..', 'assets', 'docker'),
      buildArgs: {
        "AWS_ACCOUNT_NUMBER": "123456789012",
        "AWS_REGION": "us-west-2",
        "REPO": "my-repo",
        "TAG": "latest"
      }
    })
  }
}

The cdk deploy will fail with a message that looks like this:

#3 [internal] load metadata for <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<REPO>:<TAG>
#3 ERROR: pulling from host <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com failed with status code [manifests <TAG>]: 403 Forbidden

In other words, the FROM in the Dockerfile cannot be resolved. This happens because the image publishing role (arn:aws:iam::<ACCOUNT>:role/cdk-hnb659fds-image-publishing-role-<ACCOUNT>-<REGION>) is used to log in to Docker before the image is built.

Therefore, it overrides the existing docker login you might have already done via:

> aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com

The build then can't pull using the credentials you've already set up.

Expected Behavior

I expected the system-level docker login to be respected during image build time so it could resolve the image from the private ECR repo.

I understand that the image-publishing-role needs to be assumed to push to the CDK assets ECR repository, but it feels like those credentials should only be used before calling docker push.

In other words, the flow looks like:

  1. Standard docker login happens as a setup step in my CI platform
  2. DockerImageAsset is built using the credentials from step 1.
  3. Existing docker login is backed up
  4. docker login occurs for image-publishing-role
  5. docker push the built asset
  6. Restore saved docker login credentials from step 3
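
The ordering proposed above can be sketched as follows. This is a minimal illustration of the sequence only, not how cdk-assets is implemented: the `PublishSteps` interface and every callback in it are hypothetical placeholders for the real Docker and credential operations, and they are synchronous here purely to keep the sketch short.

```typescript
// Hypothetical step callbacks; none of these names come from cdk-assets.
interface PublishSteps {
  build(): void;                    // docker build, using whatever login already exists
  backupLogin(): string;            // save the current Docker credentials, return a handle
  loginAsPublishingRole(): void;    // docker login with the image-publishing-role
  push(): void;                     // docker push to the CDK assets repository
  restoreLogin(handle: string): void;
}

// Build first with the caller's credentials, then switch logins only for the push.
function publishWithScopedLogin(steps: PublishSteps): void {
  steps.build();
  const saved = steps.backupLogin();
  try {
    steps.loginAsPublishingRole();
    steps.push();
  } finally {
    // Restore the original login even if the role login or the push fails.
    steps.restoreLogin(saved);
  }
}
```

The `try`/`finally` is the part that matters for step 6: the saved login is put back whether or not the push succeeds.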

Current Behavior

What's happening now appears to be:

  1. Standard docker login happens as a setup step in my CI platform
  2. docker login occurs for image-publishing-role
  3. DockerImageAsset is built using the credentials from step 2 (failure because image-publishing-role can't access the private ECR repo).

Reproduction Steps

https://github.com/blimmer/cdk-bug-reports/pull/2 shows an example. You do need to manually push a latest tag to the repo to make it technically correct. However, you should still see the error even with an empty repo (you'll get a 403 error).

Possible Solution

If possible, system docker logins should be used to build the Docker images, not the image-publishing-role.

It might be challenging, however, to back up docker credentials, since there are a few different ways you can store those values.

Additional Information/Context

You can work around this issue by applying a policy to your private repository that allows the image-publishing-role access to the repo.
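
As a rough sketch of that workaround, the snippet below builds a repository policy document that grants the image publishing role pull access. The role ARN follows the pattern quoted in the bug description; `hnb659fds` is the default bootstrap qualifier and will differ if you bootstrapped with a custom one. The function and interface names are made up for illustration, and the three pull actions listed are my assumption of the minimum needed, so adjust them to your setup.

```typescript
// Shape of the ECR repository policy document produced below.
interface PolicyStatement {
  Sid: string;
  Effect: 'Allow';
  Principal: { AWS: string };
  Action: string[];
}

interface RepositoryPolicy {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

// ARN pattern taken from the issue text; the qualifier defaults to CDK's default bootstrap qualifier.
function publishingRoleArn(account: string, region: string, qualifier: string = 'hnb659fds'): string {
  return `arn:aws:iam::${account}:role/cdk-${qualifier}-image-publishing-role-${account}-${region}`;
}

// Hypothetical helper: allow the publishing role to pull images from the private base-image repo.
function pullPolicyFor(account: string, region: string): RepositoryPolicy {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'AllowCdkImagePublishingRolePull',
        Effect: 'Allow',
        Principal: { AWS: publishingRoleArn(account, region) },
        // Assumed minimum set of actions for pulling an image.
        Action: [
          'ecr:BatchCheckLayerAvailability',
          'ecr:GetDownloadUrlForLayer',
          'ecr:BatchGetImage',
        ],
      },
    ],
  };
}

console.log(JSON.stringify(pullPolicyFor('123456789012', 'us-west-2'), null, 2));
```

The printed JSON could then be saved to a file and attached to the private repository, for example with `aws ecr set-repository-policy --repository-name my-repo --policy-text file://policy.json`.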

CDK CLI Version

2.83.0

Framework Version

No response

Node.js Version

18

OS

macOS

Language

TypeScript

Language Version

No response

Other information

No response

peterwoodworth commented 1 year ago

I had to dig for a while to find the answer to this, so we should document it. You're right, login occurs before the asset is built. But that's done intentionally, and there's a way to adjust that default. See the comment in the code here https://github.com/aws/aws-cdk/blob/3196cbc8d09c54e634ad54487b88e5ac962909f3/packages/cdk-assets/lib/private/docker.ts#L221-L225

You can configure a file which contains credential information. The CDK expects it to be here https://github.com/aws/aws-cdk/blob/3196cbc8d09c54e634ad54487b88e5ac962909f3/packages/cdk-assets/lib/private/docker-credentials.ts#L25-L28

I didn't know we could have a config file for this, cool! I don't think we document this anywhere though

blimmer commented 1 year ago

Interesting - that's helpful that it exists already. I wonder, though, why doesn't it default to ~/.docker/config.json? That's where the default authentication is stored and I think it might "just work" with that default.

iliapolo commented 1 year ago

@blimmer Were you able to resolve the issue with CDK_DOCKER_CREDS_FILE?

blimmer commented 1 year ago

Hey @iliapolo , I won't have time to check out the suggested workaround for some time due to a few tight deadlines. The workaround I provided in the description (granting the image publishing role access to the ECR repo) unblocked me for now.

I'm still curious to hear the CDK team's response to my question above. Why not default to using system docker credentials vs the publishing role? It feels like the expected behavior is inverted from the reality today.

joshua-haunty commented 6 months ago

It took me a while to finally find someone with an issue similar to mine. Thanks for bringing this up. I attempted to solve the 403 Forbidden by using the CDK_DOCKER_CREDS_FILE env variable described above but did not have luck (although I didn't pursue it very thoroughly). What actually solved my issue was giving proper ECR permissions to the CDK bootstrap image publishing role, as you @blimmer stated (specifically, access to two new ECR repositories that were outside the scope of our CDK setup and that I switched our Dockerfiles to reference as their base image).

The knowledge gap for me was that the OIDC role (with its policy and trust relationship) I was using for GitHub Actions to execute cdk deploy in a pipeline was not the role actually performing the docker build command when creating a Lambda via lambda_.DockerImageFunction, because that OIDC role was using sts:AssumeRole to do all the necessary CDK work. This meant my attempt to give that OIDC role ecr:* permissions did nothing to solve my issue, nor did authenticating to ECR with that role earlier in the job.

I would like to mention that I had trouble logging which role was attempting to create the dockerized Lambda and execute the docker build command in the first place (which is why it took me this long to find this GitHub issue). I probably would have gotten here sooner if I had changed the log verbosity during the deployment.

will7200 commented 2 months ago

Just wanted to say that CDK_DOCKER_CREDS_FILE does not work. My current workaround is to pull the desired image before hitting cdk-assets.

[Container] 2024/05/17 18:34:19.009136 Running on CodeBuild On-demand
[Container] 2024/05/17 18:34:19.009149 Waiting for agent ping
[Container] 2024/05/17 18:34:19.211914 Waiting for DOWNLOAD_SOURCE
[Container] 2024/05/17 18:34:20.751952 Phase is DOWNLOAD_SOURCE
[Container] 2024/05/17 18:34:20.761618 CODEBUILD_SRC_DIR=/codebuild/output/src2092927603/src
[Container] 2024/05/17 18:34:20.762078 YAML location is /codebuild/readonly/buildspec.yml
[Container] 2024/05/17 18:34:20.763711 Setting HTTP client timeout to higher timeout for S3 source
[Container] 2024/05/17 18:34:20.763822 Processing environment variables
[Container] 2024/05/17 18:34:20.972663 No runtime version selected in buildspec.
[Container] 2024/05/17 18:34:21.024315 Moving to directory /codebuild/output/src2092927603/src
[Container] 2024/05/17 18:34:21.025827 Unable to initialize cache download: no paths specified to be cached
[Container] 2024/05/17 18:34:21.131149 Configuring ssm agent with target id: codebuild:-
[Container] 2024/05/17 18:34:21.156487 Successfully updated ssm agent configuration
[Container] 2024/05/17 18:34:21.156769 Registering with agent
[Container] 2024/05/17 18:34:21.199904 Phases found in YAML: 3
[Container] 2024/05/17 18:34:21.199919  PRE_BUILD: 3 commands
[Container] 2024/05/17 18:34:21.199924  INSTALL: 1 commands
[Container] 2024/05/17 18:34:21.199950  BUILD: 1 commands
[Container] 2024/05/17 18:34:21.200189 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED
[Container] 2024/05/17 18:34:21.200201 Phase context status code:  Message:
[Container] 2024/05/17 18:34:21.288625 Entering phase INSTALL
[Container] 2024/05/17 18:34:21.289039 Running command npm install -g cdk-assets@2
added 109 packages in 8s
[Container] 2024/05/17 18:34:42.134466 Phase complete: INSTALL State: SUCCEEDED
[Container] 2024/05/17 18:34:42.134494 Phase context status code:  Message:
[Container] 2024/05/17 18:34:42.166368 Entering phase PRE_BUILD
[Container] 2024/05/17 18:34:42.166917 Running command ACCOUNT_OWNER=`aws sts get-caller-identity --query 'Account' --output text`
[Container] 2024/05/17 18:34:57.488756 Running command aws ecr get-login-password | docker login -u AWS --password-stdin https://${ACCOUNT_OWNER}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
[Container] 2024/05/17 18:34:58.149278 Running command export CDK_DOCKER_CREDS_FILE=~/.docker/config.json
[Container] 2024/05/17 18:34:58.154418 Phase complete: PRE_BUILD State: SUCCEEDED
[Container] 2024/05/17 18:34:58.154442 Phase context status code:  Message:
[Container] 2024/05/17 18:34:58.188559 Entering phase BUILD
[Container] 2024/05/17 18:34:58.189069 Running command cdk-assets --path "assembly--14E3797B.assets.json" --verbose publish "-:--us-west-2"
verbose: Loaded manifest from assembly----/-.assets.json: 6 assets found
verbose: Applied selection: 1 assets selected.
info   : [0%] start: Publishing -:--us-west-2
verbose: [0%] check: Check -.dkr.ecr.us-west-2.amazonaws.com/cdk-hnb659fds-container-assets---us-west-2:-
error  : [100%] fail: Cannot convert undefined or null to object
Failure: TypeError: Cannot convert undefined or null to object
    at Function.keys (<anonymous>)
    at Docker.configureCdkCredentials (/usr/local/lib/node_modules/cdk-assets/lib/private/docker.js:114:32)
    at DockerFactory.forBuild (/usr/local/lib/node_modules/cdk-assets/lib/private/docker.js:180:53)
    at ContainerImageAssetHandler.build (/usr/local/lib/node_modules/cdk-assets/lib/private/handlers/container-images.js:23:65)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async AssetPublishing.publishAsset (/usr/local/lib/node_modules/cdk-assets/lib/publishing.js:123:17)
    at async AssetPublishing.publish (/usr/local/lib/node_modules/cdk-assets/lib/publishing.js:41:22)
    at async publish (/usr/local/lib/node_modules/cdk-assets/bin/publish.js:19:5)
    at async /usr/local/lib/node_modules/cdk-assets/bin/cdk-assets.js:32:9
    at async Object.handler (/usr/local/lib/node_modules/cdk-assets/bin/cdk-assets.js:56:9)
[Container] 2024/05/17 18:34:58.728495 Command did not exit successfully cdk-assets --path "assembly---Dev/-.assets.json" --verbose publish "-:--us-west-2" exit status 1
[Container] 2024/05/17 18:34:58.732469 Phase complete: BUILD State: FAILED
[Container] 2024/05/17 18:34:58.732488 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: cdk-assets --path "assembly---Dev/-.assets.json" --verbose publish "-:--us-west-2". Reason: exit status 1
[Container] 2024/05/17 18:34:58.760429 Entering phase POST_BUILD
[Container] 2024/05/17 18:34:58.763247 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2024/05/17 18:34:58.763259 Phase context status code:  Message:

sthuber90 commented 3 weeks ago

Same for me. I cannot get it to work with CDK_DOCKER_CREDS_FILE. A working example in the docs would be really helpful @peterwoodworth

I've tried the DockerCredentials helper and created the file manually, following the structure described here, all without success. The only thing that works is pulling the image before running cdk deploy.
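
One possible reading of the `TypeError` in the CodeBuild log earlier in this thread: the stack trace shows `Object.keys` failing inside `Docker.configureCdkCredentials`, and that run pointed `CDK_DOCKER_CREDS_FILE` at `~/.docker/config.json`. Docker's own config file stores logins under a top-level `auths` key, so if cdk-assets reads a different top-level key from the credentials file, it would get `undefined` there. The sketch below assumes the expected keys are `version` and `domainCredentials`, and that an ECR entry is marked with `ecrRepository`; those names are my recollection of the linked `docker-credentials.ts`, not something confirmed in this thread, so verify them against the source. The interface and function names are illustrative and are not part of cdk-assets.

```typescript
// Assumed shape of the CDK Docker credentials file. The field names are an
// assumption based on the linked docker-credentials.ts; check the source.
interface AssumedCdkDockerCreds {
  version: string;
  domainCredentials: Record<string, unknown>;
}

// Returns true only when the parsed JSON has the assumed top-level keys.
function looksLikeCdkDockerCreds(parsed: unknown): parsed is AssumedCdkDockerCreds {
  if (typeof parsed !== 'object' || parsed === null) {
    return false;
  }
  const candidate = parsed as Record<string, unknown>;
  const creds = candidate.domainCredentials;
  return typeof candidate.version === 'string'
    && typeof creds === 'object' && creds !== null && !Array.isArray(creds);
}

// Docker's own config.json keeps logins under "auths", so it fails the check.
const dockerConfig: unknown = JSON.parse(
  '{"auths": {"123456789012.dkr.ecr.us-west-2.amazonaws.com": {"auth": "..."}}}',
);

// A file in the assumed CDK format passes it ("ecrRepository" is also an assumption).
const cdkCreds: unknown = JSON.parse(
  '{"version": "1.0", "domainCredentials": {"123456789012.dkr.ecr.us-west-2.amazonaws.com": {"ecrRepository": true}}}',
);

console.log(looksLikeCdkDockerCreds(dockerConfig)); // false
console.log(looksLikeCdkDockerCreds(cdkCreds));     // true
```

This only accounts for the crash when the variable points at Docker's config file. It does not show that a correctly shaped credentials file resolves the 403, which the reports above suggest it may not.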