aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.57k stars 3.88k forks source link

(app-staging-synthesizer-alpha): Allow adding PolicyStatements to imageRole #29894

Open blimmer opened 5 months ago

blimmer commented 5 months ago

Describe the feature

I'd like to be able to grant the auto-generated image publishing role additional permissions.

Use Case

I want to add registry caching to my DockerImageAsset builds. I have a separate ECR repo called docker-image-cache with a lifecycle policy specifically designed for storing cache layers.

However, the Docker AppStagingSynthesizer image role doesn't, by default, allow working with other ECR repos for security. For example, the imageRole it creates (cdk-my-app-image-role-us-west-2) looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:BatchGetImage",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:DescribeRepositories",
                "ecr:GetDownloadUrlForLayer",
                "ecr:InitiateLayerUpload",
                "ecr:PutImage",
                "ecr:UploadLayerPart"
            ],
            "Resource": "arn:aws:ecr:us-west-2:12345678910:repository/my-app/my-app-service",
            "Effect": "Allow"
        },
        {
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

So the cache step fails:

User: arn:aws:sts::12345678910:assumed-role/cdk-my-app-image-role-us-west-2/aws-cdk-blimmer is not authorized to perform: ecr:BatchCheckLayerAvailability on resource: arn:aws:ecr:us-west-2:12345678910:repository/docker-image-cache because no identity-based policy allows the ecr:BatchCheckLayerAvailability action

Proposed Solution

It'd be nice to provide some API to access the imageRole that's auto-generated. Then, I'd like to call .grantPullPush(imageRole) on the other ECR repo.

Other Information

I can work around this by creating a role that allows access to the cache repo:

private createCdkAppStagingSynthesizerImagePublishingRole(dockerImageCacheRepository: Repository): Role {
  const role = new Role(this, "CdkAppStagingSynthesizerImagePublishingRole", {
    description: "Used by AppStagingSynthesizer stacks to publish images",
    assumedBy: new AccountPrincipal(this.awsAccount.id),
    roleName: "cdk-app-staging-synthesizer-image-role",
  });
  dockerImageCacheRepository.grantPullPush(role);

  return role;
}

Then referencing it in the defaultStackSynthesizer:

const app = new App({
  defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
    appId: "my-app",
    stagingBucketEncryption: BucketEncryption.S3_MANAGED,
    imageAssetVersionCount: 10, // keep the 10 most recent versions of the image assets for rollback
    imageAssetPublishingRole: BootstrapRole.fromRoleArn(
      "arn:aws:iam::13245678910:role/cdk-app-staging-synthesizer-image-role",
    ),
  }),
});

However, I will eventually run into a quota issue because each app staging synthesizer attaches a managed policy that grants access to work with its specific ECR repos.

I also tried to dig into the staging stack to find the role resource and append another PolicyStatement. However, I think the role is lazily created, so I'm not sure how to grab it through L1 constructs (e.g., this.synthesizer.stagingStack).

I also considered saving the cache in the same ECR repo, however, it pushes a lot of extra layers (with mode max for caching), so the "expire any 10 images" lifecycle rule will cause problems.

Acknowledgements

CDK version used

2.135.0 (build d46c474)

Environment details (OS name and version, etc.)

MacOS Sonoma

blimmer commented 5 months ago

I was able to find another workaround inside a custom Construct I'm exposing, CachedDockerImageAsset.

I found that after the first DockerImageAsset is initialized, the imageRole is created on the staging stack. These are all private properties, so I had to use as anys to make TS happy, but this works:

function ensureAccessToCacheRepo(scope: Construct, repo: IRepository) {
  const appStagingSynthesizerStack = (scope as any)?.synthesizer?.stagingStack;
  if (!appStagingSynthesizerStack) {
    throw new Error(
      "The CachedDockerImageAsset must be used with the @aws-cdk/app-staging-synthesizer-alpha synthesizer"
    );
  }

  const imageRole = appStagingSynthesizerStack.imageRole as Role;
  if (!imageRole) {
    throw new Error("The app synthesizer didn't create the image role like we expected...");
  }

  repo.grantPullPush(imageRole);
}

It'd be great if imageRole were not private - that'd be a great start!

pahud commented 5 months ago

I see dockerAssetPublishingRole here and according to this, I think you'll need to define a custom staging stack for the custom dockerAssetPublishingRole.

blimmer commented 5 months ago

That is a possibility that I looked into, but it's not quite what I was looking for. I like the fact that the default setup creates the role that only allows publishing to the specific ECR repos per-asset.

If I passed my own role, I'd need to recreate the same logic the default app staging synthesizer already has (granting push/pull to the ECR repos it auto-creates). Image if I set up the custom role, then another developer who's unfamiliar with the customization creates a new DockerImageAsset. They'd encounter errors that could be confusing to debug because they have the find the existing custom role I created.

Being able to simply call .grant on the role is a much better solution, at least for my workflow. What do you think?