aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(@aws-cdk/aws-ecr-assets): DockerImageAsset - can't tell which images are outdated, and where they came from #18629

Open · a-h opened this issue 2 years ago

a-h commented 2 years ago

Description

I got the results back from a 3rd-party security test of our AWS account. The results included this finding:

Amazon Elastic Container Registry (ECR) repositories had vulnerabilities identified by the ECR scanning service.

And it listed the affected assets. They were all CDK assets.

aws-cdk/assets:157c44972d2cfea90aa4428e8b06b6527062d992e06eef1a9f12e2ec1c6d4821
aws-cdk/assets:e887bc7ca1693059443036ccb17ccc10f1a203a1ecf4004dc137c111d3c8e919
aws-cdk/assets:c91fc18db0adf214fc5ca60d9edd90d546b0328f7ebb3fc47251b6aed9eb6ab6
aws-cdk/assets:a5d71a67cf09b808b3534a73780548dc0613a8a45925098fbc3ebdbb7f46cfab
...

So I went to ECR to take a look.

[Screenshot, 2022-01-24: the ECR console showing the aws-cdk/assets repository]

But I can't see a way to differentiate between the 8 DockerImageAssets for different projects that all deploy to the same AWS account using CDK, and I assume old images will just keep accumulating indefinitely.

So clearly this isn't the way to do things:

        // Assumed imports for this snippet (CDK v1 module names, matching the issue title):
        // import * as path from 'path'
        // import * as ecrAssets from '@aws-cdk/aws-ecr-assets'
        // import * as ecs from '@aws-cdk/aws-ecs'
        const contextDir = path.join(__dirname, '../../') // Docker build context directory
        const dockerImage = new ecrAssets.DockerImageAsset(this, 'frontend', {
            directory: contextDir,
            exclude: ['.git', 'cdk.out', 'node_modules'],
            buildArgs: {
                NEXT_PUBLIC_TAG_MANAGER_URL: props.tagManagerUrl,
            },
        })
        const image = ecs.ContainerImage.fromDockerImageAsset(dockerImage)

Perhaps DockerImageAsset needs to be more aggressive about cleaning up after itself, and also to provide information in ECR so that it's possible to find out which CDK project it's related to?

Use Case

I need to find the source of the security issues in the Docker containers without looking through each CDK project's build history to find the hash in the CI logs.

Proposed Solution

Also tag the ECR images with the CloudFormation stack ID.
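To make the proposal concrete, here is a minimal sketch of the kind of provenance tags CDK could attach to each pushed image. None of these tag keys exist in CDK today; the names and the function are illustrative only:

```typescript
// Hypothetical sketch: a tag set DockerImageAsset could attach to the image
// it pushes, so an ECR finding can be traced back to its source project.
// These tag keys are made up for illustration; CDK does not emit them.
type AssetProvenanceTags = Record<string, string>;

function provenanceTags(
  stackName: string,
  constructPath: string,
  assetHash: string,
): AssetProvenanceTags {
  return {
    'cdk:stack-name': stackName,         // which CloudFormation stack pushed it
    'cdk:construct-path': constructPath, // where in the app the asset is defined
    'cdk:asset-hash': assetHash,         // the hash ECR shows as the image tag
  };
}

// Example: tags for a hypothetical 'frontend' asset.
const tags = provenanceTags(
  'FrontendStack',
  'FrontendStack/frontend',
  '157c44972d2cfea90aa4428e8b06b6527062d992e06eef1a9f12e2ec1c6d4821',
);
console.log(tags['cdk:stack-name']); // FrontendStack
```

With tags like these on the image, a security finding against a digest could be attributed to a stack and construct in one console click, rather than by trawling CI logs.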

Other information

No response


eladb commented 2 years ago

Related to https://github.com/aws/aws-cdk-rfcs/issues/64

I am also wondering if there is a way to add some kind of description to the ECR repositories.

eladb commented 2 years ago

@madeline-k assigning to you as part of your ECR ownership.

a-h commented 2 years ago

Reading about this, there doesn't seem to be a good solution.

I thought I might be able to use Lifecycle Policies to clear down unused images, but it seems that they just delete images, even if they're in use within ECS, which I definitely don't want to happen.

In that scenario, a scaling event (e.g. adding another container) would result in loss of service.
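For reference, an ECR lifecycle policy rule looks like the sketch below. Note that the selection criteria can only match image metadata (tag status, push age, image count); nothing in the rule can express "unless ECS is still running this image", which is exactly the gap described above:

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images older than 30 days (regardless of ECS usage)",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
```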

Is it really the case that ECS usage isn't checked? Six years ago, https://github.com/trek10inc/ecr-cleaner had this in place, but it was discontinued because of the ECR lifecycle feature - I suppose they expected that feature to be developed further.

Seems like I'm going to have to write a Lambda function to periodically check ECS and tag the images that are in use (and de-tag ones that are no longer in use), then use a lifecycle policy to delete images that are no longer in use. What a waste of time.
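The decision step of such a Lambda is essentially a set difference. A sketch, with invented names (the AWS API calls that would supply the inputs - listing ECR images and describing active ECS task definitions - are omitted):

```typescript
// Sketch of the cleanup decision described above: collect the image digests
// that active ECS task definitions still reference, then treat every other
// digest in the CDK assets repository as a deletion candidate.
// Note: task definitions that reference images by tag rather than by digest
// are not handled here; a real tool would have to resolve those too.
function unusedDigests(
  allEcrDigests: string[],  // every image digest in the aws-cdk/assets repo
  inUseImageUris: string[], // container image URIs from active task definitions
): string[] {
  // Digest-pinned URIs look like ...amazonaws.com/aws-cdk/assets@sha256:abc...
  const inUse = new Set(
    inUseImageUris
      .map((uri) => uri.split('@')[1]) // take the sha256:... suffix, if any
      .filter((digest): digest is string => digest !== undefined),
  );
  return allEcrDigests.filter((digest) => !inUse.has(digest));
}
```

Everything else in the Lambda is plumbing: paginating the ECS and ECR list APIs and issuing the deletes (or re-tags) for the candidates.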

a-h commented 2 years ago

Since I'm only using ECS, I wrote a Go program to work out which images ECS is currently using and delete the unused CDK asset images from ECR.

I was able to clean up 1200+ unused images from my testing environment, and then focus on solving the security vulnerabilities in images that are actually in use.

https://github.com/a-h/cdk-ecr-asset-cleaner/

madeline-k commented 2 years ago

The RFC @eladb linked (https://github.com/aws/aws-cdk-rfcs/issues/64) would address this issue. Please comment and add +1s there to help us prioritize it!

In the meantime, we could definitely add some descriptions to the ECR repos to make manual deletion easier.

@a-h , thanks for sharing your tool for cleanup!

There is also a community-created construct that can do asset clean up as well: https://github.com/jogold/cloudstructs/blob/master/src/toolkit-cleaner/README.md. Please check that out! (Shout out to @jogold 🎉 🚀)

madeline-k commented 2 years ago

I'm estimating this as a large effort, considering the overall asset cleanup problem. But there may be smaller-effort things we can do just for ECR.

a-h commented 2 years ago

Thanks @madeline-k - I've put some detailed notes on how I think the overall problem could be resolved into https://github.com/aws/aws-cdk-rfcs/issues/64#issuecomment-1027740674

github-actions[bot] commented 1 year ago

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

a-h commented 1 year ago

This is still a problem. CDK's DockerImageAsset construct still creates untraceable assets with no mechanism for cleaning them up, or identifying the source of the asset.

rawpixel-vincent commented 1 year ago

Maybe adding an option to the policy settings to prevent purging images that have been pulled in the last n days would be workable? It's probably impossible for AWS ECR to know whether an image is in use, but there are probably pull-access metrics that could be used?

a-h commented 1 year ago

@rawpixel-vincent - pull metrics don't really work. You might have a monthly scheduled task; if you delete images that haven't been pulled in the last 14 days, that task will fail when it next tries to pull the image.
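The failure mode is simple arithmetic. A sketch of the hypothetical "delete if not pulled in N days" rule (this is not an ECR feature) applied to an image that only a monthly job pulls:

```typescript
// Hypothetical "delete if not pulled in N days" rule, applied to an image
// that a monthly scheduled task pulls. Shows why last-pull time is a poor
// liveness signal: the image becomes deletable long before the next run.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function wouldDelete(lastPulledMs: number, nowMs: number, maxIdleDays: number): boolean {
  return nowMs - lastPulledMs > maxIdleDays * MS_PER_DAY;
}

// The monthly task last pulled the image 20 days ago; the policy allows 14 idle days.
const lastPull = Date.UTC(2024, 0, 1);       // Jan 1
const now = lastPull + 20 * MS_PER_DAY;      // Jan 21, before the next run on Jan 31
console.log(wouldDelete(lastPull, now, 14)); // true: the image is gone when the task next runs
```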

Ultimately, I don't think DockerImageAsset is a good design, and its use should be discouraged in favour of a new design which uses explicit naming and tagging, immutable tags, and manages the deletion of unused images at the CDK layer.

rawpixel-vincent commented 1 year ago

Yes, I definitely agree with that. And what I suggested wouldn't fix this issue - I should move it to another feature request 👍🏻

On the issue subject: I once had a Lambda that stopped working because its image had been deleted by an ECR lifecycle policy. I got no notification from AWS Lambda, and it was not possible to update that Lambda's CDK stack; ultimately I had to delete the stack (including deleting some resources manually, or the stack deletion would never finish...) and recreate it.

> Ultimately, I don't think DockerImageAsset is a good design, and its use should be discouraged in favour of a new design which uses explicit naming and tagging, immutable tags, and manages the deletion of unused images at the CDK layer.

I think the new construct should also enforce the use of an explicit ECR repository, instead of publishing everything to the same CDK assets repo.

jgodwin commented 1 year ago

I'll +1 the suggestion that DockerImageAsset be retired. It makes absolutely zero sense that multiple stacks can deploy to the same ECR repo with no indication of where the images came from or what they contain. It's not even clear to me how you ensure you get the right image for each task/service right now, aside from hoping they have independent tags.

I'd much rather have DockerImageAsset create a new (named) ECR repo, and then upload images to that specific repo. Then we could easily manage images through lifecycle policies on the named ECR repos.

rawpixel-vincent commented 1 year ago

some links related to this issue for search bots

https://github.com/aws/aws-cdk/issues/6692
https://github.com/aws/aws-cdk-rfcs/issues/64
https://stackoverflow.com/questions/72647414/aws-cdk-fargate-service-and-ecr-lifecycle-policy-of-created-image
https://www.reddit.com/r/aws_cdk/comments/yjpjwj/various_cdk_assets_and_implications_of_deleting/
https://twitter.com/pahudnet/status/1375787983379423236

SamStephens commented 1 year ago

This is a massive problem for those using AWS Inspector.

Firstly, when vulnerabilities are reported, I have no simple way to attribute those vulnerabilities to the component that generated the image and needs updating.

Secondly, because there's no cleanup, I'm going to be getting security reports for outdated images that haven't actually been deployed for potentially years.