icj217 opened this issue 1 year ago
Thank you for the feedback and the upcoming PR. Can you share a little bit about the proposed solution?
Would this help alleviate the issue of base images being built multiple times?
If I have a Dockerfile with this kind of structure, `base` is currently built independently in each asset CodeBuild project:

```dockerfile
FROM alpine:latest AS base
# Baseline configuration

FROM base AS container1
# …

FROM base AS container2
# …
```
@pahud I've taken a look at the source code to understand how the CDK handles building docker images. Here's my understanding:

- During synthesis, assets are "staged" into the cloud assembly (`cdk.out/asset-$hash/...`) and registered with the stack synthesizer (via the `stack.synthesizer.addDockerImageAsset()` method).
- For a `DockerImageAsset`, that means the asset's source directory is copied into the directory above.
- For a `TarballImageAsset`, that means the source tarball is copied into the directory above.
- During deployment, the image is built (or, in the case of a `TarballImageAsset`, `docker load` is run) using the `cdk-assets` package. Interactions with the docker CLI are all handled through asset publishing (`AssetPublishing.publishAsset()`) using the private `Docker` class.

It seems like the most logical solution is to create some kind of "bridge" construct that looks like `DockerImageAsset` on the outside but internally ends up behaving like `TarballImageAsset` (e.g. `DockerImageTarballAsset`). This construct would build the image during synthesis and write the image tarball (using `docker save`) out to the "staged" asset directory.
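A minimal sketch of the synth-time half of that idea (the helper name, staging layout, and tarball naming here are hypothetical, not existing CDK APIs): build the image during synthesis, then `docker save` it into the staged asset directory. The docker invocations are composed as argv arrays with an injectable runner, so the flow can be followed and tested without a Docker daemon:

```typescript
import { execFileSync } from "child_process";
import * as path from "path";

// Hypothetical synth-time staging logic for a "DockerImageTarballAsset"-style
// construct. Commands are composed as argv arrays; the runner is injectable,
// so nothing here is tied to a real Docker daemon.
type Exec = (cmd: string, args: string[]) => void;

function stageImageTarball(
  imageTag: string,
  sourceDir: string,
  assetOutDir: string,
  exec: Exec = (cmd, args) => { execFileSync(cmd, args, { stdio: "inherit" }); },
): string {
  // Illustrative tarball name: image tag with ':' and '/' made filesystem-safe.
  const tarballPath = path.join(assetOutDir, `${imageTag.replace(/[:/]/g, "-")}.tar`);
  // 1. Build the image at synth time (instead of at deploy time).
  exec("docker", ["build", "-t", imageTag, sourceDir]);
  // 2. Write the built image into the staged asset directory as a tarball,
  //    so deployment only needs to load and push it.
  exec("docker", ["save", "-o", tarballPath, imageTag]);
  return tarballPath;
}
```

With the default runner this would shell out to the real docker CLI during `cdk synth`; passing a recording runner shows exactly which commands would be run.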
The only issue I see with this solution is that direct interactions with the docker CLI are currently not possible. Is there any reason we couldn't make the `cdk-assets` package's `Docker` class public, or create a more generalized docker CLI interface that is agnostic to the context in which it is invoked (i.e. as part of asset manifest publishing vs. being called directly from a construct)?
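As a sketch of what such a public, context-agnostic wrapper might look like (this is not the actual `cdk-assets` `Docker` class; the injectable runner and method shapes are assumptions):

```typescript
import { execFileSync } from "child_process";

// Sketch of a generalized docker CLI wrapper that doesn't care where it is
// invoked from (asset publishing vs. a construct at synth time). The runner
// is injectable, which also makes the wrapper testable without docker.
type Runner = (args: string[]) => void;

class DockerCli {
  constructor(
    private readonly run: Runner = (args) => { execFileSync("docker", args, { stdio: "inherit" }); },
  ) {}

  build(tag: string, contextDir: string, buildArgs: Record<string, string> = {}): void {
    const argFlags = Object.entries(buildArgs).flatMap(([k, v]) => ["--build-arg", `${k}=${v}`]);
    this.run(["build", "-t", tag, ...argFlags, contextDir]);
  }

  save(tag: string, outFile: string): void {
    this.run(["save", "-o", outFile, tag]);
  }

  load(tarball: string): void {
    this.run(["load", "-i", tarball]);
  }
}
```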
Please let me know if I'm missing anything or if you have any suggestions on possible solutions here!
Just now building a deployment using `DockerImageAsset` for the first time, and I too find the behavior unexpected for the reasons outlined by the OP. For us, we run `cdk deploy 'prod/*' --app cdk.out/` when a PR has been approved and merged. This means there should be no other building happening, nor any other dependencies required for this step, other than the `cdk` CLI.

If something is being built after a PR has been approved (and expected to be ready for immediate deployment), that's an anti-pattern, especially for those using a CD tool like CodePipeline. I made a similar comment here.
Another use case is being able to avoid rewriting the `docker build` logic for local dev that's already implemented in the CDK. I'd want to only have to define it once (e.g. build args, dir, etc.).
We run these during our publish steps in GitHub Actions. `cdk-assets` builds and publishes the docker image:

```yaml
- name: Synth
  shell: bash
  working-directory: ${{ inputs.directory }}
  run: |-
    npx cdk synth ...
- name: Upload cdk assets to AWS
  shell: bash
  run: |-
    npx cdk-assets publish --path ./<StackName>.assets.json
```
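If an app synthesizes several stacks, the same publish step can be generalized by looping over every manifest in `cdk.out/` (a sketch using only the `--path` flag shown above; the glob loop is ordinary bash):

```yaml
- name: Upload cdk assets to AWS
  shell: bash
  working-directory: ${{ inputs.directory }}
  run: |-
    # Publish each stack's asset manifest produced by `cdk synth`.
    for manifest in cdk.out/*.assets.json; do
      npx cdk-assets publish --path "$manifest"
    done
```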
This would be a great benefit for automated/approval workflows. Our automated processes synthesize and save the `cdk.out` directory as an artifact and await manual approval. However, because the docker builds only occur at deployment time, this can cause two key problems: (1) unexpected failures after the approval process (meaning it has to be fixed and approved again), and (2) even if there are no changes in the source, the same docker build can produce different results depending on when it is built (think unpinned dependencies, upstream image tag changes, etc.).
Ideally, we'd like to configure image builds to occur beforehand so that the deployed artifacts never change after they're generated, meaning what is approved is definitely what gets deployed and we don't have unexpected failures at deploy time.
We could just avoid using the CDK for managing docker image assets altogether and require that build pipelines build and push docker images separately, but it's a very useful feature we'd like to continue leveraging.
It would already help a lot if the `cdk-assets` tool mentioned by @uncledru offered a way to only build and not actually publish. The fact that it doesn't makes it equally unsuited for PR checks and the like, as all these assets would just pile up in S3/ECR without ever being part of a deployment, all the while there is no garbage collection (#64).
Even if it did, one would still have to build again for the actual deploy. So building during synth and saving the (compressed) tarballs into `cdk.out` really sounds like the most elegant solution, as pointed out by others here.
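For completeness, under that scheme the deploy-time side would shrink to loading the staged tarball, re-tagging, and pushing, with no rebuild at all. A sketch (the helper name and injectable runner are hypothetical, and the ECR URI below is purely illustrative):

```typescript
import { execFileSync } from "child_process";

// Deploy-time sketch: with the image already saved into cdk.out at synth
// time, publishing only needs `docker load`, a re-tag, and a push.
type Run = (args: string[]) => void;

function publishStagedImage(
  tarballPath: string,
  localTag: string,
  remoteUri: string,
  run: Run = (args) => { execFileSync("docker", args, { stdio: "inherit" }); },
): void {
  run(["load", "-i", tarballPath]);   // restore the synth-time build
  run(["tag", localTag, remoteUri]);  // point it at the destination repo
  run(["push", remoteUri]);           // no rebuild happens here
}
```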
Describe the feature

Currently, docker images defined in CDK apps are not built at synthesis time, but rather at deployment time. The CDK should offer a way to build docker images during synthesis and save them as assets using `docker save`, so that asset generation happens entirely at synthesis time.

Use Case
The CDK's build behavior for docker images diverges from the observed behavior of other types of assets (e.g. `aws_lambda.AssetCode`), where the asset's output directory (e.g. `cdk.out/asset.${hash}/`) contains the "final" contents of the asset (which are simply compressed during deployment). This behavior seems to lead to a couple of undesirable realities/limitations:
- Assets are listed in the `cdk.out/<stack>.assets.json` file and are subject to expiration before the image is ever built

Proposed Solution
No response
Other Information
No response
Acknowledgements
CDK version used
latest
Environment details (OS name and version, etc.)
macOS 12.6