aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

[pipelines] Add stage to strip assets out of cloud assembly before deploying to CloudFormation #9917

Open · MamishIo opened this issue 4 years ago

MamishIo commented 4 years ago

To avoid the CodePipeline artifact size limit in CloudFormation deploy actions, the pipeline should generate an intermediate artifact which is the cloud assembly but with asset files removed, and use this as the input for the deploy actions.

Use Case

Regardless of the source provider used, CFN deploy actions have an input artifact size limit of 256MB. The CDK pipeline uses the initial cloud assembly, containing all asset files, all the way through to the CFN action inputs, even though the stacks don't require them (as far as I understand the asset system, all assets are published and linked to CFN parameters by this point).

For builds that produce large or numerous assets totalling over 256MB, this causes CodePipeline limit errors in the deployment stages. With this change, assemblies up to 1GB or 5GB (depending on the source provider's artifact limit) could be supported.

Specific example: monorepos used to build many related services that are all deployed as separate containers/functions/etc.

Proposed Solution

Add an extra pipeline stage after asset publishing and before application stage deployment, which runs a CodeBuild action to load the cloud assembly, strip out asset files, and generate a new artifact containing only the CFN templates and any data necessary for CFN. The CFN actions should use this new artifact as their input.
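
A hypothetical sketch of the proposed stage, for illustration (the step name, the rsync-based copy, and the modern pipelines API usage are mine, not an existing feature; this only shows producing the slimmed artifact, while wiring it into the CFN deploy actions is what the pipeline would need to do internally):

from aws_cdk.pipelines import CodeBuildStep

# Hypothetical sketch: consume the full cloud assembly and emit a
# slimmed copy (templates and manifests only) as a new artifact.
# Assets are already published by this point in the pipeline.
slim_assembly = CodeBuildStep(
    'StripAssetsFromAssembly',
    input=pipeline.cloud_assembly_file_set,
    commands=[
        'mkdir -p slim',
        'rsync -a --exclude "asset.*" --exclude slim ./ slim/',
    ],
    primary_output_directory='slim',
)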

Other


This is a :rocket: Feature Request

seawatts commented 3 years ago

I'm also seeing this error; I'm at 253MB, very close to the limit, with 36 Lambdas, 1 Docker container, and two application stages (Staging and Prod).

seawatts commented 3 years ago

@MamishIo any progress here?

jonathan-kosgei commented 3 years ago

Ran into this too while deploying to multiple regions, worked for 2 regions, got the limit on 3.

jonathan-kosgei commented 3 years ago

@MamishIo is there any workaround for this?

rix0rrr commented 3 years ago

There is no easy workaround as of yet.

rix0rrr commented 3 years ago

What could work is producing 2 cloud artifacts from the synth step (one with the assets, one without) and then using property overrides to switch between them for the different actions.
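
One way to read that idea as a sketch (assuming the modern pipelines API; the stage/action indices and the "SlimAssembly" artifact name are hypothetical and depend on your pipeline layout):

from aws_cdk import aws_codepipeline as codepipeline

# After the pipeline is fully built, use the CfnPipeline escape hatch
# to point one CFN deploy action at a different input artifact.
pipeline.build_pipeline()
cfn_pipeline = pipeline.pipeline.node.default_child
assert isinstance(cfn_pipeline, codepipeline.CfnPipeline)
# The indices below are placeholders for your actual stage/action.
cfn_pipeline.add_property_override(
    'Stages.3.Actions.0.InputArtifacts.0.Name',
    'SlimAssembly',
)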

jonathan-kosgei commented 3 years ago

@rix0rrr Is there any timeline when this might be fixed? We're not able to use pipelines for a multi-region setup because of this.

rix0rrr commented 3 years ago

There is no timeline as of yet.

rix0rrr commented 3 years ago

Another workaround you could try is postprocessing the .json files in a post-build script in your Synth step and deduping the assets yourself.
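
A minimal sketch of that suggestion (assuming single-file assets under cdk.out; directory assets would need more care):

import hashlib
from pathlib import Path

# Post-build sketch: keep one copy of each distinct asset file and
# rewrite the .json manifests/templates to reference the kept copy.
out = Path('cdk.out')
kept = {}     # content digest -> surviving asset file
renames = {}  # removed file name -> surviving file name

for asset in sorted(out.glob('asset.*')):
    if not asset.is_file():
        continue  # skip directory assets in this sketch
    digest = hashlib.sha256(asset.read_bytes()).hexdigest()
    if digest in kept:
        renames[asset.name] = kept[digest].name
        asset.unlink()
    else:
        kept[digest] = asset

for manifest in out.glob('*.json'):
    text = manifest.read_text()
    for old, new in renames.items():
        text = text.replace(old, new)
    manifest.write_text(text)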

github-actions[bot] commented 3 years ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

maxdumas commented 3 years ago

I'm not sure if this issue is completely fixed by #11008, as it is reported. #11008 only fixes issues of assets being needlessly duplicated—it doesn't do anything to solve the issues with assets needlessly moving forward in the pipeline and potentially hitting size limits. I'm currently encountering this issue, as my assets include several large Docker images with large build context dependencies. As a result, the CloudAssembly artifact hits 4.6GB in size by the time it goes forward into the CFN deployment stage.

@rix0rrr

vibe commented 3 years ago

> I'm not sure if this issue is completely fixed by #11008, as it is reported. #11008 only fixes issues of assets being needlessly duplicated—it doesn't do anything to solve the issues with assets needlessly moving forward in the pipeline and potentially hitting size limits. I'm currently encountering this issue, as my assets include several large Docker images with large build context dependencies. As a result, the CloudAssembly artifact hits 4.6GB in size by the time it goes forward into the CFN deployment stage.

Also faced this issue recently. Even after some optimization, I'm uncomfortably close to the 256MB limit.

maxdumas commented 3 years ago

@rix0rrr Any chance we could get this re-opened? See my comment above.

shortjared commented 3 years ago

This is continuously causing us pain when deploying a bunch of static assets via https://docs.aws.amazon.com/cdk/api/latest/docs/aws-s3-deployment-readme.html.

drdivine commented 3 years ago

This is currently causing us pain. The Master build won't go through on our pipeline. We have the following error staring at us:

Action execution failed Artifact [Artifact_Build_Synth] exceeds max artifact size

Please help.

mpuhacz commented 3 years ago

Artifact [CloudAssemblyArtifact] exceeds max artifact size

This is a real pain and it breaks our pipelines. Any chance the 256 MB limit can be increased?

SimonJang commented 2 years ago

Any update on this?

ChrisSargent commented 2 years ago

Does anyone have any working workarounds for this? And @AWS team, is there anything specific that could be worked on to help here?

hoos commented 2 years ago

It would be great to at least have a viable workaround for this until a fix is put into place. It's causing a lot of pain.

rmanig commented 2 years ago

> Does anyone have any working workarounds for this? And @AWS team, is there anything specific that could be worked on to help here?

You can try something like this; it's quite hacky, but it works for me. Add a new pre-deploy Shell/CodeBuild step: fetch the current (latest) cloud-assembly artifact from S3, remove the assets (in my case, all JARs), and copy it back to S3. That's it. The buildspec should look something like this:

  "version": "0.2",
  "phases": {
    "build": {
      "commands": [
        "LATEST=$(aws s3 ls s3://<path-to-cloudassembly>/ | sort | tail -n 1 | awk '{print $4}')",
        "aws s3 cp s3://<path-to-cloudassembly>/$LATEST .",
        "unzip $LATEST -d tmp",
        "cd tmp",
        "rm -rf *.jar",
        "zip -r -A $LATEST *",
        "aws s3 cp $LATEST s3://<path-to-cloudassembly>/"
      ]
    }
  }
}

Don't forget to add an s3:PutObject permission to the CodeBuild service role.
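
In CDK terms the grant could look like this (a sketch; strip_step is a hypothetical handle to the step added above):

# Sketch: give the stripping step's CodeBuild role put access to the
# pipeline artifact bucket. strip_step is hypothetical shorthand for
# the Shell/CodeBuild step described above.
pipeline.build_pipeline()
pipeline.pipeline.artifact_bucket.grant_put(strip_step.project)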

ewahl-al commented 2 years ago

We hit this issue today too.

@aws team, it would sure be nice if someone took the time to add a clearer guide on how to work around this, given that a fix doesn't sound like it's on the radar soon.

thank you

MamishIo commented 2 years ago

This might not be applicable for most people since my project is a bit weird (e.g. Java instead of TS, legacy CDK pipeline lib, CodeBuild synth via buildspec.yml file...) but I finally put together a workaround for this by generating a second no-assets artifact and post-processing the pipeline template to use the new artifact for CFN actions (side note: I'd have preferred doing this purely in CDK but it seemed impractical in this case).

https://github.com/HtyCorp/serverbot2-core/commit/d4397291b98098ae2d337ef86dd4ba8f580ff09a

The pipeline is spitting out 260MB assemblies now but deploying without any problems! Hope that helps someone even if it's not a great general solution.

tobni commented 2 years ago

Unless I'm mistaken, all assets are already published once the Assets step has run, meaning it is safe to strip all assets from the synth output in an initial Wave. I believe this is a generic solution that is basically plug-and-play for aws-cdk 2.12. The rm -rfv <files> command may need customizing for your needs.

from aws_cdk.pipelines import CodeBuildStep

strip_assets_step = CodeBuildStep(
    'StripAssetsFromAssembly',
    input=pipeline.cloud_assembly_file_set,
    commands=[
        # CODEBUILD_SOURCE_VERSION holds the S3 ARN of the input artifact.
        'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
        'ZIP_ARCHIVE=$(basename $S3_PATH)',
        # Assets are already published; drop them from the assembly.
        'rm -rfv asset.*',
        # Re-zip what remains and overwrite the artifact in place.
        'zip -r -q -A $ZIP_ARCHIVE *',
        'aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH',
    ],
)
pipeline.add_wave('BeforeDeploy', pre=[strip_assets_step])
# Add your stages...

pipeline.build_pipeline()
pipeline.pipeline.artifact_bucket.grant_write(strip_assets_step.project)

jonathan-kosgei commented 2 years ago

@tobni the strip_assets_step is working correctly for me and shows the artifact SynthOutput is 1.1 MB; however, the subsequent stages in the wave still get an input artifact SynthOutput that's 200MB+. Is there a missing step to get them to use the output from strip_assets_step?

Edit: It seems to work, but only in the region the pipeline is in. The other regions get the assembly from a different bucket with a name of the form <stack-name>-seplication<some region specific id>. I don't see a way to get the names of these region-specific S3 artifact buckets to copy the new zip to.

jonathan-kosgei commented 2 years ago

I finally got @tobni's code to work with cross-region replication, which uses a different randomly named bucket for every region!

strip_assets_step = CodeBuildStep(
    'StripAssetsFromAssembly',
    input=pipeline.cloud_assembly_file_set,
    commands=[
        # Collect the replication bucket names from the synthesized
        # cross-region support stack templates.
        "cross_region_replication_buckets=$(grep BucketName cross-region-stack-* | awk -F ':' '{print $4}' | tr '\n' ' ' | tr -d '\"')",
        'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
        'ZIP_ARCHIVE=$(basename $S3_PATH)',
        'rm -rf asset.*',
        'zip -r -q -A $ZIP_ARCHIVE *',
        'aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH',
        'object_location=${S3_PATH#*/}',
        # Push the slimmed archive to every region's replication bucket.
        'for bucket in $cross_region_replication_buckets; do aws s3 cp $ZIP_ARCHIVE s3://$bucket/$object_location; done'
    ],
)

And you need the following permissions:

from aws_cdk import aws_iam as iam

pipeline.build_pipeline()
pipeline.pipeline.artifact_bucket.grant_write(strip_assets_step.project)
# Allow writing to the per-region replication buckets (substitute your
# pipeline stack name in the ARNs below).
strip_assets_step.project.add_to_role_policy(
    iam.PolicyStatement(
        effect=iam.Effect.ALLOW,
        resources=[
            "arn:aws:s3:::<pipeline stack name>-seplication/*",
            "arn:aws:s3:::<pipeline stack name>-seplication*",
        ],
        actions=["s3:*"],
    )
)
# The replication buckets are KMS-encrypted, so the role also needs to
# generate data keys.
strip_assets_step.project.add_to_role_policy(
    iam.PolicyStatement(
        effect=iam.Effect.ALLOW,
        resources=["*"],
        actions=["kms:GenerateDataKey"],
    )
)

BenassiJosef commented 2 years ago

@tobni and @jonathan-kosgei thanks a lot guys for the help. Just leaving my TS version here for folks to copy and paste.

import * as iam from "aws-cdk-lib/aws-iam";
import { CodeBuildStep } from "aws-cdk-lib/pipelines";

const strip = new CodeBuildStep("StripAssetsFromAssembly", {
  input: pipeline.cloudAssemblyFileSet,
  commands: [
    'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
    "ZIP_ARCHIVE=$(basename $S3_PATH)",
    "echo $S3_PATH",
    "echo $ZIP_ARCHIVE",
    "ls",
    // Assets are already published at this point in the pipeline.
    "rm -rfv asset.*",
    "zip -r -q -A $ZIP_ARCHIVE *",
    "ls",
    "aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH",
  ],
  rolePolicyStatements: [
    new iam.PolicyStatement({
      effect: iam.Effect.ALLOW,
      resources: ["*"],
      actions: ["s3:*"],
    }),
    new iam.PolicyStatement({
      effect: iam.Effect.ALLOW,
      resources: ["*"],
      actions: ["kms:GenerateDataKey"],
    }),
  ],
});

pipeline.addWave("BeforeStageDeploy", {
  pre: [strip],
});

rurounijones commented 2 years ago

Tagging @rix0rrr and @MamishIo following advice from the Comment Visibility Warning.

I ran into this issue today.

I believe the current situation is that people have found a bit of an icky workaround in adding extra CodeBuildSteps to clean out the assets in the SynthOutput (See above comments) but it would be great to not have to do this.

Based on what others have said, it seems the assets in SynthOutput don't need to be passed forward at all, so they could be stripped by the pipeline itself. Doing so would render this workaround unneeded.

hrvg commented 1 year ago

We hit this issue this week and had to put together a workaround from the answers here.

Adding to the comments from @jonathan-kosgei: here is a version of the awk command that works whether you have one or several cross-region stacks. @jonathan-kosgei's version works for more than one cross-region stack, but awk-ing on : will fail with $4 when only one cross-region stack is present, because grep only prefixes file names when given multiple files, so the element of interest is then at $2; splitting on BucketName solves that and works regardless of the number of cross-region stacks.

strip_assets_step = CodeBuildStep(
    'StripAssetsFromAssembly',
    input=pipeline.cloud_assembly_file_set,
    commands=[
        # Split on BucketName instead of ':' so the field positions do
        # not depend on how many cross-region stack files grep matches.
        "cross_region_replication_buckets=$(grep BucketName cross-region-stack-* | awk -F 'BucketName' '{print $2}' | tr -d ': ' | tr -d '\"' | tr -d ',')",
        'S3_PATH=${CODEBUILD_SOURCE_VERSION#"arn:aws:s3:::"}',
        'ZIP_ARCHIVE=$(basename $S3_PATH)',
        'rm -rf asset.*',
        'zip -r -q -A $ZIP_ARCHIVE *',
        'aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH',
        'object_location=${S3_PATH#*/}',
        'for bucket in $cross_region_replication_buckets; do aws s3 cp $ZIP_ARCHIVE s3://$bucket/$object_location; done'
    ],
)

You can also access the replication bucket names dynamically from pipeline.pipeline.cross_region_support:

import aws_cdk as cdk

pipeline.build_pipeline()
# Build the list of replication bucket ARNs (and their object ARNs)
# for every region the pipeline deploys to.
cross_region_support = pipeline.pipeline.cross_region_support
replication_bucket_arns = [
    cross_region_support[key].replication_bucket.bucket_arn
    for key in cross_region_support.keys()]
replication_bucket_objects = [arn + '/*' for arn in replication_bucket_arns]
replication_resources = replication_bucket_arns + replication_bucket_objects
pipeline.pipeline.artifact_bucket.grant_write(strip_assets_step.project)
strip_assets_step.project.add_to_role_policy(
    cdk.aws_iam.PolicyStatement(
        effect=cdk.aws_iam.Effect.ALLOW,
        resources=replication_resources,
        actions=["s3:*"],
    )
)
strip_assets_step.project.add_to_role_policy(
    cdk.aws_iam.PolicyStatement(
        effect=cdk.aws_iam.Effect.ALLOW,
        resources=["*"],
        actions=["kms:GenerateDataKey"],
    )
)

wr-cdargis commented 1 year ago

@rix0rrr it seems the common workaround is to wipe out the assets. Is this a suggested workaround?

moltar commented 1 year ago

Throwing in my TypeScript solution for cross-region buckets based on the above:

// stripAssetsStep is the CodeBuildStep defined in the earlier examples.
const { crossRegionSupport, artifactBucket } = pipeline.pipeline
const artifactBuckets = [
  artifactBucket,
  ...Object.values(crossRegionSupport).map((crs) => crs.replicationBucket),
]
for (const bucket of artifactBuckets) {
  bucket.grantReadWrite(stripAssetsStep.project)
}

moltar commented 1 year ago

How about this mad solution:

Create an object lambda access point on the bucket. Lambda would filter the artifacts on the fly, and remove unnecessary files.

The only thing I am unsure how to achieve is telling the steps below to use the access point instead of the bucket directly. I am guessing this would be possible to do at the CDK core level, but I'm not sure if it would be possible as a "workaround".

eciuca commented 1 year ago

I am also leaving my Java solution based on @tobni's implementation. Thanks a lot!

// See why we need the stripAssetsFromAssembly step here:
// https://github.com/aws/aws-cdk/issues/9917#issuecomment-1063857885
CodeBuildStep stripAssetsFromAssembly = CodeBuildStep.Builder.create("StripAssetsFromAssembly")
        .input(pipeline.getCloudAssemblyFileSet())
        .commands(List.of(
                "S3_PATH=${CODEBUILD_SOURCE_VERSION#\"arn:aws:s3:::\"}",
                "ZIP_ARCHIVE=$(basename $S3_PATH)",
                "rm -rfv asset.*",
                "zip -r -q -A $ZIP_ARCHIVE *",
                "aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH"
        ))
        .build();
pipeline.addWave("BeforeDeploy", WaveOptions.builder()
        .pre(List.of(stripAssetsFromAssembly))
        .build());
pipeline.addStage(deployStage);
pipeline.buildPipeline();
pipeline.getPipeline().getArtifactBucket().grantWrite(stripAssetsFromAssembly.getProject());

bgtill commented 7 months ago

This is annoying and honestly feels like something that cdk/pipelines should handle. I'll leave this here in case there are any other Golang people out there who run into this; this is what seems to be working for me:

// Assumed import aliases: pipelines = "github.com/aws/aws-cdk-go/awscdk/v2/pipelines",
// iam = "github.com/aws/aws-cdk-go/awscdk/v2/awsiam", jsii = "github.com/aws/jsii-runtime-go".
stripAssetsStep := pipelines.NewCodeBuildStep(jsii.String("StripAssetsFromAssembly"), &pipelines.CodeBuildStepProps{
    Input: pipeline.CloudAssemblyFileSet(),
    Commands: jsii.Strings(
        "S3_PATH=${CODEBUILD_SOURCE_VERSION#\"arn:aws:s3:::\"}",
        "ZIP_ARCHIVE=$(basename $S3_PATH)",
        "echo $S3_PATH",
        "echo $ZIP_ARCHIVE",
        "ls",
        "rm -rfv asset.*",
        "zip -r -q -A $ZIP_ARCHIVE *",
        "ls",
        "aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH",
    ),
    RolePolicyStatements: &[]iam.PolicyStatement{
        iam.NewPolicyStatement(&iam.PolicyStatementProps{
            Effect:    iam.Effect_ALLOW,
            Resources: jsii.Strings("*"),
            Actions: jsii.Strings(
                "s3:*",
            ),
        }),
        iam.NewPolicyStatement(&iam.PolicyStatementProps{
            Effect:    iam.Effect_ALLOW,
            Resources: jsii.Strings("*"),
            Actions:   jsii.Strings("kms:GenerateDataKey"),
        }),
    },
})

Add this to your Pre argument in either the AddStage or AddWave call.

markrekveld commented 2 weeks ago

For those using Java, here is what is working for me in my multi-region CDK pipeline:

// TODO Create your pipeline
CodePipeline pipeline = CodePipeline.Builder.create(this, "Pipeline")
        // ... synth step, etc. ...
        .build();

// See why we need the stripAssetsFromAssembly here: https://github.com/aws/aws-cdk/issues/9917#issuecomment-1063857885
CodeBuildStep stripAssetsFromAssembly = CodeBuildStep.Builder.create("StripAssetsFromAssembly")
        .input(pipeline.getCloudAssemblyFileSet())
        .commands(List.of(
                "cross_region_replication_buckets=$(grep BucketName cross-region-stack-* | awk -F 'BucketName' '{print $2}' | tr -d ': ' | tr -d '\"' | tr -d ',')",
                "S3_PATH=${CODEBUILD_SOURCE_VERSION#\"arn:aws:s3:::\"}",
                "ZIP_ARCHIVE=$(basename $S3_PATH)",
                "rm -rf asset.*",
                "zip -r -q -A $ZIP_ARCHIVE *",
                "aws s3 cp $ZIP_ARCHIVE s3://$S3_PATH",
                "object_location=${S3_PATH#*/}",
                "for bucket in $cross_region_replication_buckets; do aws s3 cp $ZIP_ARCHIVE s3://$bucket/$object_location; done"))
        .build();
pipeline.addWave("BeforeDeploy",
        WaveOptions.builder()
                .pre(List.of(stripAssetsFromAssembly))
                .build());

// TODO Add your waves/stages here

pipeline.buildPipeline();
pipeline.getPipeline()
        .getArtifactBucket()
        .grantWrite(stripAssetsFromAssembly.getProject());
for (CrossRegionSupport crossRegionSupport : pipeline.getPipeline()
        .getCrossRegionSupport()
        .values())
{
    crossRegionSupport.getReplicationBucket()
            .grantWrite(stripAssetsFromAssembly.getProject());
}

Thanks to all who have contributed to this issue with examples in different languages!