cdk-assets: Remove asset from staging bucket on failed deployment

aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code

https://aws.amazon.com/cdk

Apache License 2.0

cdk-assets: Remove asset from staging bucket on failed deployment #14474

Open bgshacklett opened 3 years ago

bgshacklett commented 3 years ago

In #12536, it was noted that part of the problem is that a corrupted zip file may be uploaded to the staging bucket. Once that happens, CDK no longer attempts to upload the asset again, because it detects that an asset with the corresponding hash already exists in the bucket. After reaching this state, the affected asset or assets must be removed manually from the staging bucket before a deployment can succeed. When the deployment of a given asset fails, the asset should be removed from the staging bucket so that this "broken" state is never reached.
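The manual cleanup described above could be scripted roughly as follows. This is a minimal sketch and not part of the CDK: `asset_key` and `remove_staged_asset` are hypothetical helpers, the `<hash>.zip` key layout is an assumption about how the staging bucket is organised, and `s3_client` is any object that exposes the boto3 S3 client's `delete_object(Bucket=..., Key=...)` call.

```python
def asset_key(asset_hash: str, extension: str = "zip") -> str:
    """Build the object key of a staged asset (assumed layout: '<hash>.<ext>')."""
    return f"{asset_hash}.{extension}"


def remove_staged_asset(s3_client, bucket: str, asset_hash: str) -> str:
    """Delete one staged asset so the next deployment has to upload it again.

    `s3_client` is expected to behave like boto3.client("s3").
    Returns the key that was deleted.
    """
    key = asset_key(asset_hash)
    s3_client.delete_object(Bucket=bucket, Key=key)
    return key
```

With boto3 installed, this would be called as `remove_staged_asset(boto3.client("s3"), bucket_name, asset_hash)`, where both arguments come from your own environment.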

Use Case

This change would help ensure that CDK does not attempt to use a corrupt pre-existing asset from the staging bucket during deployment.

Alternatives

Provide a CLI flag to ensure that assets are overwritten in the staging bucket on every deployment.

Other

[ ] :wave: I may be able to implement this feature request
[ ] :warning: This feature might incur a breaking change

This is a :rocket: Feature Request

eladb commented 3 years ago

Reassigning to @rix0rrr

dariagrudzien commented 3 years ago

We seem to be experiencing the same issue.

github-actions[bot] commented 2 years ago

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

bgshacklett commented 2 years ago

Please do not auto-close this issue.

sannies commented 1 year ago

When you interrupt a deploy with Ctrl-C, you might also end up with corrupted asset directories (*) for 3rd-party layers. These asset directories will then be zipped, uploaded, and cached. It is very hard to recover from that state.

(*) In my case the 3rd-party layer is created by 'pip' in a Docker container.

rix0rrr commented 1 year ago

Good find! If we can, we should try and switch to multipart uploads. Those are atomic by default, and the file will only appear if the upload completes.

Depends on whether we already have the correct S3 permissions on the asset role, though...

rix0rrr commented 1 year ago

Multipart shouldn't need any additional permissions, so we should be good to deploy that.

It does need an additional lifecycle rule on the bucket to clean up incomplete multipart uploads, though.
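Such a lifecycle rule might look like the sketch below. The helper names, the rule ID, and the 7-day default are illustrative assumptions; `put_bucket_lifecycle_configuration` and the `AbortIncompleteMultipartUpload` rule element are part of the S3 API as exposed by boto3. The client is passed in, so the snippet itself does not import boto3.

```python
def abort_incomplete_uploads_rule(days: int = 7) -> dict:
    """Build an S3 lifecycle rule that aborts stale multipart uploads."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "ID": "AbortIncompleteMultipartUploads",  # illustrative rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: applies to every object
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


def apply_rule(s3_client, bucket: str, days: int = 7) -> None:
    """Install the rule on `bucket` via a boto3-style S3 client.

    Note: this call replaces the bucket's entire lifecycle configuration,
    so any existing rules would have to be merged in first.
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [abort_incomplete_uploads_rule(days)]},
    )
```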

sannies commented 1 year ago

I don't think we are talking about exactly the same issue here. In my case I hit Ctrl-C while the pip install (*) is running. The asset directory (asset.0aff....cd54) has been created, and some but not all of the 3rd-party libraries have been installed in it. At that moment Ctrl-C interrupts the installation. The directory is then there, but its content is corrupt. The next cdk synth will not rebuild this specific asset: it is already there, so there is no reason to. The directory will then be zipped and uploaded. At that point the cdk asset bucket is 'poisoned', and you can only recover by forcing the asset to change, e.g. by changing the requirements.txt. A force flag would allow recovery without actually making a dummy change.

(*)

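# Imports assume CDK v2. The attribute after PYTHON_3_9 was cut off in the
# original post; bundling_image is assumed here.
from aws_cdk import BundlingOptions
from aws_cdk.aws_lambda import AssetCode, LayerVersion, Runtime

LayerVersion(
    stack, '3rdpLayer',
    code=AssetCode(
        "lambdas",
        bundling=BundlingOptions(
            image=Runtime.PYTHON_3_9.bundling_image,
            command=[
                'bash', '-c',
                'pip install -r requirements.txt -t /asset-output/python',
            ])))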

rix0rrr commented 1 year ago

Oh I see, this isn't about the upload but about the build. I misunderstood.

We've fixed this for zipping (by building to a tempfile), but apparently not for bundling. That'll be the solution for bundling as well then.
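The build-to-a-temporary-location idea could look roughly like this for a bundling directory. This is a sketch under stated assumptions, not the CDK's actual implementation: `bundle_atomically` and the `bundle` callback are hypothetical names, and the approach assumes the scratch directory lives on the same filesystem as the final asset directory so that the rename is a single step.

```python
import os
import shutil
import tempfile
from typing import Callable


def bundle_atomically(asset_dir: str, bundle: Callable[[str], None]) -> bool:
    """Run `bundle` in a scratch directory and publish the result with a rename.

    Returns False if `asset_dir` already exists (treated as a complete, cached
    bundle) and True if the bundle was built by this call.
    """
    if os.path.isdir(asset_dir):
        return False  # only complete bundles ever get this name

    parent = os.path.dirname(os.path.abspath(asset_dir))
    os.makedirs(parent, exist_ok=True)
    # Create the scratch directory next to the target so the rename stays on
    # one filesystem.
    scratch = tempfile.mkdtemp(prefix=".bundling-", dir=parent)
    try:
        bundle(scratch)                # e.g. run pip inside Docker
        os.rename(scratch, asset_dir)  # publish only after success
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl-C): discard the
        # partial output instead of leaving it under the final name.
        shutil.rmtree(scratch, ignore_errors=True)
        raise
    return True
```

Because the final directory name only appears after the bundling step has finished, an interrupted run leaves nothing behind that a later synth could mistake for a finished asset.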