aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

aws_s3_deployment: Files deleted when deployment bucket is used in sequence #30871

Open thvasilo opened 1 month ago

thvasilo commented 1 month ago

Describe the bug

I have a stack that should deploy two scripts to S3.

I have two files under ../assets, both of which I want deployed at the top level of the created AssetBucket. What I'm observing instead is that only one of the files ends up being deployed, and which one appears to be random: sometimes it's the file corresponding to SINGLE_ENTRYPOINT_FILENAME, sometimes the one corresponding to ENTRYPOINT_FILENAME.

I'm not sure if there's some sort of race condition going on.

Expected Behavior

Both files should be deployed

Current Behavior

Only one of the resource files actually ends up on S3

Reproduction Steps

Here's how I'm trying to accomplish that:

import os

from aws_cdk import (
    Stack,
    aws_s3 as s3,
    aws_s3_deployment as s3_deploy,
)
from constructs import Construct

# ENTRYPOINT_FILENAME and SINGLE_ENTRYPOINT_FILENAME are filename constants
# defined elsewhere in the module.

DIRNAME = os.path.dirname(__file__)

class EntrypointAssetStack(Stack):
    """Creates the needed resources for the entrypoint assets on S3."""

    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Create an asset bucket where we upload the entry point script
        # TODO: Trying to add removal policy here messes up bucket policy downstream, can we fix this?
        asset_bucket = s3.Bucket(
            self,
            "AssetBucket",
            # removal_policy=RemovalPolicy.DESTROY,
            # auto_delete_objects=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            versioned=True,
        )

        # Omitted: code that creates a policy for the asset bucket, asset_bucket_policy

        # Deploy only ENTRYPOINT_FILENAME: exclude everything, then re-include that file
        entrypoint_deployment = s3_deploy.BucketDeployment(
            self,
            "DeployMultiEntryPoint",
            destination_bucket=asset_bucket,
            sources=[
                s3_deploy.Source.asset(
                    path=os.path.join(DIRNAME, "..", "assets"),
                    exclude=["**", f"!{ENTRYPOINT_FILENAME}"],
                )
            ],
            retain_on_delete=True,
        )
        entrypoint_deployment.node.add_dependency(asset_bucket_policy)

        # A second deployment to the same bucket root, for SINGLE_ENTRYPOINT_FILENAME
        single_entrypoint_deployment = s3_deploy.BucketDeployment(
            self,
            "DeploySingleWorkerEntryPoint",
            destination_bucket=asset_bucket,
            sources=[
                s3_deploy.Source.asset(
                    path=os.path.join(DIRNAME, "..", "assets"),
                    exclude=["**", f"!{SINGLE_ENTRYPOINT_FILENAME}"],
                )
            ],
            retain_on_delete=True,
        )
        single_entrypoint_deployment.node.add_dependency(asset_bucket_policy)

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.149.0 (build c8e5924)

Framework Version

No response

Node.js Version

v18.18.2

OS

Amazon Linux 2

Language

Python

Language Version

3.9.18

Other information

No response

khushail commented 1 month ago

Hi @thvasilo , thanks for reaching out.

I tried to reproduce a similar issue with sample code that reads files from two different folders and deploys them into a target bucket. Here is a code snippet and a README doc for reference.


from aws_cdk import (
    Stack,
    aws_s3_deployment as s3Deploy,
    aws_s3 as s3,
    RemovalPolicy,
)
from constructs import Construct

class BucketDeploymentIssueStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        target_bucket = s3.Bucket(self, "targetBucket001", 
            removal_policy=RemovalPolicy.DESTROY,
            auto_delete_objects=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL
        )

        deployment = s3Deploy.BucketDeployment(self, "DeployZippedFolders",
            sources=[s3Deploy.Source.asset("./assetSource")],
            destination_key_prefix="folder",
            destination_bucket=target_bucket,
            retain_on_delete=True,
        )

        # Add a second source to the same deployment; all sources are merged
        # before the single sync against the destination bucket.
        deployment.add_source(s3Deploy.Source.asset("./assetSource02"))

This is the directory structure -

[Screenshot: project directory structure with the assetSource and assetSource02 folders]

and this is snippet of copied resources into the bucket -

[Screenshot: contents of both source folders copied into the target bucket]

Hope this is what you are trying to do and this sample code solves your problem. Let me know if it does not work out or your use case is different.

thvasilo commented 1 month ago

Yes, that worked! I now create a single deployment and add two sources, instead of creating one deployment per file. I would have expected both approaches to work, but this is a good workaround.
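
For reference, here's roughly what the workaround looks like applied to my original stack (a sketch, assuming the same ENTRYPOINT_FILENAME / SINGLE_ENTRYPOINT_FILENAME constants and the asset_bucket_policy from above; "DeployEntryPoints" is just a placeholder ID):

        # One BucketDeployment with both files as sources; the sources are
        # merged before deployment, so a single sync uploads both files.
        merged_deployment = s3_deploy.BucketDeployment(
            self,
            "DeployEntryPoints",
            destination_bucket=asset_bucket,
            sources=[
                s3_deploy.Source.asset(
                    path=os.path.join(DIRNAME, "..", "assets"),
                    exclude=["**", f"!{ENTRYPOINT_FILENAME}"],
                ),
                s3_deploy.Source.asset(
                    path=os.path.join(DIRNAME, "..", "assets"),
                    exclude=["**", f"!{SINGLE_ENTRYPOINT_FILENAME}"],
                ),
            ],
            retain_on_delete=True,
        )
        merged_deployment.node.add_dependency(asset_bucket_policy)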

thvasilo commented 1 month ago

My guess is this happens because of step 3 from the README:

The custom resource invokes its associated Lambda function, which downloads the .zip archive, extracts it and issues aws s3 sync --delete against the destination bucket (in this case websiteBucket). If there is more than one source, the sources will be downloaded and merged pre-deployment at this step.

If that's the case, then with two separate BucketDeployment constructs syncing to the same bucket root, each sync --delete would prune the other deployment's file, and whichever custom resource runs last wins.

Question: if I had used an existing bucket for this, would running the deployment have deleted all other files on that bucket (assuming I had set it to upload to the bucket root as above)? That seems quite dangerous.
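
Side note for anyone else who lands here: the BucketDeployment docs describe a prune property (default True) that controls exactly this. If I read them correctly, setting prune=False keeps the deployment from deleting destination files that are missing from the sources. A sketch, where existing_bucket stands in for an imported pre-existing bucket:

        # prune=False should leave files already in the destination alone;
        # with the default prune=True, the sync deletes anything in the
        # destination that is not present in the merged sources.
        safe_deployment = s3_deploy.BucketDeployment(
            self,
            "DeployWithoutPrune",  # placeholder construct ID
            destination_bucket=existing_bucket,  # hypothetical imported bucket
            sources=[s3_deploy.Source.asset(os.path.join(DIRNAME, "..", "assets"))],
            prune=False,
            retain_on_delete=True,
        )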

khushail commented 1 month ago

Yes, that seems correct. Just to verify, I ran this code against an existing bucket (the target bucket created above, which already has the copied zipped files):

        # Import the bucket created earlier as an existing bucket
        target_bucket002 = s3.Bucket.from_bucket_arn(self, "targetBucket002", "arn:aws:s3:::bucketdeploymentissuestack-targetbucket001d2a3fb71-xdun4ggp3wrq")

        # No destination_key_prefix this time, so the sync runs against the bucket root
        deployment = s3Deploy.BucketDeployment(self, "DeployZippedFolders",
            sources=[s3Deploy.Source.asset("./assetSource")],
            destination_bucket=target_bucket002,
            retain_on_delete=True,
        )

        # CfnOutput comes from aws_cdk
        CfnOutput(self, "DeploymentOutput", value=target_bucket002.bucket_arn)

After re-deploying from another stack, this is what's now in the target bucket; it looks like the other two zipped files, along with the folder, were deleted.

[Screenshot: target bucket after re-deployment, with the previously copied files removed]

Let me check with the team on this and get back to you.