aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

aws-s3-deployment: Transfer asset without extracting contents of .zip files #8065

Closed · baer closed this 1 year ago

baer commented 4 years ago

Allow the S3 Deployment module to transfer assets to a new bucket without extracting the contents.

Use Case

Some AWS services work natively with zip files. An example is Elastic Beanstalk's SourceBundle. Today, in order to work with these services you must:

All of these are viable, but clumsy next to the fantastic S3 Deployment module.

Proposed Solution

Add a property to BucketDeploymentProps that disables this line of code in the deployment lambda.


This is a 🚀 Feature Request

iliapolo commented 4 years ago

Hi @baer

Just to make sure I fully understand, are you saying you want to zip a local directory and transfer it in its zipped form to a bucket? And what do you mean by "use an event to re-zip files after transfer"?

I would imagine the first two steps should be sufficient.

baer commented 4 years ago

Yes, that's right. I'd like to:

  1. Zip a directory
  2. Send that directory to CDK's asset bucket as a zip file
  3. Use aws-s3-deployment to move that zip file to a new bucket without unzipping it.

This is basically what eb deploy does in the Elastic Beanstalk CLI, but could be generally useful for all sorts of things.

By "use an event to re-zip files after transfer" I meant that it would be possible to do this by creating a lambda + event notification on the target bucket and create a zipped version of the files that aws-s3-deployment adds. It's a bad idea. I was just trying to illustrate that it's currently difficult to use a zip file with CDK unless you want it to be expanded.

iliapolo commented 4 years ago

@baer Got it. Thanks.

My concern here is that the fact that CDK even zips up the asset directory is somewhat an internal implementation detail. Notice that when a user defines a directory as an asset, nowhere does it say that the asset is going to be zipped:

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.asset('./website-dist')],
  destinationBucket: websiteBucket,
});

The intent of the user here is to copy the directory from the local machine to the destination bucket. Incidentally, CDK implements this by zipping up and extracting. But it could also have been implemented in a different way.

Having said that, I'm not strictly opposed to adding such functionality, but if we do it naively, it might cause confusion. For example, say we did something like:

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.asset('./website-dist')],
  destinationBucket: websiteBucket,
  extract: false
});

That would be unclear, because the property extract: false only makes sense to users who know exactly how the internal mechanism works; for everyone else, I believe, it would be confusing.

Off the top of my head, perhaps we can do something like:

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.zip('./website-dist')],
  destinationBucket: websiteBucket,
});

The Source.zip will zip up the asset, and wrap it in a directory. This API conveys the intent of the user to treat the directory as a zip.

What do you think? Would you like to continue the discussion on the API and take a stab at implementing it?

baer commented 4 years ago

Your concern is valid, and I'm very much on-board with trying to hide implementation details! I was confused about how Assets are supposed to be used when I was first learning CDK, so I'd love to help clarify this for others. Let's talk docs and communication first and the actual code second.

Docs

The docs around assets don't line up with the intent expressed in this issue. The three asset docs (s3-asset, s3-deployment, and the CDK User Guide on Assets) all say or imply different things.

aws-s3-asset - Construct Docs

This doc says that an Asset is a thing that your stack needs and that it will be deployed to an S3 bucket. It then goes on to explain how this works, including info about zip and packaging. A few relevant quotes are below.

"when the stack is deployed, the toolkit will package the asset (i.e. zip the directory)"

Explaining the implementation details runs counter to your stated desire to treat asset deployment, and Assets by proxy, as a black box.

"When an asset is defined in a construct, a construct metadata entry aws:cdk:asset is emitted with instructions on where to find the asset and what type of packaging to perform (zip or file)."

This sort of implies the flexibility that I'm suggesting despite it not being possible.

aws-s3-deployment - Construct Docs

This doc says that an s3-deployment is a way to move files/folders, often Assets, into S3 buckets. It has some extensive explanation about how it works, including a bit that gives a specific CLI statement used in the implementation (aws s3 sync --delete against the destination bucket), which also runs counter to the idea of s3-deploy as a "black box."

One particular confusion about this doc is that the examples seem to imply that the bucket used for Assets is NOT where your assets should live long-term. Instead, you should use the Asset construct to upload your files initially, and then s3-deployment to move them to a final home. If this is AWS' best practice, I think it would be good to add a note to the Asset docs that says something like "Assets should be moved to a permanent bucket as a part of your stack. See s3-deployment."

CDK User Guide

Finally, there is the User Guide. This seems to imply that Assets shouldn't really be used directly at all, much less as a place to store things long-term by saying that "You typically reference assets through APIs that are exposed by specific AWS constructs."

There are similar disclaimers in several of the CDK construct docs that say "You will probably use ___ with the Construct.addX method, rather than directly." BasePathMapping is an example of this. If it's the intention of the CDK team that, over time, Assets will mostly be used via helper methods like lambda.Code.fromAsset, it would be good to add that to the docs.

Code

It's hard to know what to implement without first understanding the direction the team has for Assets. An additional option would be to consider this a corner case and expose elasticBeanstalk.Code.fromAsset once that module offers more than the generated Cfn resources it does today.

iliapolo commented 4 years ago

Hi @baer - Sorry for the delayed response.

Thanks for taking the time to detail the confusion and intent around asset usage in the CDK. It definitely sounds like it's worth some clarification and alignment.

Specifically, to focus on this use case, I think the elasticBeanstalk.Code.fromAsset you suggested is actually pretty nice. It seems appropriate since it conveys Elastic Beanstalk's ability to work with zipped files.

Are you still up for maybe pitching in here? Let me know and we can come up with the final API together and then you can hack away :)

Thanks!

baer commented 4 years ago

No problem. It looks like, as of right now, Elastic Beanstalk does not have a module in CDK beyond the auto-generated Cfn-prefixed resources. I haven't looked at that code, but I'm assuming that to create something like elasticBeanstalk.Code.fromAsset, somebody would have to scaffold out the initial boilerplate of a non-generated Elastic Beanstalk module. Since this could have a large-ish code volume and isn't on the roadmap, is this even a PR y'all would accept?

iliapolo commented 4 years ago

@baer for sure. It’s probably gonna take a bit longer but if you’re up for it we would gladly accept it.

If you’re willing, we can hash out the MVP for this API by first creating a small README.

iliapolo commented 4 years ago

@baer P.S regarding the scaffolding, you can have a look at our example construct library package to get a sense of what it should look like.

kgunny commented 4 years ago

I was able to create a Node EBS App with Version and Env using the Python CDK 1.54.0, so EBS is supported by the CDK, and I ran into this very issue trying to set up the EBS App Version using the source_bundle property. The CDK part is fine, but when attempting to upload the zip file for the version using aws_s3_deployment, the zip file contents were extracted, whereas the EBS App Version source_bundle property needs the zip file un-extracted in the bucket.

ebs_version_bucket = aws_s3.Bucket.from_bucket_name(self, 'ebs_version_bucket', ebs_version_bucket_name)
ebs_version_bucket_deployment = s3_deployment.BucketDeployment(
    self, "ebs_version_bucket_deployment",
    sources=[s3_deployment.Source.asset("my-app-version-1.zip")],
    destination_bucket=ebs_version_bucket,
    destination_key_prefix='my-app-versions/my-app-version-1.zip'
)

my-app-version-1.zip ends up being a folder on S3.

kgunny commented 4 years ago

Just to confirm: placing the zip in a folder and then making the folder the asset works, in that the nested zip file is not extracted.

iliapolo commented 4 years ago

@kgunny Thanks for the info 👍

IngussNeilands commented 3 years ago

Would be nice to have this, very much needed for lambda zip assets

nbaillie commented 3 years ago

Also looking for this, to send code from a git export into S3 to use as a CodePipeline source. I had looked at doing it directly from an asset, but the toolkit bucket has no versioning enabled and is therefore not supported by CodePipeline. So I wanted to move to seeding the source with aws-s3-deployment; the unzipping was not intuitive.

I can confirm that the below worked for me.

const sourceBucket = new Bucket(this, 'sourceBucket', {
  versioned: true
});

new BucketDeployment(this, 'sourceBucketDeploy', {
  sources: [Source.asset({PATH_FOR_FOLDER_WITH_ZIP_INSIDE})],
  destinationBucket: sourceBucket,
  retainOnDelete: false
});

There may be cases where extracting is useful for me though, so it might be good to have a flag as was described above: extract: false.

Or, more specific to my case, an AssetSourceAction added to @aws-cdk/aws-codepipeline-actions would have been easy to use and would abstract all of this. However, that might be an edge case.

cmoore1776 commented 3 years ago

This is also needed beyond Elastic Beanstalk. For example, deploying Ansible playbooks for use with SSM (see https://aws.amazon.com/blogs/mt/keeping-ansible-effortless-with-aws-systems-manager/).

I'd like to keep a directory structure in source for my playbooks (without zipping), then have CDK zip and deploy to S3 WITHOUT extracting, so they can be referenced in the CreateAssociation call (currently implemented in CDK as aws_cdk.aws_ssm.CfnAssociation).

McDoit commented 3 years ago

I'm using double zipping as a workaround for my use case of using the CDK to create StackSets.

I have my pre-zipped assets in a folder which then gets zipped by the bucket deployment, but it would work the same by zipping manually and pointing the bucket deployment at that zip.

jtmichelson commented 3 years ago

AWS MWAA plug-ins are also expected to be zipped, definitely looking forward to this improvement on aws-s3-deployment to deploy a directory as a zip. :)

tyildirim24 commented 3 years ago

Same with Lambda function binaries for Golang.

edgar-slalom commented 2 years ago

I have a similar use case for AWS Glue, which runs a Python function that itself imports additional modules. AWS Glue can't see the modules unless I zip them and upload them to an S3 bucket. The way it works now, I have to manually zip up the modules/folder myself.

Example Project Structure

s3://bucket/make_car.py
s3://bucket/modules.zip

As of now, to create the modules.zip I have to zip the modules manually. I wish BucketDeployment had an automated way to zip up a directory and have that be deployed to the destination_bucket, for example:


s3_deployment.BucketDeployment(
    self,"my_s3_deployment",
    sources=[s3_deployment.Source.asset("../glue")],
    zip_this_local_dir="../glue/modules/" ... 
)

github-actions[bot] commented 2 years ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

wanjacki commented 2 years ago

Are we opposed to this solution proposed by iliapolo?

new s3deploy.BucketDeployment(this, 'DeployWebsite', {
  sources: [s3deploy.Source.zip('./website-dist')],
  destinationBucket: websiteBucket,
});

If not, I can go ahead and implement it. @TheRealAmazonKendra Let me know your thoughts as well, as we need this feature (as do others watching this thread) and are willing to prioritize and implement it ourselves.

TheRealAmazonKendra commented 2 years ago

I've asked @iliapolo to take another look at this to confirm.

iliapolo commented 2 years ago

Looking back at this issue, it looks like the prevalent use-case described here is:

1) I have a local directory, and I need to reference it in another service in zip form

And it seems like we haven't mentioned that this type of use-case is already handled in the CDK. So I want to make sure we are aligned. Usually this is done via support from the service in question, for example the lambda.Code class when configuring the source for a lambda function. This is the recommended path.
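
For example, a minimal sketch of that service-level path (the handler directory and runtime below are placeholders):

import * as path from 'path';
import * as lambda from '@aws-cdk/aws-lambda';

new lambda.Function(this, 'Handler', {
  runtime: lambda.Runtime.NODEJS_14_X,
  handler: 'index.handler',
  // the handler directory is zipped into an asset and referenced for you
  code: lambda.Code.fromAsset(path.join(__dirname, 'handler')),
});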

However, even if the service construct doesn't offer it explicitly, it can still be achieved by using regular assets, no need for the s3-deployment module. For example:

const app = new s3Assets.Asset(this, 'App', {
  path: path.join(__dirname, 'app'),
});

new bean.CfnApplicationVersion(this, 'Version', {
  applicationName: 'app',
  sourceBundle: {
    s3Bucket: app.s3BucketName,
    s3Key: app.s3ObjectKey,
  }
});

With this code, the zipped-up app directory remains in the CDK asset bucket for the entire lifetime of the application. Note that if your use case is:

2) I have a pre-packaged zip, and I need to reference it in another service in zip form

you can do pretty much the same:

const app = new s3Assets.Asset(this, 'App', {
  path: path.join(__dirname, 'app.zip'),
});

new bean.CfnApplicationVersion(this, 'Version', {
  applicationName: 'app',
  sourceBundle: {
    s3Bucket: app.s3BucketName,
    s3Key: app.s3ObjectKey,
  }
});

@wanjacki Does this address your use-case? Or do you have some restrictions that force moving the zipped-up app directory to a different bucket?

The use case currently unsupported is:

I have a local directory, and I need to place it in a custom bucket in zip form

For this use-case, the workaround is to do the pre-packaging yourself and fall back to "2)". The solution we proposed still has its quirks, and I'm not fully on board with it. I would be interested to know what exact use-case you are trying to solve.

wanjacki commented 2 years ago

@iliapolo Yes, we are trying to implement #20690.

Essentially we need to share this asset with another account, which means giving another user permission to read that bucket, and sharing the entire CDK asset bucket is not ideal (the CDK asset bucket also has encryption enabled, I believe). Customers have also specified that they want to define their own bucket, in which we will place all the assets that need to be shared.
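
For illustration, with a dedicated bucket the sharing account can grant cross-account read directly on it, which is not practical with the bootstrap asset bucket (the account ID below is a placeholder):

import * as iam from '@aws-cdk/aws-iam';
import * as s3 from '@aws-cdk/aws-s3';

// Dedicated bucket for shared assets; read access granted to the consuming account.
const sharedBucket = new s3.Bucket(this, 'SharedAssets', { versioned: true });
sharedBucket.grantRead(new iam.AccountPrincipal('111111111111'));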

This use-case falls under the one that is currently unsupported exactly as you say: I have a local directory, and I need to place it in a custom bucket in zip form. We do have a restriction that requires us to move the zip file to a different bucket.

The workaround of double zipping should work for us, but it seems hacky, so we thought it made more sense to have actual support for this. It seems like the ways to support this would be to allow a custom location for CDK Assets, to have s3-deployment not unzip into the destination bucket, or to add support for sharing CDK Assets cross-account.

Please advise whether you think we should continue pursuing support for this (I believe it will be useful for customers, and future customers will no doubt run into this issue again for their use cases) or go ahead with our feature by implementing it with the workaround. We do have a deadline of 10/05 we are trying to meet for #20690.

If curious or would like even more context, there is a draft PR we have been working on for our feature with lots of insightful comments: https://github.com/aws/aws-cdk/pull/21508

iliapolo commented 2 years ago

@wanjacki I see, thanks for sharing. Your use-case sounds legit.

So as far as the solution goes, the more I think about it the less I like my proposal of Source.zip. The problem is the semantic inconsistency it creates, because the function names are supposed to refer to the input, not the side-effect.

I was initially opposed to adding an extract property because I felt it exposes implementation details that won't be clear to users. However, looking at the Source API, I now think an extract property does have merit.

For example, in the case of Source.bucket, the user is expected to provide a zip object key. It makes perfect sense to add an extract: false option instructing the deployment to keep the zip object as is in the destination bucket:

new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [s3Deploy.Source.bucket(someBucket, 'archive.zip')],
  destinationBucket: bucket,
  extract: false, // don't extract archive.zip
});

I do think it's less clear in the case of configuring a Source.asset that points to a directory:

new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [s3Deploy.Source.asset(path.join(__dirname, 'directory'))],
  destinationBucket: bucket,
  extract: false, // don't extract...what exactly?
});

However, the pragmatic thing to do here is add this property.

Also, going back to @baer's comment, which rightfully points out that we already kind of expose that assets are stored in zip format in the bootstrap bucket.

In any case, your PR seems good to me (but please rename unzipFile to extract; it's just more concise).

A note about locations

When extract is turned off, the resulting object in the destination bucket will have a name in the form of: asset.<asset-hash>.zip, regardless of the original file / directory you provided. This means that users will have a hard time constructing a location to that file, because they don't have access to the underlying asset hash. To solve this, I suggest adding a method to BucketDeployment that gives the location of a source in the destination.

const myApp = s3Deploy.Source.asset(path.join(__dirname, 'app'));

const deployment = new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [myApp],
  destinationBucket: bucket,
  extract: false, // keep the zipped asset as-is in the destination
});

const myAppLocation = deployment.s3Location(myApp);

The asset hashes themselves are already available in the BucketDeployment construct:

https://github.com/aws/aws-cdk/blob/809e1b0d5dc29be02f95ea4361b6f87f94325f3d/packages/%40aws-cdk/aws-s3-deployment/lib/bucket-deployment.ts#L348

But you'll need to figure out a way to map those to their original sources. Another option would be to somehow expose the underlying asset so that users could call asset.assetHash and manually construct the location. And the final option is to ignore this use-case for now, and assume users are only interested in the file getting to the bucket, and they don't need its specific location.
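
To illustrate the second option, here is a rough sketch, assuming the extract property from the PR under discussion and the asset.<asset-hash>.zip naming described above; paths and names are placeholders. A caller who creates the asset explicitly already holds the hash and can build the key themselves:

import * as path from 'path';
import * as s3 from '@aws-cdk/aws-s3';
import * as s3Assets from '@aws-cdk/aws-s3-assets';
import * as s3Deploy from '@aws-cdk/aws-s3-deployment';

declare const destinationBucket: s3.Bucket;

// Create the asset explicitly so its hash is available to the caller.
const appAsset = new s3Assets.Asset(this, 'AppAsset', {
  path: path.join(__dirname, 'app'),
});

new s3Deploy.BucketDeployment(this, 'Deployment', {
  sources: [s3Deploy.Source.bucket(appAsset.bucket, appAsset.s3ObjectKey)],
  destinationBucket,
  extract: false,
});

// Following the asset.<asset-hash>.zip naming described above, the copied object's key would be:
const appZipKey = `asset.${appAsset.assetHash}.zip`;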

@TheRealAmazonKendra for your consideration.

wanjacki commented 2 years ago

@iliapolo Thanks for getting back to us and for the very detailed and informative response. We will go ahead and reopen that PR for review, with the property renamed to extract. As for the location part, there is very likely a use case for it, but it is not particularly useful for our use case, so we can ignore it for now as we already have access to the hash (generated by Asset) and can construct the file location.

corymhall commented 1 year ago

Looks like this was fixed in #21805
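
For reference, a minimal usage sketch, assuming the option landed as the extract property discussed above:

import * as path from 'path';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';

declare const destinationBucket: s3.Bucket;

new s3deploy.BucketDeployment(this, 'DeployZip', {
  sources: [s3deploy.Source.asset(path.join(__dirname, 'my-app'))],
  destinationBucket,
  extract: false, // copy the packaged asset to the bucket as a single .zip object
});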

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.