aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

S3 Bucket Deployments with content encoding based on file extension. #7090

Open hleb-albau opened 4 years ago

hleb-albau commented 4 years ago

Currently, there is a contentEncoding?: string option in the BucketDeployment construct (system-defined content-encoding metadata to be set on all objects in the deployment). It would be nice to have the possibility to specify contentEncoding according to a mapping based on file extension. Example: for files with the extension .br, specify "Content-Encoding: br"; for .gzip files, "Content-Encoding: gzip"; and so on.
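
For illustration, a minimal sketch of how the existing construct-wide option is used today, assuming the Python flavor of the construct and placeholder names (self, bucket, ./dist):

from aws_cdk.aws_s3_deployment import BucketDeployment, Source

# content_encoding currently applies to every object in the deployment;
# the request here is a per-extension mapping instead.
BucketDeployment(
    self, "DeploySite",
    sources=[Source.asset("./dist")],
    destination_bucket=bucket,
    content_encoding="br",  # set on all files, not just *.br
)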

Use Case

We use an S3 + CloudFront pair to serve a static website. To provide better performance, .br files are served based on the Accept-Encoding header, so our files have two copies (e.g. index.html and index.html.br). Currently, we have to use the AWS CLI to deploy the differently encoded files with the right headers. If the BucketDeployment construct supported a content encoding by file extension option, it would be a more easy-to-go static hosting option.


This is a :rocket: Feature Request

iliapolo commented 4 years ago

Hi @hleb-albau

This is definitely an interesting use-case. The problem is that bucket deployments run:

aws s3 sync --delete --content-type=<content-type> {sourceDir} {targetBucket}

The command does not allow specifying different content types for different files. Splitting into different source directories won't work either, because of the (necessary) --delete flag.

Are you having issues with anything other than br files? The AWS CLI actually determines the content-type of each individual file automatically, by delegating to Python's standard library. However, support for the br file extension was only just added to CPython, and is not yet released outside of an alpha version.
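
For reference, the lookup the CLI relies on is Python's mimetypes module; a rough illustration (exact return values depend on the Python version and the system MIME tables):

import mimetypes

# guess_type returns a (content_type, content_encoding) pair based on the extension.
print(mimetypes.guess_type("util.js.gz"))  # e.g. ('application/javascript', 'gzip')
print(mimetypes.guess_type("util.js.br"))  # ('application/javascript', 'br') once '.br' is known,
                                           # (None, None) on Pythons where it is not

# The extension-to-encoding table consulted first:
print(mimetypes.encodings_map)  # e.g. {'.gz': 'gzip', '.Z': 'compress', '.bz2': 'bzip2', ...}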

One possible solution would be to have the bucket deployment Lambda add a specific entry for br files to one of the known Linux MIME type files, which should make the CLI detect it properly, avoiding the need to pass Content-Type altogether.

This seems like the most pragmatic solution for now.

WDYT?

hleb-albau commented 4 years ago

Thanks for the response!

Besides the content-encoding header, which determines the content compression (br, gz, etc.), we also have the content-type header.

Example for file util.js.br: content-type - application/javascript, content-encoding - br

Right now our deployment process runs as follows:

  1. use the CDK bucket deployment to deploy all files.
  2. redeploy the compressed files via the CLI with the following flags:
    aws s3 cp ./dist s3://{BUCKET_NAME} \
    --exclude="*" --include="*.js.br" \
    --content-encoding br \
    --content-type="application/javascript" \
    --cache-control "max-age=31536000" \
    --metadata-directive REPLACE --recursive

    So, I wonder if the CLI can detect both headers properly.

iliapolo commented 4 years ago

Hi @hleb-albau - Yeah, seems like there's no way around this.

We can probably support this use-case by doing what you did with exclude/include.
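
For reference, a hedged sketch of what that could look like on the construct side, assuming exclude/include and prune props mirroring the CLI flags (prop names are assumptions) and reusing the placeholder names from above:

# A second, non-pruning deployment that re-uploads only *.js.br with its own
# encoding/type, mirroring the CLI workaround above.
BucketDeployment(
    self, "DeployBrotliJs",
    sources=[Source.asset("./dist")],
    destination_bucket=bucket,
    exclude=["*"],
    include=["*.js.br"],
    content_encoding="br",
    content_type="application/javascript",
    prune=False,  # keep the objects uploaded by the main deployment
)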

Stay tuned 👍

Thanks!

iliapolo commented 4 years ago

relates also to https://github.com/aws/aws-cdk/issues/4687

peabnuts123 commented 4 years ago

Just to add to this conversation, here is the script I am using to achieve this right now:

# Clear out / upload everything first
echo "[Phase 1] Sync everything"
aws s3 sync . "s3://${s3_bucket_name}" --acl 'public-read' --delete

# Brotli-compressed files
# - general (upload everything brotli-compressed as "binary/octet-stream" by default)
echo "[Phase 2] Brotli-compressed files"
aws s3 cp . "s3://${s3_bucket_name}" \
  --exclude="*" --include="*.br" \
  --acl 'public-read' \
  --content-encoding br \
  --content-type="binary/octet-stream" \
  --metadata-directive REPLACE --recursive;

# - javascript (ensure javascript has correct content-type)
echo "[Phase 3] Brotli-compressed JavaScript"
aws s3 cp . "s3://${s3_bucket_name}" \
  --exclude="*" --include="*.js.br" \
  --acl 'public-read' \
  --content-encoding br \
  --content-type="application/javascript" \
  --metadata-directive REPLACE --recursive;

Simonl9l commented 1 year ago

It would also be good if the upload detected the file encoding and added a charset to the content type, defaulting to utf-8.
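
In the meantime, a possible workaround (a sketch, assuming the existing content_type prop and the placeholder names used elsewhere in this thread) is to pin the charset explicitly for the affected files:

# Non-pruning deployment that sets an explicit charset on HTML objects;
# the construct does not detect file encodings itself.
BucketDeployment(
    self, "DeployHtmlUtf8",
    sources=[Source.asset("./dist")],
    destination_bucket=bucket,
    exclude=["*"],
    include=["*.html"],
    content_type="text/html; charset=utf-8",
    prune=False,
)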

andresionek91 commented 1 year ago

I solved it with a Custom Resource. This one changes the CacheControl but the logic is the same for other metadata.

from uuid import uuid4

# Imports assuming aws-cdk-lib v2 (Python); `bucket` and `self` come from the surrounding stack.
from aws_cdk import aws_iam as iam
from aws_cdk import custom_resources as cr
from aws_cdk.aws_s3_deployment import BucketDeployment

s3_deployment = BucketDeployment(...)

copy_object_changing_cache = cr.AwsSdkCall(
    service="S3",
    action="copyObject",
    parameters={
        "Bucket":bucket.bucket_name,
        "CopySource": f"{bucket.bucket_name}/remote.js",
        "Key": "remote.js",
        "MetadataDirective": "REPLACE",
        "CacheControl": "no-cache, no-store",
        "Metadata": {"object-hash": uuid4().hex[:8]}  # Important to trigger update in cloudformation
    },
    physical_resource_id=cr.PhysicalResourceId.of("ChangeObjectCacheControl"),
)

change_cache_role = iam.Role(
    scope=self,
    id="ChangeCacheRole",
    assumed_by=iam.ServicePrincipal(service="lambda.amazonaws.com"),
    inline_policies={
        "AllowCopyRemoteJs": iam.PolicyDocument(
            statements=[
                iam.PolicyStatement(
                    actions=[
                        "s3:PutObject", 
                        "s3:CopyObject", 
                        "s3:GetObject", 
                        "s3:DeleteObject"
                    ],
                    resources=[bucket.arn_for_objects("remote.js")],
                    effect=iam.Effect.ALLOW,
                ),
            ],
        )
    },
)

change_cache = cr.AwsCustomResource(
    scope=self,
    id="ChangeObjectCacheControl",
    role=change_cache_role,
    on_create=copy_object_changing_cache,
    on_update=copy_object_changing_cache,
)
change_cache.node.add_dependency(s3_deployment)

bitblit commented 5 months ago

Just throwing in my particular use-case (although this is certainly solvable using other means, including the ones already on this ticket)

Many single-page app frameworks (Angular, React, etc.) build in such a way that the various JS and CSS resources have hashed names and can therefore be cached roughly forever, but the root document (typically index.html) keeps the same name and therefore should have a very short or zero cache length during times of heavy development. Being able to mark index.html with a max-age of 0 and everything else with a much longer cache age would allow a downstream CloudFront distribution to use the S3 cache headers, if they could be set this way.
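
One way to get there today (a sketch, assuming the exclude/include, cache_control and prune props of aws-s3-deployment; self, bucket and ./dist are placeholders) is to split the upload into two deployments:

from aws_cdk import Duration
from aws_cdk.aws_s3_deployment import BucketDeployment, CacheControl, Source

# Hashed assets: long-lived cache, skipping index.html.
BucketDeployment(
    self, "DeployAssets",
    sources=[Source.asset("./dist")],
    destination_bucket=bucket,
    exclude=["index.html"],
    cache_control=[CacheControl.max_age(Duration.days(365))],
)

# index.html: effectively uncached, with prune=False so this deployment
# does not delete the hashed assets uploaded above.
BucketDeployment(
    self, "DeployIndex",
    sources=[Source.asset("./dist")],
    destination_bucket=bucket,
    exclude=["*"],
    include=["index.html"],
    cache_control=[CacheControl.max_age(Duration.seconds(0))],
    prune=False,
)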