aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.61k stars 3.9k forks source link

cdk: Breaking change to S3 bucket server access logging (v2.59.0) #23547

Closed viaferreirj closed 1 year ago

viaferreirj commented 1 year ago

Describe the bug

The latest release (v2.59.0) caused a breaking change within our CDK app for already existing buckets with server access logging enabled. The buckets are using a shared bucket as a target for logs. In version v2.8.0, this caused no issue when deploying with CDK. However, since this release our CDK deployment fails early with the following error: RuntimeError: Cannot enable log delivery to this bucket because the bucket's ACL has been set and can't be changed

Expected Behavior

In v2.8.0, a CDK deployment with the provided bucket properties will not fail deployment. We observed ACLs not impacting an ability of the source bucket to write logs to the server access log target bucket.

Current Behavior

Deployment of the app fails with the error:

Traceback (most recent call last):
  File "/codebuild/output/src389656955/src/slingshot/cdk/app.py", line 92, in <module>
    s3_qa_buckets_stack = S3QaBucketsStack(
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jsii/_runtime.py", line 111, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/codebuild/output/src389656955/src/slingshot/cdk/stacks/s3_qa_buckets_stack.py", line 31, in __init__
    qa_source_files_bucket = s3.Bucket(
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jsii/_runtime.py", line 111, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/aws_cdk/aws_s3/__init__.py", line 16759, in __init__
    jsii.create(self.__class__, self, [scope, id, props])
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jsii/_kernel/__init__.py", line 336, in create
    response = self.provider.create(
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jsii/_kernel/providers/process.py", line 363, in create
    return self._process.send(request, CreateResponse)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/jsii/_kernel/providers/process.py", line 340, in send
    raise RuntimeError(resp.error) from JavaScriptError(resp.stack)
RuntimeError: Cannot enable log delivery to this bucket because the bucket's ACL has been set and can't be changed

Reproduction Steps

Here is the definition of our target access logs bucket:

access_logs_bucket = s3.Bucket(
    scope=self,
    id='accessLogsS3Bucket',
    bucket_name='access-logs-bucket',
    access_control=s3.BucketAccessControl.BUCKET_OWNER_FULL_CONTROL,
    block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
    encryption=s3.BucketEncryption.S3_MANAGED,
    object_ownership=s3.ObjectOwnership.BUCKET_OWNER_ENFORCED,
    public_read_access=False,
    removal_policy=RemovalPolicy.RETAIN,
    versioned=True
)

And this is the definition of the bucket failing the deployment:

s3.Bucket(
    scope=self,
    id='sourceFilesS3Bucket',
    bucket_name='qa-bucket',
    access_control=s3.BucketAccessControl.BUCKET_OWNER_FULL_CONTROL,
    block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
    encryption=s3.BucketEncryption.S3_MANAGED,
    object_ownership=s3.ObjectOwnership.BUCKET_OWNER_ENFORCED,
    public_read_access=False,
    removal_policy=RemovalPolicy.RETAIN,
    server_access_logs_bucket=access_logs_bucket,
    server_access_logs_prefix='qa-bucket/serverAccessLogging_',
    versioned=False
)

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

v2.45.0

Framework Version

2.59.0

Node.js Version

16.13.0

OS

Windows

Language

Python

Language Version

3.8.0

Other information

No response

peterwoodworth commented 1 year ago

We did recently make changes to this in 2.59.0 in this pr, however this code should still fail in 2.8.0 because the check for access_control type on the bucket was present in that version as well. I was able to reproduce this bug in TypeScript on 2.8.0. The following is code from 2.8.0:

https://github.com/aws/aws-cdk/blob/8a5eb49dc12d9c7b8dbd7e6a6d4ba90c9beb40bf/packages/%40aws-cdk/aws-s3/lib/bucket.ts#L1695-L1697

https://github.com/aws/aws-cdk/blob/8a5eb49dc12d9c7b8dbd7e6a6d4ba90c9beb40bf/packages/%40aws-cdk/aws-s3/lib/bucket.ts#L2045-L2051

So, I'm curious how you were able to succeed at all with this in 2.8.0, because it shouldn't have worked then with the code you provided. @kylelaker I'd appreciate your set of eyes on this, you might have some insight here

laurelmay commented 1 year ago

@peterwoodworth I have also tested this on Python with v2.8.0 of the library and v2.8.0 of the CLI and I get the same error as shown for v2.59.0. This code has been in the CDK since v1.21.0 in #5072.

I am wondering if there's an issue with the instanceof check here; possibly because one of is actually referenced via some kind of import or fromXxx or is it possible that @viaferreirj has two versions of the CDK library installed somehow and that the instanceof check is failing (like in the issues referenced below) and the else is then passing and causing ACLs to now be set because the feature flag is also disabled. On v2.8.0 without the else if, in these scenarios, the issue would be silently ignored and no changes would be made to the ACL (and if the escape hatch was used, then) log delivery would succeed via Bucket Policy instead of via ACL, making this unnoticed.

This is purely an assumption based on #19448, #20564, and #15580. But instanceof seems fragile here.

The else probably should be if (!props.serverAccessLogsBucket && props.serverAccessLogsPrefix) to be safer but the instanceof probably needs to be removed as well. I can fix the else if but I am not sure I feel comfortable doing the magic required to remove instanceof.

So to list this out, I think this is because:

  1. The instanceof for some reason fails, causing the else to be triggered
  2. In v2.8.0 there is no else, the error is ignored; in v2.59.0 there is an else, and serverAccessLogsPrefix is set (and so is serverAccessLogsBucket)
  3. The Feature Flag is disabled (expected because this is an upgrade) so the ACL is attempted to be set on this/self which has an ACL (so does the target bucket but we passed this/self so that's causing the issue here)
  4. The reproduction has an ACL set, and that throws the error.

I will open a PR to fix the else if but I don't think that's a complete solution.

viaferreirj commented 1 year ago

I see the error in my reporting. The affected bucket is indeed applying ~an ACL~ a bucket policy with add_to_resource_policy. Apologies for omitting that. The ACL addition seems to make the difference as originally outlined in the reproduction steps.

I saw mention of a possible duplication of installed CDK versions, however I can verify that our build system is installing CDK at build time. CodeBuild is being used, so the environment is guaranteed to not be caching additional versions.

viaferreirj commented 1 year ago

I removed the property access_control=s3.BucketAccessControl.BUCKET_OWNER_FULL_CONTROL from the buckets and can confirm that access logging was not affected permission-wise. This also corrects the new error seen in v2.59.0.

I leave it up to your team to decide how to proceed with the difference in behavior when this property is present.

andresionek91 commented 1 year ago

We are having a similar error here. We are setting up both of the following properties:

Our Access logs bucket is a separate bucket. However, CDK behaviour is adding ACLs to enable log delivery to the current bucket.

The same happens when setting the feature flag "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true. It adds a policy to enable log delivery to the current bucket.

The expected behaviour is that if server_access_logs_bucket is set, no ACLs or Bucket Policies should be added to the bucket.

It started happening when we upgraded from aws-cdk-lib version 2.55.1 to 2.59.0

andresionek91 commented 1 year ago

The problem seems to be in this logic here https://github.com/aws/aws-cdk/pull/23386/files#diff-97a6344bb43117c16441333d96eede412c2c7f7df034f88bbebb90b151eca42dR1835 introduced in the PR you mentioned @peterwoodworth

    if (props.serverAccessLogsBucket instanceof Bucket) {
      props.serverAccessLogsBucket.allowLogDelivery(this, props.serverAccessLogsPrefix);
    } else if (props.serverAccessLogsPrefix) {
      this.allowLogDelivery(this, props.serverAccessLogsPrefix);
    }

I think the fix here would be something like this:

    if (props.serverAccessLogsBucket instanceof Bucket) {
      props.serverAccessLogsBucket.allowLogDelivery(this, props.serverAccessLogsPrefix);
    } else if (props.serverAccessLogsPrefix) and not (props.serverAccessLogsBucket instanceof Bucket)  {
      this.allowLogDelivery(this, props.serverAccessLogsPrefix);
    }
laurelmay commented 1 year ago

Hi @andresionek91! That is almost exactly the fix implemented in #23552. There's an additional fix in that PR as well for other import scenarios (and tests added to hopefully avoid future regressions). Hopefully it gets reviewed by the CDK team soon

andresionek91 commented 1 year ago

Great news! Thanks a lot!

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.