aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(aws-s3): bucket policy fails to create when bucket:arn is not yet available #28659

Open biffgaut opened 8 months ago

biffgaut commented 8 months ago

Describe the bug

A dependency issue between S3 Buckets and Bucket Policies in the L2 Bucket class allows the Policy to access the arn of the bucket before it is available, causing the creation of the Bucket Policy to fail. Being a dependency issue, this is an intermittent issue and works correctly the vast majority of the time. When it fails, simply relaunching the stack usually works.

Expected Behavior

The L2 Bucket construct should launch successfully every time.

Current Behavior

testPolicy9D625504

CREATE_FAILED

Unable to retrieve Arn attribute for AWS::S3::Bucket, with error message Bucket not found

Reproduction Steps

I created a simple CDK app with this code:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';

export class BucketPolicyDependencyStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, 'test', {
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true
    })
  }
}

I then set up a bash script that launched it 30 times, essentially simultaneously:

# Any 30 distinct values work here; I just used 30 integers
constructs=$(seq 1 30)
for iteration in $constructs; do
  export STACK_NAME=stresstest$iteration
  cdk deploy -o stress$iteration --require-approval never &
done

On 1 of the 30 I saw the error I reference above.

Possible Solution

If I am interpreting the behavior correctly, it seems that adding a Dependency on the Bucket to the BucketPolicy in the L2 Construct would prevent the Policy from trying to access the bucket before it is ready. Perhaps here? https://github.com/aws/aws-cdk/blob/3318a38a6092275d461ef3549f3b92cd0d040c18/packages/aws-cdk-lib/aws-s3/lib/bucket.ts#L651
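At the template level, the suggestion above corresponds to CloudFormation's `DependsOn` attribute, which makes one resource wait for another. The following is a minimal sketch under stated assumptions, not the CDK's implementation: it patches a synthesized template held as a plain object. The `addDependsOn` helper, the trimmed `Template` type, and the `testBucket` logical ID are hypothetical; only `testPolicy9D625504` comes from the error output above.

```typescript
// Minimal shape of a CloudFormation template; only the fields this sketch touches.
interface Resource {
  Type: string;
  Properties?: Record<string, unknown>;
  DependsOn?: string | string[];
}
interface Template {
  Resources: Record<string, Resource>;
}

// Hypothetical helper: returns a copy of the template in which `dependent`
// explicitly waits for `target` via the DependsOn attribute.
function addDependsOn(template: Template, dependent: string, target: string): Template {
  const resource = template.Resources[dependent];
  if (!resource) throw new Error(`No resource named ${dependent}`);
  if (!template.Resources[target]) throw new Error(`No resource named ${target}`);

  // DependsOn may be absent, a single string, or a list; normalize to a list.
  const existing =
    resource.DependsOn === undefined
      ? []
      : Array.isArray(resource.DependsOn)
        ? resource.DependsOn
        : [resource.DependsOn];
  const merged = existing.includes(target) ? existing : [...existing, target];

  return {
    ...template,
    Resources: {
      ...template.Resources,
      [dependent]: { ...resource, DependsOn: merged },
    },
  };
}

// Example template fragment; the logical ID "testBucket" is made up.
const template: Template = {
  Resources: {
    testBucket: { Type: "AWS::S3::Bucket" },
    testPolicy9D625504: {
      Type: "AWS::S3::BucketPolicy",
      Properties: { Bucket: { Ref: "testBucket" } },
    },
  },
};

const patched = addDependsOn(template, "testPolicy9D625504", "testBucket");
console.log(JSON.stringify(patched.Resources["testPolicy9D625504"]?.DependsOn));
```

In a CDK app, the analogous step would presumably be `bucket.policy?.node.addDependency(bucket)` after constructing the bucket. Whether that avoids the failure described here is untested.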

Additional Information/Context

We've seen it in several of our constructs (and in newer versions of the CDK than the one I cite below for the test above). Someone also mentioned they have seen it in aws-codepipeline.

CDK CLI Version

2.108.0

Framework Version

2.108.0

Node.js Version

20.9.0

OS

macOS Ventura 13.6.3

Language

TypeScript

Language Version

TypeScript 5.2.2

Other information

Versions cited are for the test I cited, but it's been seen in other versions as well.

pahud commented 8 months ago

Unfortunately, I couldn't reproduce this after a few attempts.

export class Demo extends DemoStack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, 'test', {
      removalPolicy: RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });
  }
}

app.ts

for (let i=0; i<30; i++) {
    new Demo(app, `demo${i}stack`, { env });
}

And I deploy with

npx cdk deploy --all --require-approval never --concurrency 30

I didn't see any error after a few attempts.

Can you try it again?

github-actions[bot] commented 8 months ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

biffgaut commented 8 months ago

Working on replicating again. I'm at a loss for a method to recreate it deterministically - it appears to be triggered by the S3 Create Bucket being a bit slow. I'm going to try to set up a test that just keeps repeating the stress test indefinitely, hoping to catch the slower S3 behavior when it occurs.

knihit commented 8 months ago

I am facing the exact same issue. It seems that CloudFormation tries to create the bucket policy before the bucket creation is complete. It's inconsistent, but I saw it a few times in the last 2-3 weeks.

biffgaut commented 8 months ago

I was able to recreate it. I set up an infinite loop that launched and destroyed 40 stacks on each pass. I started the loop at around 11:30 AM and finally saw the issue recur at 11:34 AM. As I said, it is intermittent...


JamesButtress commented 8 months ago

I am also facing a similar issue. It seems to happen intermittently and started becoming an issue just before Christmas. Note that the buckets (and the stacks they are in) haven't been changed for a few months, so it seems like a fairly new problem.

biffgaut commented 8 months ago

Talking to some coworkers, our theory is that the issue is not CDK per se - that a change in CloudFormation led to CloudFormation ceasing to recognize the dependency of the policy on the bucket from the context of the template (I'm running my tests using the generated template rather than the CDK program to confirm this).

If this is the case, then the issue is not necessarily within the CDK - but an update to the S3 Bucket construct to explicitly set the dependency would smooth over the CFN issue.
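To make "the dependency of the policy on the bucket from the context of the template" concrete: a bucket policy in a synthesized template refers to its bucket through `Ref` and `Fn::GetAtt`, and CloudFormation is expected to derive the creation order from those references. The sketch below lists the logical IDs a resource refers to. The `referencedLogicalIds` helper, the `testBucket` logical ID, and the policy statement are hypothetical, and the walker ignores references embedded in `Fn::Sub` strings.

```typescript
// Collect the logical IDs a resource refers to through Ref or Fn::GetAtt.
// Note: Ref can also point at template parameters; this sketch does not
// distinguish them from resources.
function referencedLogicalIds(node: unknown, found: Set<string> = new Set()): Set<string> {
  if (Array.isArray(node)) {
    node.forEach((item) => referencedLogicalIds(item, found));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
      if (key === "Ref" && typeof value === "string") {
        // Pseudo parameters such as AWS::Region are not resources.
        if (!value.startsWith("AWS::")) found.add(value);
      } else if (key === "Fn::GetAtt") {
        // Fn::GetAtt is either ["LogicalId", "Attr"] or "LogicalId.Attr".
        const id = Array.isArray(value) ? value[0] : String(value).split(".")[0];
        if (typeof id === "string") found.add(id);
      } else {
        referencedLogicalIds(value, found);
      }
    }
  }
  return found;
}

// Made-up bucket policy resource, shaped like a synthesized template entry.
const policyResource = {
  Type: "AWS::S3::BucketPolicy",
  Properties: {
    Bucket: { Ref: "testBucket" },
    PolicyDocument: {
      Statement: [
        {
          Action: "s3:GetObject",
          Effect: "Allow",
          Principal: { Service: "cloudfront.amazonaws.com" },
          Resource: { "Fn::Join": ["", [{ "Fn::GetAtt": ["testBucket", "Arn"] }, "/*"]] },
        },
      ],
    },
  },
};

console.log(Array.from(referencedLogicalIds(policyResource)));
```

Running a check like this against the generated template shows whether the policy carries a reference to the bucket at all, which helps separate a CDK synthesis problem from a CloudFormation ordering problem.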

kedbirhan commented 8 months ago

I am having the same issue when just creating a bucket with an access policy.

const logBucket = new Bucket(
  this,
  `${config.kitName}-alb-logs-bucket`,
  {
    blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
    removalPolicy: RemovalPolicy.DESTROY,
    autoDeleteObjects: true
  }
);

Unable to retrieve Arn attribute for AWS::S3::Bucket, with error message Bucket not found

biffgaut commented 8 months ago

This is confirmed to be a CloudFormation issue. The word from AWS is:

Due to a recent change in internal workflow of CloudFormation, our development teams have identified an issue that can cause this error intermittently. They are currently working on deploying a fix for the same.

So it seems that no change to the CDK is needed. For the moment we just retry after a failure, and hopefully the issue will clear up entirely soon.
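The retry approach can be automated. The following is a minimal sketch, assuming the failure surfaces as an error whose message contains the text reported in this thread. `deployWithRetry`, `isTransientArnError`, and the `deploy` callback are hypothetical names; in practice `deploy` might shell out to `cdk deploy` or call the CloudFormation API.

```typescript
// The error text reported in this thread; matching on it is an assumption.
const TRANSIENT_ERROR = "Unable to retrieve Arn attribute for AWS::S3::Bucket";

function isTransientArnError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes(TRANSIENT_ERROR);
}

// Hypothetical helper: runs `deploy` up to `maxAttempts` times, retrying only
// when the failure looks like the intermittent error above.
async function deployWithRetry<T>(
  deploy: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 30000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await deploy();
    } catch (err) {
      // Give up on the last attempt or on any unrelated failure.
      if (attempt >= maxAttempts || !isTransientArnError(err)) throw err;
      console.warn(`Attempt ${attempt} hit the transient error; retrying`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Stand-in for a real deployment: fails once with the transient error, then succeeds.
let calls = 0;
const fakeDeploy = async (): Promise<string> => {
  calls += 1;
  if (calls === 1) {
    throw new Error(`${TRANSIENT_ERROR}, with error message Bucket not found`);
  }
  return "CREATE_COMPLETE";
};

deployWithRetry(fakeDeploy, 3, 0).then((status) =>
  console.log(status, "after", calls, "attempts"),
);
```

Retrying only on the known message keeps genuine template errors from being masked by repeated attempts.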

whennemuth commented 8 months ago

I am seeing this issue myself quite frequently. As with everyone else who has commented, this is a new behavior that was not occurring before.

I am using the CDK BucketDeployment, which automatically generates a parallel construct containing a Lambda function, IAM role and policy. It is the policy that is trying to reference the ARN of the bucket with Fn::GetAtt in the synthesized output. This seems to be failing about 50% of the time. I can cope with this by retrying the stack creation, and CloudFormation will simply start where it left off and complete the rest of the way.

biffgaut, can you reference where you found the AWS issue being reported? This is something I would want to monitor (and possibly bug them about - it's a pain).

Thanks.

biffgaut commented 8 months ago

That message was from an internal ticket here at AWS - there isn't any further info available at the moment. I have not seen this issue referenced online anywhere but here, which is shocking to me as it has occurred on several workloads managed by our team so I would assume the impact is bigger than the few people monitoring this issue.

dale-vendia commented 8 months ago

As an FYI this has happened ~60 times in the last 60 days so @biffgaut you're not alone here.

We are also running into this issue with Lambda function roles; I suspect it's not isolated to bucket policies.

whennemuth commented 8 months ago

I opened a support ticket with the AWS CloudFormation team. They repeated to me the same thing they told biffgaut. They did say this was a high priority issue, so I'd like to think the resolution is imminent. Support tickets are not allowed to be left open for more than 10 days for known bugs, but the AWS support rep did tell me that I could contact my organization's AWS account rep to ping me when the bug is fixed, or possibly the ticket might remain open until the fix is in because I asked for it to be. In any event, it looks like I will get notified somehow. When I do, I'll update this issue.

shwetajoshi601 commented 7 months ago

I am also facing the same problem. It is really annoying as it is hampering deployments. Has anyone figured out a workaround?

davidpintotrusst commented 7 months ago

I am also experiencing the same issue.

abdulkadirdere commented 7 months ago

Workaround for the issue for now:

Option 1:

Option 2:

jshaw-decides commented 6 months ago

Happening again yall...

jshaw-decides commented 6 months ago

Hi, so if you're running into this issue while serving a static site from an S3 bucket via CloudFront, you can split the code into two stacks for a more reliable CI/CD process.

Bucket Stack:

 /**
     * Content bucket
     */
    new s3.Bucket(this, 'SiteBucket', {
      bucketName: `${buildDomain(props.domainSegments)}`,
      websiteIndexDocument: 'index.html',
      websiteErrorDocument: 'index.html',
      // publicReadAccess: true,
      // autoDeleteObjects: true,
      // accessControl: BucketAccessControl.PUBLIC_READ,
      /**
       * The default removal policy is RETAIN, which means that cdk destroy will not attempt to delete
       * the new bucket, and it will remain in your account until manually deleted. By setting the policy to
       * DESTROY, cdk destroy will attempt to delete the bucket, but will error if the bucket is not empty.
       */
      // removalPolicy: cdk.RemovalPolicy.DESTROY, // NOT recommended for production code
    });

Distro Stack (with domain stuff):

/**
     * Hosted zone
     */
    const zone = route53.HostedZone.fromLookup(this, 'Zone', {
      domainName: props.domainSegments.domain,
    });
    new cdk.CfnOutput(this, 'URL', {
      value: `https://${util.buildDomain(props.domainSegments)}`,
    });

    /**
     * TLS certificate
     */
    const certificate = new acm.Certificate(this, 'Certificate', {
      domainName: `${util.buildDomain(props.domainSegments)}`,
      validation: acm.CertificateValidation.fromDns(zone),
    });

    new cdk.CfnOutput(this, 'CertificateOutput', {
      value: certificate.certificateArn,
    });

    const oai = new cloudfront.OriginAccessIdentity(this, 'OAI');
    const bucket = s3.Bucket.fromBucketName(
      this,
      'StaticSiteBucket',
      `${util.buildDomain(props.domainSegments)}`
    );

    bucket.grantPublicAccess();
    const bucketPolicy = new s3.BucketPolicy(this, 'BucketPolicy', {
      bucket,
    });

    // Grant public access through the bucket policy
    bucketPolicy.document.addStatements(
      new iam.PolicyStatement({
        actions: ['s3:GetObject'],
        resources: [bucket.arnForObjects('*')],
        principals: [
          new iam.CanonicalUserPrincipal(
            oai.cloudFrontOriginAccessIdentityS3CanonicalUserId
          ),
        ],
      })
    );
    new cdk.CfnOutput(this, 'SiteBucketOutput', { value: bucket.bucketName });

    /**
     * Cloudfront OAI
     */

    /**
     * CloudFront distribution that provides HTTPS
     */
    this.distribution = new cloudfront.Distribution(this, 'myDist', {
      defaultRootObject: 'index.html',
      minimumProtocolVersion: cloudfront.SecurityPolicyProtocol.TLS_V1_2_2021,
      defaultBehavior: {
        origin: new cloudfront_origins.S3Origin(bucket, {
          originAccessIdentity: oai,
        }),
        compress: true,
        allowedMethods: cloudfront.AllowedMethods.ALLOW_GET_HEAD_OPTIONS,
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      errorResponses: [
        {
          httpStatus: 403,
          responseHttpStatus: 403,
          responsePagePath: '/index.html',
          ttl: cdk.Duration.minutes(30),
        },
      ],
      domainNames: [`${util.buildDomain(props.domainSegments)}`],
      certificate: certificate,
    });
    new cdk.CfnOutput(this, 'DistributionIdOutput', {
      value: this.distribution.distributionId,
    });

    /**
     * Route53 alias record for the CloudFront distribution
     */
    new route53.ARecord(this, 'SiteAliasRecordOutput', {
      recordName: `${util.buildDomain(props.domainSegments)}`,
      target: route53.RecordTarget.fromAlias(
        new route53_targets.CloudFrontTarget(this.distribution)
      ),
      zone,
    });

    /**
     * Build sources depending on if there are more things that need to be added
     * Take the strings in extraSources and map them to extra sources
     */
    const sources = props.extraSources
      ? [
          ...props.extraSources.map((path) => s3_deployment.Source.asset(path)),
          s3_deployment.Source.asset(props.pathToAssets),
        ]
      : [s3_deployment.Source.asset(props.pathToAssets)];

    /**
     * Automated s3 deployment
     */
    new s3_deployment.BucketDeployment(this, 'DeployWithInvalidation', {
      sources: [...sources],
      destinationBucket: bucket,
      distribution: this.distribution,
      distributionPaths: ['/*'],
    });

Also, pay me.

billyjbryant commented 4 months ago

Is there any update to this? I am attempting to deploy a bucket and a stackset and the stackset fails because the bucket policy does not finish deploying, despite the policy not being built until after the bucket.