(aws-s3): declare resources that should be deleted before wiping the bucket

quixoticmonk commented 3 years ago

Description of the bug:

Access log bucket defined with a auto_delete_objects as True and removal policy of Destroy fails deletion.

Screen Shot 2021-09-01 at 5 26 21 PM

Reproduction Steps

Create an S3 bucket and access logs bucket with auto_delete_objects set to True and remove_policy as Destroy.
Deploy the stack
Delete the stack

The full set of resources in the stack are:

S3 bucket for hosting
Cloudfront
access logs bucket
lambda deployed by cdk for deleting an S3 bucket which is non-empty

What did you expect to happen?

Expected both the buckets to be deleted.

What actually happened?

Hosting bucket was emptied and deleted while the stack failed on access logs bucket deletion.

Error message

The bucket you tried to delete is not empty ( Service: Amazon S3; Status Code: 409)

Environment

CDK CLI Version : 1.116.0- 1.121.0
Language (Version): Python

This is :bug: Bug Report

otaviomacedo commented 3 years ago

That's odd. Can you please share a minimal app code that I can use to reproduce the issue?

quixoticmonk commented 3 years ago

@otaviomacedo I can't somehow reproduce the issue with a simple stack with 2 S3 buckets with one of them being an access log bucket and having one "CustomS3AutoDeleteObjectsLambda" between them. I have this same core logic failing on an application stage from a codebuild instance where the custom resource which is assigned to delete the access logs bucket gets deleted earlier than the bucket. So I think there is an order mismatch which comes up here in case of a Codebuild based deployment.

otaviomacedo commented 3 years ago

What's probably happening here is (thanks, @jogold for pointing that out):

The custom resource receives a "Delete" event and empties the log bucket.
CloudFormation deletes the custom resource.
Your server adds some more logs to the empty bucket, making it no longer empty.
CloudFormation tries to delete the bucket, but fails because it is no longer empty.

You can confirm whether this is the case by comparing at the timestamp of the DELETE_IN_PROGRESS event for the bucket and the "Last modified" field of the objects left over in the bucket.

I don't think we have any way of avoiding this race condition, unless you can make whatever is generating the logs in your case (Lambda function, EC2 instances etc) depend on the bucket (see the Dependencies section of the README for more information on this).

quixoticmonk commented 3 years ago

@otaviomacedo @jogold That assumption is not correct. Last modified : 16:28:06 10/06 Screen Shot 2021-10-08 at 8 11 53 AM Delete in progress : 16:29:24 10/06 Screen Shot 2021-10-08 at 8 12 35 AM

My infra stack only has the following:

S3 bucket to hold my static files
Cloudfront to serve them
Access logs bucket to hold the access logs for S3 and cloudfront'
Custom resources deployed due to the "Empty S3 bucket if non empty" selection on the Bucket. I would think this would be a very generic usecase

quixoticmonk commented 3 years ago

Infact, the custom resource gets the DELETE event before it hits the bucket Screen Shot 2021-10-08 at 8 16 08 AM .

otaviomacedo commented 3 years ago

I believe we're saying the same thing. The custom resource was deleted at 16:26:23, at which point it had done its job. The objects in the bucket were updated after that, at 16:27:37 and later.

quixoticmonk commented 3 years ago

@otaviomacedo Sorry for the delay in response. Wouldn't this be a normal usecase where an S3 bucket and access logs bucket exists and the order of cleanup need to honor this situation ?

Interestingly enough, I have a few situations where this works. If I deploy a stack now with the same resources and initiate a delete, it would clean up everything. So that makes me wonder if the deletion of original bucket has any relevance to this scenario since that should have also filled my access logs and errored out.

otaviomacedo commented 3 years ago

Yes. I believe you have something like this in your stack:

const logsBucket = new  Bucket(this, 'logbucket', {
  ...
});

new cloudfront.Distribution(this, 'myDist', {
  ...
  logBucket: logsBucket,
});

In which case the distribution will be deleted before the bucket, as expected. So this race condition must be coming from somewhere else. Are you using the log bucket for anything else? Can you show me the whole stack?

quixoticmonk commented 3 years ago

@otaviomacedo Kindly take a look at https://gist.github.com/quixoticmonk/65e3b01b42156bd488c9be2a6a850d73

otaviomacedo commented 3 years ago

Thanks! Thinking more about this, I realized it's still possible to get into this race condition:

The custom resource receives a "Delete" event and empties the log bucket.
CloudFront distribution adds more logs to the bucket, making it no longer empty.
CloudFormation deletes the distribution (too late for our purpose).
CloudFormation tries to delete the bucket and fails because it's not empty.

The only way to avoid this would be to make the distribution (in your case) depend on the custom resource. This way, the distribution would be deleted before the custom resource does anything.

However, the custom resource responsible for auto-delete is not exposed by the bucket (and I don't think it should be, anyway). So we would need to add some non-leaky abstraction to the Bucket construct to tell it that a certain construct depends on the custom resource. Something like:

const distro = new Distribution(app, 'distro', ...);

const bucket = new Bucket(app, 'bucket', {
  ...
  autoDeleteObjects: true,
});

// With a better name, of course
bucket.deleteThisConstructBeforeDeletingTheObjects(distro);

I'm marking this as a p2 feature request, which means that we are unable to work on this immediately.

We use +1s to help prioritize our work, and are happy to revaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.

c4c4c4 commented 2 years ago

I ran into this same error and noticed something that may help.

logBucket created with autoDeleteObjects: true and removalPolicy: RemovalPolicy.DESTROY
After CDK deployment I logged into the console and verified Cloudfront creates the bucket as expected, and the policy statement on the bucket is consistent with setting autoDeleteObjects: true and removalPolicy: RemovalPolicy.DESTROY
Running delete stack fails due to unable to delete the bucket (the bug being reported in this ticket)
I logged into the console and checked the bucket policy, and it's empty. So it may be that the policy is detached prior to bucket deletion.
I repeated this twice and the same thing happens - stack delete fails due to bucket not being deleted, and the policy statement on the bucket in the console is empty.

aws / aws-cdk