aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.67k stars 3.92k forks source link

(aws-s3): declare resources that should be deleted before wiping the bucket #16332

Open quixoticmonk opened 3 years ago

quixoticmonk commented 3 years ago

Description of the bug:

Screen Shot 2021-09-01 at 5 26 21 PM

Reproduction Steps

The full set of resources in the stack are:

What did you expect to happen?

What actually happened?

Error message

Environment


This is :bug: Bug Report

otaviomacedo commented 3 years ago

That's odd. Can you please share a minimal app code that I can use to reproduce the issue?

quixoticmonk commented 3 years ago

@otaviomacedo I can't somehow reproduce the issue with a simple stack with 2 S3 buckets with one of them being an access log bucket and having one "CustomS3AutoDeleteObjectsLambda" between them. I have this same core logic failing on an application stage from a codebuild instance where the custom resource which is assigned to delete the access logs bucket gets deleted earlier than the bucket. So I think there is an order mismatch which comes up here in case of a Codebuild based deployment.

otaviomacedo commented 3 years ago

What's probably happening here is (thanks, @jogold for pointing that out):

  1. The custom resource receives a "Delete" event and empties the log bucket.
  2. CloudFormation deletes the custom resource.
  3. Your server adds some more logs to the empty bucket, making it no longer empty.
  4. CloudFormation tries to delete the bucket, but fails because it is no longer empty.

You can confirm whether this is the case by comparing at the timestamp of the DELETE_IN_PROGRESS event for the bucket and the "Last modified" field of the objects left over in the bucket.

I don't think we have any way of avoiding this race condition, unless you can make whatever is generating the logs in your case (Lambda function, EC2 instances etc) depend on the bucket (see the Dependencies section of the README for more information on this).

quixoticmonk commented 3 years ago

@otaviomacedo @jogold That assumption is not correct. Last modified : 16:28:06 10/06 Screen Shot 2021-10-08 at 8 11 53 AM Delete in progress : 16:29:24 10/06 Screen Shot 2021-10-08 at 8 12 35 AM

My infra stack only has the following:

quixoticmonk commented 3 years ago

Infact, the custom resource gets the DELETE event before it hits the bucket Screen Shot 2021-10-08 at 8 16 08 AM .

otaviomacedo commented 3 years ago

I believe we're saying the same thing. The custom resource was deleted at 16:26:23, at which point it had done its job. The objects in the bucket were updated after that, at 16:27:37 and later.

quixoticmonk commented 3 years ago

@otaviomacedo Sorry for the delay in response. Wouldn't this be a normal usecase where an S3 bucket and access logs bucket exists and the order of cleanup need to honor this situation ?

Interestingly enough, I have a few situations where this works. If I deploy a stack now with the same resources and initiate a delete, it would clean up everything. So that makes me wonder if the deletion of original bucket has any relevance to this scenario since that should have also filled my access logs and errored out.

otaviomacedo commented 3 years ago

Yes. I believe you have something like this in your stack:

const logsBucket = new  Bucket(this, 'logbucket', {
  ...
});

new cloudfront.Distribution(this, 'myDist', {
  ...
  logBucket: logsBucket,
});

In which case the distribution will be deleted before the bucket, as expected. So this race condition must be coming from somewhere else. Are you using the log bucket for anything else? Can you show me the whole stack?

quixoticmonk commented 3 years ago

@otaviomacedo Kindly take a look at https://gist.github.com/quixoticmonk/65e3b01b42156bd488c9be2a6a850d73

otaviomacedo commented 3 years ago

Thanks! Thinking more about this, I realized it's still possible to get into this race condition:

  1. The custom resource receives a "Delete" event and empties the log bucket.
  2. CloudFront distribution adds more logs to the bucket, making it no longer empty.
  3. CloudFormation deletes the distribution (too late for our purpose).
  4. CloudFormation tries to delete the bucket and fails because it's not empty.

The only way to avoid this would be to make the distribution (in your case) depend on the custom resource. This way, the distribution would be deleted before the custom resource does anything.

However, the custom resource responsible for auto-delete is not exposed by the bucket (and I don't think it should be, anyway). So we would need to add some non-leaky abstraction to the Bucket construct to tell it that a certain construct depends on the custom resource. Something like:

const distro = new Distribution(app, 'distro', ...);

const bucket = new Bucket(app, 'bucket', {
  ...
  autoDeleteObjects: true,
});

// With a better name, of course
bucket.deleteThisConstructBeforeDeletingTheObjects(distro);

I'm marking this as a p2 feature request, which means that we are unable to work on this immediately.

We use +1s to help prioritize our work, and are happy to revaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.

c4c4c4 commented 2 years ago

I ran into this same error and noticed something that may help.