quixoticmonk opened 3 years ago
That's odd. Can you please share a minimal app code that I can use to reproduce the issue?
@otaviomacedo Somehow I can't reproduce the issue with a simple stack: two S3 buckets, one of them an access log bucket, with a "CustomS3AutoDeleteObjectsLambda" between them. However, the same core logic fails in an application stage deployed from a CodeBuild instance, where the custom resource assigned to delete the access logs bucket gets deleted earlier than the bucket itself. So I think there is an ordering mismatch that shows up in a CodeBuild-based deployment.
What's probably happening here (thanks, @jogold, for pointing this out):
You can confirm whether this is the case by comparing the timestamp of the DELETE_IN_PROGRESS event for the bucket with the "Last modified" field of the objects left over in the bucket.
I don't think we have any way of avoiding this race condition, unless you can make whatever is generating the logs in your case (Lambda function, EC2 instances etc) depend on the bucket (see the Dependencies section of the README for more information on this).
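For reference, such a dependency can be declared with `node.addDependency`. A minimal sketch, assuming aws-cdk-lib v2 and a hypothetical Lambda function as the log producer (all construct names here are made up for illustration):

```typescript
import { App, RemovalPolicy, Stack } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const app = new App();
const stack = new Stack(app, 'DemoStack');

const logsBucket = new s3.Bucket(stack, 'LogsBucket', {
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true, // adds the auto-delete custom resource
});

// Hypothetical log producer that writes objects into the bucket.
const producer = new lambda.Function(stack, 'Producer', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromInline('exports.handler = async () => {};'),
});

// Explicit dependency: on `cdk destroy`, CloudFormation deletes the
// producer before the bucket is touched, so nothing new is written
// while the bucket is being emptied and deleted.
producer.node.addDependency(logsBucket);
```

This only helps when the log producer is a construct in the same stack; it does not cover services (like CloudFront log delivery) that keep writing asynchronously after their resource is deleted.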
@otaviomacedo @jogold That assumption is not correct.

- Last modified: 16:28:06 10/06
- Delete in progress: 16:29:24 10/06
My infra stack only has the following:
In fact, the custom resource gets the DELETE event before it hits the bucket.
I believe we're saying the same thing. The custom resource was deleted at 16:26:23, at which point it had done its job. The objects in the bucket were updated after that, at 16:27:37 and later.
@otaviomacedo Sorry for the delay in response. Wouldn't this be a normal use case, where an S3 bucket and its access logs bucket both exist and the cleanup order needs to honor that relationship?
Interestingly enough, I have a few situations where this works. If I deploy a stack now with the same resources and initiate a delete, it cleans up everything. So that makes me wonder whether the deletion of the original bucket has any relevance to this scenario, since that deletion should also have written to my access logs and errored out.
Yes. I believe you have something like this in your stack:
```ts
const logsBucket = new Bucket(this, 'logbucket', {
  ...
});

new cloudfront.Distribution(this, 'myDist', {
  ...
  logBucket: logsBucket,
});
```
In which case the distribution will be deleted before the bucket, as expected. So this race condition must be coming from somewhere else. Are you using the log bucket for anything else? Can you show me the whole stack?
@otaviomacedo Kindly take a look at https://gist.github.com/quixoticmonk/65e3b01b42156bd488c9be2a6a850d73
Thanks! Thinking more about this, I realized it's still possible to get into this race condition:
The only way to avoid this would be to make the distribution (in your case) depend on the custom resource. This way, the distribution would be deleted before the custom resource does anything.
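The ordering effect of that extra dependency can be illustrated with a toy model (plain TypeScript, hypothetical resource names): CloudFormation deletes resources in reverse dependency order, so an edge from the distribution to the custom resource forces the distribution out before the custom resource empties the bucket.

```typescript
// Toy model of CloudFormation delete ordering: a resource may be
// deleted only after everything that depends on it is gone, i.e.
// deletion proceeds in reverse dependency order.
type Graph = Map<string, string[]>; // resource -> resources it depends on

function deleteOrder(graph: Graph): string[] {
  const order: string[] = [];
  const visited = new Set<string>();
  function visit(node: string): void {
    if (visited.has(node)) return;
    visited.add(node);
    // Delete everything that depends on `node` before `node` itself.
    for (const [other, deps] of graph) {
      if (deps.includes(node)) visit(other);
    }
    order.push(node);
  }
  for (const node of graph.keys()) visit(node);
  return order;
}

// Without the extra edge, the distribution and the auto-delete custom
// resource have no ordering between them (the race in this issue).
// With Distribution -> AutoDeleteCR, the distribution must go first.
const graph: Graph = new Map([
  ['Bucket', []],
  ['AutoDeleteCR', ['Bucket']],                 // empties the bucket
  ['Distribution', ['Bucket', 'AutoDeleteCR']], // proposed extra edge
]);

const order = deleteOrder(graph);
console.log(order.join(' -> ')); // prints "Distribution -> AutoDeleteCR -> Bucket"
```

With that ordering, the distribution is gone before the custom resource empties the bucket, so no fresh log objects can land between "bucket emptied" and "bucket deleted".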
However, the custom resource responsible for auto-delete is not exposed by the bucket (and I don't think it should be, anyway). So we would need to add some non-leaky abstraction to the Bucket construct to tell it that a certain construct depends on the custom resource. Something like:
```ts
const distro = new Distribution(app, 'distro', ...);

const bucket = new Bucket(app, 'bucket', {
  ...
  autoDeleteObjects: true,
});

// With a better name, of course
bucket.deleteThisConstructBeforeDeletingTheObjects(distro);
```
I'm marking this as a p2 feature request, which means that we are unable to work on this immediately.
We use +1s to help prioritize our work, and are happy to re-evaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.
I ran into this same error and noticed something that may help.
Description of the bug:
Reproduction Steps
The full set of resources in the stack are:
What did you expect to happen?
What actually happened?
Error message
Environment
This is :bug: Bug Report