aws-samples / aws-cdk-examples

Example projects using the AWS CDK
Apache License 2.0

Empty (delete all objects in) an Amazon S3 bucket using AWS Step Functions, Lambda functions, and Custom Resources #430

Closed: elgamala closed this issue 1 year ago

elgamala commented 3 years ago

🚀 Feature Request

General Information

Description

Cleaning up and emptying Amazon S3 buckets before deleting them is a time-consuming and painful process, especially for buckets with a massive number of objects. You can do it in the AWS Management Console, but you have to keep the browser tab open until all objects are deleted; otherwise the deletion stops. If you automate testing on AWS and generate many S3 buckets, cleaning them up manually simply does not work.

You can run `aws s3 rm s3://<your bucket> --recursive`, but that brings another problem: temporary credentials can expire before all objects are deleted, and again you have to keep the client running the AWS CLI command up until the bucket is empty. Whenever you try to use long-lived AWS IAM user credentials instead, please practice social distancing and stay at least 2 meters away from any security person: in corporations, all access is federated through LDAP, AD, or another user forest, so obtaining long-lived IAM user credentials is considered a bypass of the corporate-wide identity provider.

If you try using AWS CloudFormation custom resources backed by an AWS Lambda function, that won't work efficiently at scale either, because both CloudFormation and the Lambda function (which can run for at most 15 minutes per invocation) will probably time out when you have a large number of S3 objects (10,000+) to delete before deleting the bucket.
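Part of why a single synchronous pass struggles is that S3's `DeleteObjects` API accepts at most 1,000 keys per request, so emptying a large bucket always means many list-and-delete round trips. The sketch below shows only that batching arithmetic in TypeScript, with no AWS SDK calls; the helper name `toDeleteBatches` and the example keys are hypothetical, not taken from the proposed PR.

```typescript
// S3's DeleteObjects API accepts at most 1,000 keys per request.
const MAX_KEYS_PER_DELETE = 1000;

// Hypothetical helper: split object keys into DeleteObjects-sized batches.
function toDeleteBatches(keys: string[], batchSize: number = MAX_KEYS_PER_DELETE): string[][] {
  if (batchSize <= 0) {
    throw new Error("batchSize must be positive");
  }
  const batches: string[][] = [];
  for (let i = 0; i < keys.length; i += batchSize) {
    // slice() clamps at the end of the array, so the last batch may be smaller.
    batches.push(keys.slice(i, i + batchSize));
  }
  return batches;
}

// Example: 10,500 hypothetical keys need 11 requests (10 full batches + 1 batch of 500).
const keys = Array.from({ length: 10500 }, (_, i) => `logs/object-${i}.json`);
const batches = toDeleteBatches(keys);
console.log(batches.length); // 11
console.log(batches[batches.length - 1].length); // 500
```

For versioned buckets the count grows further, since every object version and delete marker has to be removed as well.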

Proposed Solution

Build an AWS CDK stack that looks up an existing S3 bucket and provisions an AWS Step Functions state machine backed by two AWS Lambda functions (one that deletes objects and another that checks the deletion status) to orchestrate the deletion at scale. Because the process is asynchronous and serverless, it can have a much longer timeout, e.g. one day. You pass the name of the Amazon S3 bucket you want to empty as a parameter to the stack when running the `cdk deploy` command. The stack also includes an `AwsCustomResource` that uses an `AwsSdkCall` to start a new execution of the newly deployed state machine.
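The delete/check loop described above can be simulated locally to show why it avoids timeouts: each "delete" step does a bounded amount of work, and the "check status" step decides whether to loop again. This is a minimal TypeScript sketch under stated assumptions, not the code from PR #433: `InMemoryBucket`, `deleteStep`, `checkStatusStep`, and `runStateMachine` are hypothetical names, the bucket is an in-memory stand-in, and in the real solution each step would be a Lambda function calling S3, with the loop expressed as Step Functions states.

```typescript
// Hypothetical in-memory stand-in for a bucket. A real implementation would
// call S3 (e.g. ListObjectVersions / DeleteObjects) through the AWS SDK instead.
class InMemoryBucket {
  private keys: Set<string>;
  constructor(keys: string[]) {
    this.keys = new Set(keys);
  }
  // Like an S3 list call, return at most `maxKeys` keys per call.
  list(maxKeys: number): string[] {
    return Array.from(this.keys).slice(0, maxKeys);
  }
  deleteKeys(keys: string[]): void {
    for (const k of keys) this.keys.delete(k);
  }
  get size(): number {
    return this.keys.size;
  }
}

type Status = "IN_PROGRESS" | "EMPTY";

// "Delete" step: one short invocation removes at most `batchesPerRun` pages of
// up to 1,000 keys, so no single call has to outlive a Lambda timeout.
function deleteStep(bucket: InMemoryBucket, batchesPerRun: number): number {
  let deleted = 0;
  for (let i = 0; i < batchesPerRun; i++) {
    const page = bucket.list(1000);
    if (page.length === 0) break;
    bucket.deleteKeys(page);
    deleted += page.length;
  }
  return deleted;
}

// "Check status" step: report whether any keys are left.
function checkStatusStep(bucket: InMemoryBucket): Status {
  return bucket.list(1).length === 0 ? "EMPTY" : "IN_PROGRESS";
}

// Driver playing the role of the state machine: delete, check, loop until empty.
// Returns the number of iterations it took.
function runStateMachine(bucket: InMemoryBucket, batchesPerRun: number, maxIterations: number): number {
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    deleteStep(bucket, batchesPerRun);
    if (checkStatusStep(bucket) === "EMPTY") {
      return iteration;
    }
    // In Step Functions, a Choice state (and optionally a Wait state) would loop back here.
  }
  throw new Error("gave up before the bucket was empty");
}

// 25,000 hypothetical keys, 5,000 deleted per iteration -> 5 iterations.
const bucket = new InMemoryBucket(Array.from({ length: 25000 }, (_, i) => `data/${i}`));
const iterations = runStateMachine(bucket, 5, 100);
console.log(iterations, bucket.size); // 5 0
```

The design choice is that the overall run time is bounded by the state machine's timeout rather than by any single Lambda invocation, which is what makes a long timeout such as one day possible.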

[Diagram: the Step Functions state machine]

The solution is tested against the following corner cases of Amazon S3 Bucket configurations:

Environment

Other information

The PR is ready to merge: #433

kaiz-io commented 1 year ago

This is now built into the S3 `Bucket` construct through the `autoDeleteObjects` property: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_s3.Bucket.html#autodeleteobjects

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.