aws-samples / aws-big-data-blog-dmscdc-walkthrough

MIT No Attribution

Deletion of CDC files #6

Open headbull opened 2 years ago

headbull commented 2 years ago

From what I can see and understand, it doesn't delete the CDC files that have been processed. After a while, this adds up to a huge number of files. Is deletion of files from the S3 DMS bucket implemented? If it isn't, could it be implemented?

sheridan06 commented 2 years ago

We ended up just putting S3 lifecycle policies in place. Also, AWS Lake Formation now has “governed tables” and the Databricks Delta Lake framework has really matured. You might want to consider one of these instead of this dmscdc solution. UPDATE: Lake Formation governed tables don’t yet support transaction/row-level CDC, so it’s either this dmscdc solution or Delta by Databricks.
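
For anyone landing here, a minimal sketch of the lifecycle-policy approach mentioned above, assuming boto3 is installed and credentials are configured. The bucket name `my-dms-bucket`, the prefix `dms/cdc/`, and the 30-day retention are hypothetical placeholders, not values taken from this repo. Note that expiration is purely age-based: S3 has no notion of whether a CDC file has been processed, so the retention window should comfortably exceed the longest expected processing lag.

```python
import json


def build_cdc_expiration_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under
    `prefix` once they are `days` days old."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-cdc-files-{days}d",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},  # delete after this many days
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket.

    This call replaces any lifecycle configuration already on the bucket,
    so merge existing rules into `config` first if there are any.
    """
    import boto3  # imported lazily so the rule builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )


if __name__ == "__main__":
    # Hypothetical prefix and retention; adjust to your own DMS target layout.
    config = build_cdc_expiration_rule("dms/cdc/", 30)
    print(json.dumps(config, indent=2))
    # apply_lifecycle("my-dms-bucket", config)  # uncomment to apply
```

Scoping the rule to the CDC prefix keeps the initial full-load files and any other data in the bucket untouched.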