awslabs / amazon-s3-find-and-forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).
Apache License 2.0

ORC support #211

Open Abhichetan opened 3 years ago

Abhichetan commented 3 years ago

We currently use CSV and ORC rather than Parquet for our data lake objects.

In an ideal world we would probably migrate to Parquet, which would enable us to use this project, but that migration currently has a lot of unknowns on our side. If s3-find-and-forget supported ORC and CSV, it would enable us to test the project on our existing data sets.

We understand that this could result in increased costs and reduced performance compared with Parquet, but it would still be helpful at this point.

ctd commented 3 years ago

Hi, thanks for your interest in Amazon S3 Find and Forget.

CSV support has already been requested in #175, so I'm going to rename this issue to cover ORC support only. Please follow #175 for updates on CSV support.

We use issue popularity as one input to prioritise future development, so please add a 👍 to the initial post if ORC support would be useful for your use case. We'll post any updates on ORC support in this issue.