MeltanoLabs / target-athena

Singer.io Target for AWS Athena.

Suggestion: use AWS Data Wrangler instead of pyathena #37

Open ndrluis opened 2 years ago

ndrluis commented 2 years ago

Hello everyone, I'm starting to use this target and I'm missing some features, which I'm already working on contributing here. However, I think we could make this codebase much simpler by using AWS Data Wrangler instead of pyathena.

I don't know if anyone here has worked with this library before, but AWS Data Wrangler abstracts away the AWS calls, the Glue Catalog/database manipulation, and the data upload to S3, which would make it easier to implement the Parquet writer (#9), for example.
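For illustration, a minimal sketch of what the Parquet path could look like with AWS Data Wrangler. It assumes the `awswrangler` and `pandas` packages are installed, AWS credentials are configured, and the Glue database already exists. The helper names (`athena_dataset_args`, `write_records`) and the one-prefix-per-table S3 layout are hypothetical, not taken from this repository. With `dataset=True`, `wr.s3.to_parquet` writes the Parquet files to S3 and creates or updates the Glue Catalog table in a single call, which covers the catalog and upload work mentioned above.

```python
from typing import List, Optional


def athena_dataset_args(bucket: str, database: str, table: str,
                        partition_cols: Optional[List[str]] = None) -> dict:
    """Keyword arguments for wr.s3.to_parquet that produce a Glue-cataloged dataset."""
    return {
        "path": f"s3://{bucket}/{table}/",  # hypothetical layout: one S3 prefix per table
        "dataset": True,                   # required for database/table/mode/partition_cols
        "database": database,              # Glue Catalog database, assumed to exist already
        "table": table,                    # created on first write, appended to afterwards
        "mode": "append",
        "partition_cols": partition_cols,  # e.g. ["dt"] for Hive-style partitions
    }


def write_records(records: List[dict], bucket: str, database: str, table: str,
                  partition_cols: Optional[List[str]] = None) -> dict:
    """Write one batch of records as Parquet to S3 and register/update the Glue table."""
    # Imported lazily: these need the third-party packages and AWS credentials.
    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame.from_records(records)
    return wr.s3.to_parquet(df=df, **athena_dataset_args(bucket, database, table, partition_cols))
```

Because Athena reads table metadata from the Glue Catalog, data written this way should be queryable from Athena (for example with `wr.athena.read_sql_query`) without hand-written DDL.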

Can we discuss this?

References:
- Amazon Athena: https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/006%20-%20Amazon%20Athena.html
- Glue Catalog: https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/005%20-%20Glue%20Catalog.html
- Amazon S3: https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/003%20-%20Amazon%20S3.html
- CSV Crawler: https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/012%20-%20CSV%20Crawler.html
- Partition Projection: https://aws-data-wrangler.readthedocs.io/en/stable/tutorials/017%20-%20Partition%20Projection.html

yummydum commented 2 years ago

As another Wrangler user, I strongly agree. Much of the functionality we need is already implemented in Wrangler, so I think this codebase could be a thin wrapper around it that adds compliance with the Singer protocol.

andrewcstewart commented 2 years ago

Definitely worth consideration, especially as there is some discussion of rewriting the entire target at some point.

I've also come across https://github.com/akamai/pallas, if anyone is familiar with it and can compare/contrast.

ndrluis commented 2 years ago

I created target-s3-parquet, which uses AWS Data Wrangler, to solve our problems with target-athena: https://github.com/gupy-io/target-s3-parquet

The codebase has some hardcoded configuration, but we intend to evolve it.