dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

Adding athena_example #120

Open: asajadi opened this pull request 3 years ago

asajadi commented 3 years ago

Added an athena_example directory which, like mysql_example and pgsql_big_dedupe_example, runs exactly the same flow but relies on Athena. Athena (AWS's managed Presto service) is inherently different from MySQL and PostgreSQL, and adding support for it is quite a challenge. I've hidden most of the complexity in the athena_utils.py module. Because Athena executes queries in parallel, the performance improvement is quite significant.
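Here is a minimal sketch of how a flow can take advantage of Athena's parallel execution. It is not the PR's athena_utils.py. The function name `run_athena_queries` and its parameters are hypothetical. It assumes a boto3 Athena client, whose `start_query_execution` call returns an execution ID right away instead of blocking. Because of that, all queries are submitted first and only then polled until each reaches a terminal state (`SUCCEEDED`, `FAILED` or `CANCELLED`).

```python
import time


def run_athena_queries(client, queries, database, output_location,
                       poll_seconds=1.0):
    """Hypothetical helper: submit all queries, then wait for all of them.

    `client` is assumed to be boto3.client("athena") or any object with the
    same start_query_execution / get_query_execution methods.
    Returns the final state of each query, in submission order.
    """
    # Submit every query up front; Athena starts each one without waiting
    # for the others, so they can run concurrently on the service side.
    ids = []
    for sql in queries:
        resp = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        ids.append(resp["QueryExecutionId"])

    # Poll the pending executions until each reaches a terminal state.
    terminal = {"SUCCEEDED", "FAILED", "CANCELLED"}
    states = {}
    pending = set(ids)
    while pending:
        for qid in list(pending):
            status = client.get_query_execution(QueryExecutionId=qid)
            state = status["QueryExecution"]["Status"]["State"]
            if state in terminal:
                states[qid] = state
                pending.discard(qid)
        if pending:
            time.sleep(poll_seconds)
    return [states[qid] for qid in ids]
```

To use it against AWS, you would pass `boto3.client("athena")` along with a database name and an S3 results location such as `s3://your-bucket/athena-results/` (a placeholder). Submitting everything before polling is the key design choice. Running the queries one at a time and waiting on each would serialize work that Athena can otherwise overlap.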