dacort / duckdb-athena-extension

An experimental Athena extension for DuckDB 🐤
MIT License

[design] Decide on other functions to implement #5

Open dacort opened 1 year ago

dacort commented 1 year ago

athena_scan is the most basic function to implement, but it scans an entire table. Unfortunately, given how Athena works, that will be difficult to optimize for large tables. In most cases I'm assuming folks will want only a small slice of a table, so at the very least we'll need a function that supports pushdown. It could also be interesting to use UNLOAD and then let DuckDB load the resulting Parquet files from S3.
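A rough sketch of the UNLOAD idea, just to make it concrete. This only builds the two SQL strings (the Athena UNLOAD statement with the filter pushed into it, and the DuckDB query that reads the result); it does not call Athena or DuckDB. The helper names, the table name, and the bucket path are all hypothetical, and the sketch assumes the UNLOAD destination prefix is empty beforehand and that DuckDB has S3 access configured (e.g. via httpfs).

```python
from typing import Optional, Sequence


def build_unload_sql(
    table: str,
    columns: Sequence[str],
    where: Optional[str],
    s3_prefix: str,
) -> str:
    """Build an Athena UNLOAD statement that writes a filtered slice as Parquet.

    Hypothetical helper: inputs are interpolated as-is, so they must be trusted.
    """
    cols = ", ".join(columns) if columns else "*"
    query = f"SELECT {cols} FROM {table}"
    if where:
        # The filter runs inside Athena, so only matching rows are written to S3.
        query += f" WHERE {where}"
    # Normalize the destination so it always ends with a single slash.
    prefix = s3_prefix.rstrip("/") + "/"
    return f"UNLOAD ({query}) TO '{prefix}' WITH (format = 'PARQUET')"


def build_duckdb_sql(s3_prefix: str) -> str:
    """Build the DuckDB query that reads every file under the UNLOAD prefix."""
    prefix = s3_prefix.rstrip("/") + "/"
    # Glob over the whole prefix rather than assuming a file extension.
    return f"SELECT * FROM read_parquet('{prefix}*')"


if __name__ == "__main__":
    dest = "s3://my-bucket/unload/sales"  # hypothetical location
    print(build_unload_sql("sales", ["id", "amount"], "year = 2023", dest))
    print(build_duckdb_sql(dest))
```

The extension would run the first statement through Athena, wait for it to finish, and then hand the second one to DuckDB, so the filtered rows come back as Parquet from S3 instead of being paged through the Athena results API.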

ghalimi commented 1 year ago

Using Athena to expose Iceberg's metadata API would dramatically simplify DuckDB's integration with Iceberg. The most useful part of that API would be TableScan, which would make it possible to retrieve the Iceberg partitions of a table that match a given set of filter predicates. As far as I know, Athena's API unfortunately does not support that yet, but it should not be too difficult to add, as I'm sure the Iceberg Java API must be used internally.