Open dacort opened 1 year ago
Using Athena to expose Iceberg's metadata API would dramatically simplify DuckDB's integration with Iceberg. The most useful part of this API would be `TableScan`, which would make it possible to retrieve Iceberg partitions for a table with a given set of filtering predicates. As far as I know, Athena's API does not support that yet unfortunately, but it should not be too difficult to add, as I'm sure the Iceberg Java API must be used internally.
`athena_scan` is the most basic thing to implement, but scans an entire table. Unfortunately, the way Athena works, it will be difficult to optimize that for large tables. And in most cases, I'm assuming folks are going to want a small slice of a table so at the very least, we'll need a `pushdown` function. It could be interesting to utilize `UNLOAD`, though, and then let DuckDB load the parquet files from S3.

- `athena_scan` - just returns all the data from a single table
- `athena_scan_pushdown` - similar to the postgres scanner, returns all the data filtered by certain predicates/partitions
- `athena_unload` - Utilizes an UNLOAD query in Athena to write results to parquet in S3, then duckdb can just load the parquet files.
- `athena_query` - Runs an athena query and returns the results
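
To make the pushdown and `UNLOAD` ideas a bit more concrete, here is a rough sketch of the SQL each function might generate. It only builds strings and does not talk to Athena or DuckDB. The helper names (`scan_sql`, `pushdown_sql`, `unload_sql`, `duckdb_load_sql`) are hypothetical and not part of any existing extension, predicates are assumed to arrive as ready-made SQL fragments, and no identifier escaping is attempted.

```python
from typing import Sequence


def scan_sql(table: str) -> str:
    """Full-table scan, as athena_scan might issue it (hypothetical helper)."""
    # Assumption: `table` is a trusted identifier; no quoting or escaping here.
    return f"SELECT * FROM {table}"


def pushdown_sql(table: str, predicates: Sequence[str]) -> str:
    """Scan with filters pushed into the WHERE clause (hypothetical helper)."""
    sql = scan_sql(table)
    if predicates:
        # Each predicate is wrapped in parentheses so AND/OR precedence
        # inside one predicate cannot leak into its neighbours.
        sql += " WHERE " + " AND ".join(f"({p})" for p in predicates)
    return sql


def unload_sql(query: str, s3_prefix: str) -> str:
    """Wrap a query in an Athena UNLOAD that writes Parquet to S3."""
    # Normalise the destination so it always ends with exactly one slash.
    prefix = s3_prefix.rstrip("/") + "/"
    return f"UNLOAD ({query}) TO '{prefix}' WITH (format = 'PARQUET')"


def duckdb_load_sql(s3_prefix: str) -> str:
    """DuckDB query that reads every file written under the UNLOAD prefix."""
    prefix = s3_prefix.rstrip("/") + "/"
    # Assumption: the prefix contains only the Parquet files from UNLOAD.
    return f"SELECT * FROM read_parquet('{prefix}*')"


if __name__ == "__main__":
    filtered = pushdown_sql("sales", ["year = 2023", "region = 'us'"])
    print(unload_sql(filtered, "s3://my-bucket/unload/sales"))
    print(duckdb_load_sql("s3://my-bucket/unload/sales"))
```

In this sketch, `athena_unload` would run the first statement through Athena, wait for it to finish, and then hand the second statement to DuckDB (which needs S3 access, for example via the `httpfs` extension). Athena expects the `UNLOAD` destination to be empty, so a real implementation would probably need a unique prefix per query.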