duckdb / duckdb_iceberg

MIT License

Can't read tables with updates/deletes #67

Open humaidkidwai opened 2 months ago

humaidkidwai commented 2 months ago

I realized that DuckDB can only read Iceberg metadata files if there have been no updates/deletes in the Iceberg table. I verified this with the following setup:

Catalog: AWS Glue
Iceberg table format: v2
DuckDB version: 1.0.0
Writer: AWS Firehose
Update strategy: Merge on Read

Here's what my code looks like:

INSTALL iceberg;
LOAD iceberg;
INSTALL httpfs;
LOAD httpfs;

SET s3_access_key_id='key';
SET s3_secret_access_key='secretKey';
SET s3_region='us-east-1';
SET s3_use_ssl=true;
SET s3_url_style='path';

SELECT *
FROM
iceberg_scan('s3://my-bucket/observation/metadata/00004-bc91e4be-ee63-4922-89eb-f7730dbbee82.metadata.json');

SQL Error: java.sql.SQLException: Binder Error: Table "iceberg_scan_deletes" does not have a column named "file_path"

#60 seems like the same problem.

harel-e commented 1 month ago

I tried update/delete using Nessie as the catalog and Trino (the engine behind AWS Athena) as the writer. DuckDB 1.1.2 has no issue reading a table with deleted/updated rows and returns accurate results.

I'd like to verify it on Glue/Athena just to be certain.

aakashchouksey commented 2 weeks ago

Were the updates made using MOR equality deletes on the Iceberg tables?
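
For context on why the delete type matters: in the Iceberg spec, a position delete file identifies rows by `file_path` and `pos`, while an equality delete file only carries the values of the equality columns. The sketch below is a toy Python model of that difference, not DuckDB's implementation. The `Row` class, the helper names, and the sample data are all hypothetical, and sequence-number rules are ignored. It only shows that a reader expecting position-delete columns would find no `file_path` in an equality delete record, which would be consistent with the Binder Error above. Whether that is the actual cause here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    file_path: str  # data file the row lives in
    pos: int        # 0-based row position inside that file
    id: int
    value: str


def apply_position_deletes(rows, position_deletes):
    """Drop rows addressed by (file_path, pos) pairs."""
    dead = {(d["file_path"], d["pos"]) for d in position_deletes}
    return [r for r in rows if (r.file_path, r.pos) not in dead]


def apply_equality_deletes(rows, equality_deletes, columns):
    """Drop rows whose values in `columns` match any delete record."""
    dead = {tuple(d[c] for c in columns) for d in equality_deletes}
    return [r for r in rows if tuple(getattr(r, c) for c in columns) not in dead]


def missing_column(rows, delete_records):
    """Treat records as position deletes; return the missing column name, if any."""
    try:
        apply_position_deletes(rows, delete_records)
    except KeyError as exc:
        return exc.args[0]
    return None


# Hypothetical sample data: three rows spread over two data files.
rows = [
    Row("data-0.parquet", 0, 1, "a"),
    Row("data-0.parquet", 1, 2, "b"),
    Row("data-1.parquet", 0, 3, "c"),
]
position_deletes = [{"file_path": "data-0.parquet", "pos": 1}]
equality_deletes = [{"id": 3}]  # no file_path or pos, only the key column

after_pos = apply_position_deletes(rows, position_deletes)
after_eq = apply_equality_deletes(rows, equality_deletes, ["id"])

print([r.id for r in after_pos])
print([r.id for r in after_eq])
print(missing_column(rows, equality_deletes))
```

If the Firehose writer emits equality deletes while the Trino writer emits position deletes, that would explain why the two setups in this thread behave differently, but that needs to be confirmed against the actual delete files.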