duckdb / duckdb_iceberg

Could not read iceberg table from s3: "...metadata/version-hint.text": 404 (Not Found) #21

Closed: YuriyGavrilov closed this issue 11 months ago

YuriyGavrilov commented 11 months ago

What happens?

DuckDB can't read an Iceberg table that was created with Trino on S3.

D SELECT count(*) FROM iceberg_scan('s3://test/iceberg_p2/orders');
Error: HTTP Error: Unable to connect to URL "https://test.gateway.storjshare.io/iceberg_p2/orders/metadata/version-hint.text": 404 (Not Found)

Trying the same query against a copy of the table on a local drive:

D SELECT count(*) FROM iceberg_scan('/Users/yuriygavrilov/Downloads/orders', ALLOW_MOVED_PATHS=true);
Error: IO Error: Cannot open file "/Users/yuriygavrilov/Downloads/orders/metadata/version-hint.text": No such file or directory

The problem seems to be the missing file metadata/version-hint.text, but why is this file mandatory? Trino did not require it.

To Reproduce

Try to open the attached Iceberg table with iceberg_scan: orders.zip

OS:

macOS 14.0 (23A344)

DuckDB Version:

v0.9.1

DuckDB Client:

cli

Full Name:

Yuriy Gavrilov

Affiliation:

S7 Airlines

Have you tried this on the latest main branch?

I have not tested with any build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

samansmink commented 11 months ago

This is a duplicate of https://github.com/duckdb/duckdb_iceberg/issues/10. The extension currently relies on the version-hint file that some services produce, which lets it read the table statically without a catalog. Support for different catalogs is planned, but in the meantime you can use a different tool to find the most recent version of the table and then pass that metadata file directly to the ICEBERG_SCAN function: https://github.com/duckdb/duckdb_iceberg/pull/18
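
As a rough illustration of that workaround for a local copy of the table, the sketch below picks the highest-numbered metadata file and builds a query that passes it to iceberg_scan. It is only a sketch under assumptions: the helper names `latest_metadata_file` and `scan_query` are hypothetical, and the file-name patterns (`v<N>.metadata.json` and `<N>-<uuid>.metadata.json`) are assumed naming schemes that are not confirmed anywhere in this thread. The highest-numbered file is also not guaranteed to be the version the catalog currently points to, so treat the result as a best guess.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed metadata file naming schemes (not confirmed by this thread):
#   v3.metadata.json            -> tables that also ship a version-hint.text
#   00003-<uuid>.metadata.json  -> catalog-managed tables, e.g. written by Trino
_METADATA_NAME = re.compile(r"^v?(\d+)(?:-[0-9a-fA-F-]+)?\.metadata\.json$")


def latest_metadata_file(table_dir: str) -> Optional[Path]:
    """Return the metadata file with the highest version number, or None."""
    metadata_dir = Path(table_dir) / "metadata"
    if not metadata_dir.is_dir():
        return None
    best: Optional[Path] = None
    best_version = -1
    for path in metadata_dir.glob("*.metadata.json"):
        match = _METADATA_NAME.match(path.name)
        if match and int(match.group(1)) > best_version:
            best_version = int(match.group(1))
            best = path
    return best


def scan_query(metadata_file: Path, allow_moved_paths: bool = False) -> str:
    """Build a DuckDB query that scans one explicit metadata file."""
    # Double any single quote so the path stays a valid SQL string literal.
    escaped = str(metadata_file).replace("'", "''")
    options = ", ALLOW_MOVED_PATHS=true" if allow_moved_paths else ""
    return f"SELECT count(*) FROM iceberg_scan('{escaped}'{options});"


if __name__ == "__main__":
    # Path taken from the report above; adjust to wherever the table lives.
    found = latest_metadata_file("/Users/yuriygavrilov/Downloads/orders")
    if found is None:
        print("No *.metadata.json file found under metadata/")
    else:
        print(scan_query(found, allow_moved_paths=True))
```

The printed statement can be pasted into the DuckDB CLI. ALLOW_MOVED_PATHS=true is kept here only because the local copy in the report was moved from its original S3 location. For a table on S3, the same idea applies, but the metadata listing has to come from an S3 client or from the catalog instead of the local file system.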