duckdb / duckdb_iceberg

Unable to read complex data types (e.g. Map, Struct) after upgrading to the latest (0.10.0) version #41

Closed: devendrasr closed this issue 3 months ago

devendrasr commented 4 months ago

The extension is able to read complex data types with DuckDB 0.9.0:

[image]

After upgrading to DuckDB 0.10.0, we could no longer read complex data types:

[image]

The database throws the following error:

Error: IO Error: Invalid field found while parsing field: type

Fokko commented 4 months ago

Thanks for reporting this! I think it is a duplicate of https://github.com/duckdb/duckdb_iceberg/issues/39

devendrasr commented 4 months ago

Yes, I ran into #39 after adding a temporary workaround for this one.

devendrasr commented 4 months ago

For the time being, I had to disable schema parsing, and the step that supplies the parsed schema to the Parquet reader, to make things work.

harel-e commented 4 months ago

@devendrasr Can you provide more details on how you overcame the issue? How did you disable schema parsing? Thanks

devendrasr commented 4 months ago

@harel-e I made a few changes like this: https://github.com/devendrasr/duckdb_iceberg/commit/d9e5d720b319e4e474a39373d8d43781a0a3b028

harel-e commented 3 months ago

@devendrasr Will PR #43, which you created, solve this issue? Thank you

devendrasr commented 3 months ago

It just allows you to skip schema inference in the Iceberg extension and fall back to the Parquet extension to infer the schema. That way we are able to read complex data types via the Iceberg extension.

harel-e commented 3 months ago

@devendrasr - I'd like to test your changes locally, especially with issue #39. Could you provide a binary, since I'm having issues building DuckDB locally? Thanks.

devendrasr commented 3 months ago

@harel-e You can try them out using:

docker run --rm --name=duckdb -it devendrasr/duckdb:0.0.1 bash

harel-e commented 3 months ago

@devendrasr - thank you for providing an image for testing. I tried the scenario described here: https://github.com/duckdb/duckdb_iceberg/issues/39#issuecomment-1943244889 Unfortunately, it still fails with:

D SELECT * FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
Error: IO Error: Invalid field found while parsing field: required

I understand you were not trying to address issue #39. I thought they were somewhat related.

Thanks again.

harel-e commented 3 months ago

My bad, I forgot to add skip_schema_inference = true. I tested it again with both the lineitem data and my own dataset (minio/nessie) and it works! Thank you very much!

I'll wait for the PR to be merged.
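
For anyone following along, the working query is presumably the failing one from the previous comment with the flag added. This is a sketch assembled from the two comments above, not a copy of the exact command that was run; it assumes a build that includes the skip_schema_inference change (for example the test image mentioned earlier) and reuses the lineitem table path from the failing example.

```sql
-- Sketch: the earlier failing query with skip_schema_inference added.
-- With the flag set, the Iceberg extension skips its own schema parsing
-- and the Parquet reader infers the schema from the data files instead.
SELECT *
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg',   -- table path from the failing example
    allow_moved_paths = true,
    skip_schema_inference = true
);
```

Because the schema then comes from the Parquet files and not from the Iceberg metadata, this sidesteps the "Invalid field found while parsing field" error but not its underlying cause.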

devendrasr commented 3 months ago

This issue has been resolved by pull request https://github.com/duckdb/duckdb_iceberg/pull/45

The pull request is now merged into trunk. Closing this one.