devendrasr closed 3 months ago
looks good, thanks!
@samansmink - Hi, I downloaded the DuckDB nightly build and couldn't find this feature (skip_schema_inference). Will it be part of the upcoming 0.10.2 release? Thanks
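@harel-e Are you sure? It works for me:
-- Replace any existing install with the nightly build of the extension
force install iceberg from 'http://nightly-extensions.duckdb.org';
load iceberg;
-- The table path is a string literal, so it takes single quotes
FROM iceberg_metadata('my_iceberg_table', skip_schema_inference = true);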
@samansmink - I wasn't aware of force install, but it still failed.
Using the nightly build binary:
./duckdb
v0.10.2-dev265 2687e2d6d9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D force install iceberg from 'http://nightly-extensions.duckdb.org';
HTTP Error: Failed to download extension "iceberg" at URL "http://nightly-extensions.duckdb.org/2687e2d6d9/osx_arm64/iceberg.duckdb_extension.gz"
Extension "iceberg" is an existing extension.
Are you using a development build? In this case, extensions might not (yet) be uploaded.
@harel-e Yeah, we don't have good update semantics for extensions (yet). Force installing overrides your current installation with whatever you provide; otherwise, DuckDB won't update, because it thinks iceberg is already installed.
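A minimal sketch of the difference between the two install forms, using the statements from this thread plus the duckdb_extensions() table function. It assumes a fresh session in which iceberg has not been loaded yet.

```sql
-- Plain INSTALL does not update iceberg if it is already installed locally
INSTALL iceberg;

-- FORCE INSTALL overwrites the local copy with the build from the given repository
FORCE INSTALL iceberg FROM 'http://nightly-extensions.duckdb.org';
LOAD iceberg;

-- Check whether iceberg is installed and loaded, and which file is in use
SELECT extension_name, installed, loaded, install_path
FROM duckdb_extensions()
WHERE extension_name = 'iceberg';
```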
Regarding using the nightly build binary: that's a bit quirky at the moment. We distribute nightly builds of extensions that target the latest stable release of DuckDB, and we distribute nightly builds of DuckDB with stable versions of extensions. However, we don't automatically distribute nightly extensions for nightly DuckDB builds, so those can sometimes lag behind.
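The URL in the error above combines the build's commit hash (2687e2d6d9) with the platform (osx_arm64), which is why a dev build can miss extensions that were never uploaded for its hash. A minimal sketch for checking both values in your own session, assuming a DuckDB version that provides pragma_version() and PRAGMA platform:

```sql
-- library_version is e.g. v0.10.2-dev265 for a dev build; source_id is the
-- short commit hash that, per the error above, dev builds use in extension URLs
SELECT library_version, source_id FROM pragma_version();

-- Platform directory used in extension URLs, e.g. osx_arm64
PRAGMA platform;
```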
I'll bump the iceberg extension in DuckDB main, which should resolve this.
@samansmink - Thank you for making this change available in the extensions. Will this PR ship in the stable iceberg extension for the upcoming 0.10.2 release (i.e., available by just running 'install iceberg')?
The current version does not support parsing complex data types when inferring the schema from the snapshot. Until support for complex data types lands, I am introducing a flag, skip_schema_inference, that skips this step and offloads schema parsing to the underlying parquet extension. Here is how to use it:
scan data:
scan metadata:
scan snapshots:
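A minimal sketch of the three scans. It assumes an iceberg extension build that includes this PR, that 'my_iceberg_table' stands in for a real table path, and that iceberg_scan and iceberg_snapshots accept skip_schema_inference the same way iceberg_metadata does in the example earlier in this thread.

```sql
-- Assumption: the installed iceberg extension includes this PR
LOAD iceberg;

-- scan data: with inference skipped, column types come from the Parquet files
SELECT * FROM iceberg_scan('my_iceberg_table', skip_schema_inference = true);

-- scan metadata
SELECT * FROM iceberg_metadata('my_iceberg_table', skip_schema_inference = true);

-- scan snapshots (assumed to accept the same flag)
SELECT * FROM iceberg_snapshots('my_iceberg_table', skip_schema_inference = true);
```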
Note: I am closing an earlier PR that requested these changes but was harder to follow: https://github.com/duckdb/duckdb_iceberg/pull/43