duckdb / duckdb_iceberg

Support to skip schema inference - Archived #43

Closed devendrasr closed 3 months ago

devendrasr commented 4 months ago

This PR is archived; please refer to #45 instead.

The current version does not support parsing complex data types when it infers the schema from within the snapshot. Until support for complex data types arrives, I am introducing a flag that can be used to skip this flow. It offloads schema parsing to the underlying parquet extension. Here is how you can use it:

scan data:

SELECT * FROM iceberg_scan('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

scan metadata:

SELECT * FROM iceberg_metadata('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

scan snapshots:

SELECT * FROM iceberg_snapshots('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;
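
inspect the resulting schema:

Since the flag hands schema parsing to the parquet extension, it can help to look at the column names and types that actually come back. The following is a minimal sketch, not part of the PR itself. It assumes the loaded iceberg extension was built with the skip_schema_inference flag from this PR, that S3 credentials are already configured, and it reuses the example table path from above.

```sql
-- Assumption: the iceberg extension being loaded includes the
-- skip_schema_inference flag proposed in this PR.
INSTALL httpfs;   -- needed for s3:// paths
LOAD httpfs;
LOAD iceberg;

-- DESCRIBE returns one row per column (name, type, ...) of the query result,
-- so this shows the schema as derived by the parquet reader.
DESCRIBE
SELECT *
FROM iceberg_scan('s3://my-bucket/icebergwh/someschema/t01',
                  skip_schema_inference = true);
```

Complex columns such as structs, lists, or maps should then appear with whatever types the parquet reader assigns them.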

Note - This PR is built on top of the changes requested in PR https://github.com/duckdb/duckdb_iceberg/pull/42

samansmink commented 3 months ago

Hey @devendrasr, would you mind rebasing this PR on current main and then pushing? That makes reviewing a little easier here.

devendrasr commented 3 months ago

Closing this pull request and renaming it to archived, as I have created a fresh, simplified pull request for this feature. @samansmink, please review - https://github.com/duckdb/duckdb_iceberg/pull/45