Closed · jacobmarble closed 3 days ago
@jacobmarble what version are you on?
Hey @mike-luabase I managed to build the plugin yesterday, and started adding `printf`s to figure out what's going on. Interesting that the `iceberg_scan` function essentially rewrites itself as `parquet_scan` - very elegant, easy to understand.

The problem resolves itself when I add `skip_schema_inference = true`, so there must be some schema mismatch between my Iceberg and Parquet data.
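For reference, here's roughly what the workaround looks like (the table path is just a placeholder for my generated file set):

```sql
-- Returns the right schema and row count, but every value is NULL:
select * from iceberg_scan('data/my_table');

-- Works once schema inference is skipped:
select * from iceberg_scan('data/my_table', skip_schema_inference = true);
```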
Back to the point of this issue: I found the `skip_schema_inference` parameter by looking through the source code; the feature isn't documented (unless there's documentation aside from https://duckdb.org/docs/extensions/iceberg). But then I found that the `schema` parameter of `parquet_scan`/`read_parquet` is also not documented (at least not at https://duckdb.org/docs/data/parquet/overview.html#parameters). Maybe there's a verbose logging option hidden somewhere as well.
Perhaps I can adjust my expectations of duckdb a bit. It works great when I wear my "data scientist hat" but requires a different approach when I'm wearing my "software engineer hat".
How can I understand why a particular Iceberg file set fails to query? Is there a "verbose logging" feature that isn't documented? How do developers of this plugin debug?
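For what it's worth, the closest thing I've found to verbose output is plain `EXPLAIN`, which at least shows the Parquet scan that the Iceberg call gets rewritten into (path below is a placeholder):

```sql
explain select * from iceberg_scan('data/my_table');
```

That confirms the rewrite happens, but it doesn't say why the inferred schema fails to line up with the Parquet files.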
I'm working on a tool that generates Iceberg metadata and data. Querying the generated file set behaves like this:

- `select count(*) from iceberg_scan(...)` yields a correct result
- `select * from iceberg_scan(...)` yields the correct schema and the correct quantity of rows, but `null` values for every column and row
- In contrast, `select * from read_parquet(...)` yields correct results
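To make the comparison concrete, with placeholder paths (my real layout follows the standard Iceberg `data/` directory convention) the three queries are:

```sql
select count(*) from iceberg_scan('data/my_table');          -- correct count
select * from iceberg_scan('data/my_table');                 -- right schema and row count, all NULLs
select * from read_parquet('data/my_table/data/*.parquet');  -- correct results
```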