Open samansmink opened 3 months ago
This is basically caused by `letter` being NULL in one of the partitions of that table (the `__HIVE_DEFAULT_PARTITION__` one in particular). The value not being in the map is exactly what indicates that it should be NULL in the output.
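To make that rule concrete, here is a minimal sketch of "absent from the map means NULL". The `PartitionValue` enum and `resolve_partition_value` function are hypothetical names made up for illustration; they are not part of the delta-kernel-rs or duckdb_delta API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a resolved partition column value.
#[derive(Debug, Clone, PartialEq)]
enum PartitionValue {
    /// The file's partition map holds a constant for this column.
    Constant(String),
    /// The column is a partition column but has no entry: treat as NULL.
    Null,
}

/// Assumes `column` is already known to be a partition column of the table.
/// A missing key is not an error; it is read as NULL.
fn resolve_partition_value(
    partition_values: &HashMap<String, String>,
    column: &str,
) -> PartitionValue {
    match partition_values.get(column) {
        Some(value) => PartitionValue::Constant(value.clone()),
        None => PartitionValue::Null,
    }
}
```

With this reading, a file under `letter=__HIVE_DEFAULT_PARTITION__` would resolve `letter` to `PartitionValue::Null` rather than failing the lookup.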
It seems that when this is propagated to the multi-file reader, no constant is specified for the column, so the reader tries to read that column from the file itself, and that's what causes the odd:
IO Error: Failed to read file "../delta-kernel-rs/kernel/tests/data/basic_partitioned/letter=__HIVE_DEFAULT_PARTITION__/part-00000-8eb7f29a-e6a1-436e-a638-bbf0a7953f09.c000.snappy.parquet": schema mismatch in glob: column "letter" was read from the original file "../delta-kernel-rs/kernel/tests/data/basic_partitioned/letter=__HIVE_DEFAULT_PARTITION__/part-00000-8eb7f29a-e6a1-436e-a638-bbf0a7953f09.c000.snappy.parquet", but could not be found in file "../delta-kernel-rs/kernel/tests/data/basic_partitioned/letter=__HIVE_DEFAULT_PARTITION__/part-00000-8eb7f29a-e6a1-436e-a638-bbf0a7953f09.c000.snappy.parquet".
There is indeed no `letter` column in that file; it should be filled in with NULL.
I see that the `constant_map` in the `visit_callback` is a `<string>` map, so I'm not sure we can indicate in there that a column should be NULL. We might need to extend it to be a `<Value>` map, or find some other way to tell the reader that it should fill the column with NULL.
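One way to picture such an extension is a map whose values can themselves be NULL, so that "partition column with a NULL value" is distinguishable from "not a partition column, read it from the file". The sketch below uses `Option<String>` for that purpose; `build_constant_map` and `should_read_from_file` are hypothetical helpers, and the real `constant_map` lives on the DuckDB side, so this only illustrates the idea rather than the actual change.

```rust
use std::collections::HashMap;

/// Build a constant map with an entry for every partition column.
/// `Some(v)` means "fill the column with the constant v";
/// `None` means "fill the column with NULL".
fn build_constant_map(
    partition_columns: &[&str],
    partition_values: &HashMap<String, String>,
) -> HashMap<String, Option<String>> {
    partition_columns
        .iter()
        .map(|col| (col.to_string(), partition_values.get(*col).cloned()))
        .collect()
}

/// Only columns with no entry at all should be read from the data file.
fn should_read_from_file(
    constant_map: &HashMap<String, Option<String>>,
    column: &str,
) -> bool {
    !constant_map.contains_key(column)
}
```

Under this scheme, `letter` in the default partition gets an explicit `None` entry, so the reader would never look for it in the Parquet file.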
With duckdb_delta, looking up partition constants does not always work, and I'm not sure why. Some tests pass and some fail:
1. (Passing) Data generated using delta-rs
   This test passes. Test is here. Data generated here. DuckDB can correctly find the partition constant for each file using `ffi::get_from_map` in the `ffi::visit_scan_data` callback.
2. (Failing) Test with delta_kernel/kernel/tests/data/basic_partitioned
   When DuckDB calls `ffi::get_from_map` in the callback from `ffi::visit_scan_data`, the `letter` column is not found. Test is here.
3. (Failing) Test with delta_kernel/acceptance/tests/dat/out/reader_tests/generated/basic_partitioned
   Same thing as with 2: the lookup with `get_from_map` returns NULL even though the column should be there?
@nicklan let me know if you need anything more here