Open ebyhr opened 2 months ago
This bug looks interesting: the value is returned as ZERO, while the column metrics record it as null.
{"x":{"column_size":31,"value_count":1,"null_value_count":1,"nan_value_count":null,"lower_bound":null,"upper_bound":null}}
As expected, a regular insert with a null value is not allowed.
@RussellSpitzer, please share your thoughts here. If this is a bug, I would be happy to help resolve it.
We don't do any validation on any of the columns during add_files so this is a place where we could add some safety code. So not so much a bug as just an area we haven't really looked at. For example if you add files that don't match the columns of the table, we also just let that happen.
IMO, we should at least add a null-check validation. Otherwise add_files may violate the table's definition constraints. Extra columns in the Parquet file are ignored in the column metrics, and they are removed after rewrite_data_file. Parquet files with null values, however, are always accepted, which can give wrong results such as the ZERO above. WDYT?
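To make the proposed null check concrete, here is a rough sketch in plain Python. It is not Iceberg code: `find_null_violations` and `required_columns` are hypothetical names, and the metrics dict simply mirrors the JSON shape posted earlier in this thread. It assumes per-column metrics are available for the file before it is committed.

```python
import json

def find_null_violations(required_columns, column_metrics):
    """Return the required (NOT NULL) columns that the file metrics show as containing nulls.

    required_columns: names of columns declared NOT NULL in the table (hypothetical input).
    column_metrics:   dict shaped like the metrics JSON quoted in this thread.
    """
    violations = []
    for name in required_columns:
        metrics = column_metrics.get(name)
        if metrics is None:
            # Assumption: a required column missing from the file is treated as a violation.
            violations.append(name)
            continue
        null_count = metrics.get("null_value_count")
        # An unknown (None) null count is not flagged here; a stricter check could reject it.
        if null_count is not None and null_count > 0:
            violations.append(name)
    return violations

# Metrics copied from the comment above; JSON null becomes Python None.
metrics = json.loads(
    '{"x":{"column_size":31,"value_count":1,"null_value_count":1,'
    '"nan_value_count":null,"lower_bound":null,"upper_bound":null}}'
)

print(find_null_violations(["x"], metrics))  # prints ['x']
```

With a check like this, a file whose metrics report nulls in a NOT NULL column could be rejected up front instead of being added silently. Whether to fail or only warn is a design choice for the maintainers.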
Apache Iceberg version
None
Query engine
None
Please describe the bug 🐞
Steps to reproduce:
As far as I could confirm, there is no relevant test in TestAddFilesProcedure, so I assume this is unexpected behavior.
Willingness to contribute