apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Incorrect statistics read for struct array in parquet #10609

Status: Open. NGA-TRAN opened this issue 5 months ago.

NGA-TRAN commented 5 months ago

Describe the bug

I found this while adding tests in https://github.com/apache/datafusion/pull/10608: reading the Parquet statistics of a struct array returns nothing.

To Reproduce

See test_struct in https://github.com/apache/datafusion/pull/10608

Expected behavior

Statistics values should be returned for the struct array instead of nothing.
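A simplified model may help frame the expected behavior. In Parquet, statistics are recorded per leaf column chunk, and a struct column has no column chunk of its own, only its leaf fields do. The sketch below is a hypothetical, std-only model (the `LeafStats` and `RowGroupStats` types and both lookup functions are illustrative assumptions, not DataFusion's or the `parquet` crate's API). It shows that an exact lookup keyed on the struct's own name finds nothing, while collecting the entries under the struct's path prefix recovers per-field statistics.

```rust
use std::collections::BTreeMap;

/// Hypothetical min/max/null-count statistics for one leaf column chunk (i64 values only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct LeafStats {
    min: i64,
    max: i64,
    null_count: u64,
}

/// Hypothetical model of one row group: statistics keyed by dotted leaf column path.
type RowGroupStats = BTreeMap<String, LeafStats>;

/// Looks up statistics by exact column path. A struct has no entry of its own,
/// so asking for the struct's name (e.g. "s") returns None.
fn stats_for_column<'a>(rg: &'a RowGroupStats, path: &str) -> Option<&'a LeafStats> {
    rg.get(path)
}

/// Collects the statistics of every leaf column nested under `struct_path`, in path order.
fn stats_for_struct<'a>(rg: &'a RowGroupStats, struct_path: &str) -> Vec<(&'a str, &'a LeafStats)> {
    let prefix = format!("{struct_path}.");
    rg.iter()
        .filter(|(path, _)| path.starts_with(&prefix))
        .map(|(path, stats)| (path.as_str(), stats))
        .collect()
}

fn main() {
    let mut rg = RowGroupStats::new();
    rg.insert("id".to_string(), LeafStats { min: 1, max: 100, null_count: 0 });
    rg.insert("s.a".to_string(), LeafStats { min: -5, max: 5, null_count: 1 });
    rg.insert("s.b".to_string(), LeafStats { min: 10, max: 20, null_count: 0 });

    // Exact lookup of the struct itself finds nothing.
    assert!(stats_for_column(&rg, "s").is_none());

    // Prefix lookup finds the struct's leaf columns.
    for (path, stats) in stats_for_struct(&rg, "s") {
        println!("{path}: min={} max={} nulls={}", stats.min, stats.max, stats.null_count);
    }
}
```

How per-field statistics should then be surfaced for the struct as a whole (for example, as a struct of per-field values) is a design question this sketch leaves open.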

Additional context

No response

Lordworms commented 5 months ago

take

xinlifoobar commented 4 months ago

Related: 8334. The current statistics for structs return null.

Lordworms commented 4 months ago

The problem here is how to handle nested structs effectively. I don't know whether all the columns belonging to one struct are stored entirely in one row group, or whether they may be split across different row groups.
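For reference on the storage question: the Parquet format stores nested data as flattened leaf columns, and each row group contains a column chunk (with its own statistics) for every leaf column. A struct's fields are therefore not split across row groups; each row group holds all of the struct's leaf columns for its own rows. The following is a minimal, std-only sketch of that flattening, assuming a hypothetical simplified `DataType` enum (not Arrow's or DataFusion's type) and dotted path names chosen for illustration.

```rust
/// Hypothetical, simplified logical type: a leaf or a struct of named children.
#[derive(Debug)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Appends the dotted path of every leaf under `ty` to `out`.
/// Nested structs are walked recursively; only leaves become physical columns.
fn leaf_paths(prefix: &str, ty: &DataType, out: &mut Vec<String>) {
    match ty {
        DataType::Struct(children) => {
            for (name, child) in children {
                let path = if prefix.is_empty() {
                    name.clone()
                } else {
                    format!("{prefix}.{name}")
                };
                leaf_paths(&path, child, out);
            }
        }
        _ => out.push(prefix.to_string()),
    }
}

fn main() {
    // Schema: id: Int64, s: Struct { a: Int64, inner: Struct { b: Utf8 } }
    let schema = DataType::Struct(vec![
        ("id".to_string(), DataType::Int64),
        (
            "s".to_string(),
            DataType::Struct(vec![
                ("a".to_string(), DataType::Int64),
                ("inner".to_string(), DataType::Struct(vec![("b".to_string(), DataType::Utf8)])),
            ]),
        ),
    ]);

    let mut leaves = Vec::new();
    leaf_paths("", &schema, &mut leaves);

    // Each row group carries one column chunk per leaf, so every row group
    // lists the same set of leaf columns.
    for row_group in 0..2 {
        println!("row group {row_group}: {}", leaves.join(", "));
    }
}
```

Under this model, reading statistics for `s` in a given row group means reading the statistics of `s.a` and `s.inner.b` from that same row group.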