delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
1.98k stars, 365 forks

Properly handle nested fields when computing stats / stats-schema #2572

Open roeap opened 3 weeks ago

roeap commented 3 weeks ago

Bug

Right now our logic to compute stats, specifically when using the `` property, only considers root-level fields and does not traverse into nested fields. We also need to properly parse the field names, since they may be escaped and contain special characters.

https://github.com/delta-io/delta/blob/4b102d34a2ce881b2a851b4c6cfbf2ab3ab5534f/spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala#L549-L561

What you expected to happen:

Properly parse field names when generating stats and stats schema
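
To illustrate the expected behavior, here is a minimal sketch of the two steps: splitting a comma-separated column list into nested path segments, then resolving each path against a schema. It assumes Spark-style quoting (a segment may be wrapped in backticks, and a literal backtick inside a quoted segment is doubled). `Field`, `parse_column_paths` and `resolve` are hypothetical names for illustration only, not part of the delta-rs API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified schema type (NOT the delta-rs `StructType`).
#[derive(Debug)]
enum Field {
    Leaf,
    Struct(BTreeMap<String, Field>),
}

/// Split a comma-separated list of column paths into path segments.
/// Assumption: Spark-style quoting, i.e. backtick-wrapped segments with
/// a doubled backtick standing for a literal one. Whitespace outside
/// backticks is ignored.
fn parse_column_paths(input: &str) -> Result<Vec<Vec<String>>, String> {
    let mut paths = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut segment = String::new();
    let mut quoted = false;
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // Doubled backtick inside a quoted segment: literal backtick.
            '`' if quoted && chars.peek() == Some(&'`') => {
                chars.next();
                segment.push('`');
            }
            // Single backtick opens or closes a quoted segment.
            '`' => quoted = !quoted,
            // Unquoted dot ends a segment: descend one nesting level.
            '.' if !quoted => path.push(std::mem::take(&mut segment)),
            // Unquoted comma ends the whole column path.
            ',' if !quoted => {
                path.push(std::mem::take(&mut segment));
                paths.push(std::mem::take(&mut path));
            }
            c if c.is_whitespace() && !quoted => {}
            c => segment.push(c),
        }
    }
    if quoted {
        return Err(format!("unterminated backtick in {input:?}"));
    }
    path.push(segment);
    paths.push(path);

    if paths.iter().flatten().any(|s| s.is_empty()) {
        return Err(format!("empty path segment in {input:?}"));
    }
    Ok(paths)
}

/// Walk a parsed path through the schema, descending into structs.
/// Returns `None` if any segment is missing or a leaf is traversed.
fn resolve<'a>(root: &'a Field, path: &[String]) -> Option<&'a Field> {
    path.iter().try_fold(root, |field, name| match field {
        Field::Struct(children) => children.get(name),
        Field::Leaf => None,
    })
}
```

With this sketch, an input such as ``id, user.`first.name` `` yields the paths `["id"]` and `["user", "first.name"]`, so a dot inside backticks is treated as part of the field name rather than as a nesting separator.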

More details:

ion-elgreco commented 3 weeks ago

@roeap FYI, stats parsing also does not seem to be entirely working for checkpoints: https://github.com/delta-io/delta-rs/issues/2571