delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Include file stats when converting a parquet directory to a Delta table #2490

Closed: gruuya closed this issue 1 month ago

gruuya commented 1 month ago

Description

Currently, ConvertToDeltaBuilder skips fetching and populating the file statistics (see https://github.com/delta-io/delta-rs/blob/81593e919497221a1a08bf8db9d20e8e4a39a8a6/crates/core/src/operations/convert_to_delta.rs#L332-L353).

As a result, the entries written to the Delta log lack the min/max/null count statistics.
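For reference, the Delta protocol stores these statistics per file as a JSON string in the `stats` field of each `add` action, using the keys `numRecords`, `minValues`, `maxValues` and `nullCount`. Below is a minimal std-only sketch of producing that string. It assumes integer columns only (so no JSON escaping is needed) and a hypothetical `ColumnStats` type that is not part of delta-rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-column summary for one file (not a delta-rs type).
struct ColumnStats {
    min: i64,
    max: i64,
    null_count: u64,
}

/// Render a `{"col":value,...}` object, picking one field per column via `f`.
/// BTreeMap keeps column order deterministic.
fn render_map<F: Fn(&ColumnStats) -> String>(
    columns: &BTreeMap<String, ColumnStats>,
    f: F,
) -> String {
    let fields: Vec<String> = columns
        .iter()
        .map(|(name, s)| format!("\"{}\":{}", name, f(s)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Build the per-file `stats` JSON string carried by an `add` action.
fn stats_json(num_records: u64, columns: &BTreeMap<String, ColumnStats>) -> String {
    format!(
        "{{\"numRecords\":{},\"minValues\":{},\"maxValues\":{},\"nullCount\":{}}}",
        num_records,
        render_map(columns, |s| s.min.to_string()),
        render_map(columns, |s| s.max.to_string()),
        render_map(columns, |s| s.null_count.to_string()),
    )
}

fn main() {
    let mut cols = BTreeMap::new();
    cols.insert("id".to_string(), ColumnStats { min: 1, max: 100, null_count: 0 });
    cols.insert("value".to_string(), ColumnStats { min: -5, max: 42, null_count: 3 });
    println!("{}", stats_json(100, &cols));
}
```

A real implementation would more likely build this with serde_json and handle strings, dates and nested columns; the hand-rolled formatting here only keeps the sketch dependency-free.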

Use Case

These stats are useful because they allow files to be pruned (skipped) when a query's predicates cannot match any of their rows, which directly affects scan performance.
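To illustrate the performance point, here is a minimal sketch of min/max file skipping for an equality predicate on a single integer column. The `FileEntry` type and `files_to_scan` function are hypothetical and are not delta-rs APIs. A file converted without stats can never be skipped.

```rust
/// Hypothetical file entry: a path plus the min/max of one integer column,
/// as they would be read from the `stats` of an `add` action.
struct FileEntry {
    path: &'static str,
    min: Option<i64>,
    max: Option<i64>,
}

/// Return the files that may contain rows with `col == value`.
/// Files without stats must always be scanned, since nothing rules them out.
fn files_to_scan(files: &[FileEntry], value: i64) -> Vec<&'static str> {
    files
        .iter()
        .filter(|f| match (f.min, f.max) {
            (Some(min), Some(max)) => min <= value && value <= max,
            _ => true,
        })
        .map(|f| f.path)
        .collect()
}

fn main() {
    let files = [
        FileEntry { path: "part-0.parquet", min: Some(1), max: Some(50) },
        FileEntry { path: "part-1.parquet", min: Some(51), max: Some(100) },
        // e.g. a file added by a conversion that skipped stats
        FileEntry { path: "part-2.parquet", min: None, max: None },
    ];
    println!("{:?}", files_to_scan(&files, 75));
}
```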

Granted, it may be possible to use the statistics stored in the Parquet files themselves, but that is suboptimal compared to reading them directly from the log, since it requires opening every file's footer.
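Deriving the statistics from the Parquet files means reading each file's footer and folding its per-row-group statistics into one file-level summary. The sketch below shows only that folding step for a single integer column. `RowGroupStats`, `FileStats` and `file_stats` are hypothetical types and functions, not the `parquet` crate's or delta-rs's API, and footer decoding is left out entirely.

```rust
/// Hypothetical row-group statistics for one integer column, as they might be
/// decoded from a Parquet footer. Writers may omit min/max, hence Option.
struct RowGroupStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: u64,
    num_rows: u64,
}

/// Hypothetical file-level summary matching the Delta stats fields.
#[derive(Debug, PartialEq)]
struct FileStats {
    num_records: u64,
    min: Option<i64>,
    max: Option<i64>,
    null_count: u64,
}

/// Fold row-group stats into file-level stats. If a non-null row group lacks
/// bounds, the file bounds are reported as unknown (None) rather than wrong.
fn file_stats(row_groups: &[RowGroupStats]) -> FileStats {
    let mut num_records = 0;
    let mut null_count = 0;
    let mut min: Option<i64> = None;
    let mut max: Option<i64> = None;
    let mut bounds_known = true;
    for rg in row_groups {
        num_records += rg.num_rows;
        null_count += rg.null_count;
        // An all-null row group legitimately has no min/max; skip it.
        if rg.null_count == rg.num_rows {
            continue;
        }
        match (rg.min, rg.max) {
            (Some(lo), Some(hi)) => {
                min = Some(min.map_or(lo, |m| m.min(lo)));
                max = Some(max.map_or(hi, |m| m.max(hi)));
            }
            _ => bounds_known = false,
        }
    }
    if !bounds_known {
        min = None;
        max = None;
    }
    FileStats { num_records, min, max, null_count }
}

fn main() {
    let groups = [
        RowGroupStats { min: Some(10), max: Some(20), null_count: 1, num_rows: 100 },
        RowGroupStats { min: Some(-3), max: Some(15), null_count: 0, num_rows: 50 },
        RowGroupStats { min: None, max: None, null_count: 25, num_rows: 25 }, // all nulls
    ];
    println!("{:?}", file_stats(&groups));
}
```

Doing this for every file on every read is the per-file cost that storing the result once in the log avoids.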

Related Issue(s)