datafusion-contrib / orc-rust

Rust implementation of Apache ORC
Apache License 2.0

Support selection pruning #17

Open Jefffrey opened 1 year ago

Jefffrey commented 1 year ago

Make use of file statistics, stripe statistics, column statistics, row group indexes, and bloom filters to skip stripes and row groups that cannot contain matching rows.

We need a way to expose this functionality so that users (such as DataFusion) can use it to query large ORC files efficiently, e.g. via predicate pushdown.
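To make the idea concrete, here is a minimal sketch of min/max pruning at the stripe level. Everything in it (`IntColumnStats`, `IntPredicate`, `prune_stripes`) is a hypothetical name made up for illustration, not an existing orc-rust API, and it assumes a single integer column whose statistics have already been decoded from the file.

```rust
/// Hypothetical min/max statistics for one integer column in one stripe.
/// (Illustrative only; not an orc-rust type.)
#[derive(Debug, Clone, Copy)]
pub struct IntColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical predicate on that single integer column.
#[derive(Debug, Clone, Copy)]
pub enum IntPredicate {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

impl IntPredicate {
    /// Returns `false` only when the statistics prove that no row can match.
    /// A `true` result means "maybe": the stripe still has to be read.
    pub fn may_match(&self, stats: &IntColumnStats) -> bool {
        match *self {
            IntPredicate::Eq(v) => stats.min <= v && v <= stats.max,
            IntPredicate::Lt(v) => stats.min < v,
            IntPredicate::Gt(v) => stats.max > v,
        }
    }
}

/// Returns the indexes of the stripes that cannot be ruled out.
pub fn prune_stripes(stripes: &[IntColumnStats], predicate: IntPredicate) -> Vec<usize> {
    stripes
        .iter()
        .enumerate()
        .filter(|(_, stats)| predicate.may_match(stats))
        .map(|(index, _)| index)
        .collect()
}
```

The same check could in principle be repeated per row group using the row group indexes, with bloom filters as an extra test for equality predicates; neither is shown here.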

Jefffrey commented 7 months ago

Take inspiration from how the parquet crate exposes the necessary information and behaviour: https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/type.ParquetRecordBatchReaderBuilder.html
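As a rough illustration of that shape: the parquet builder lets callers inspect file metadata before reading and then restrict the read to chosen row groups (`with_row_groups`). An ORC equivalent might look something like the sketch below. All names here (`StripeInfo`, `ReaderBuilder`, `stripe_metadata`, `with_stripes`, `Reader`) are hypothetical and only meant to show the flow, not a proposed or existing orc-rust API.

```rust
/// Hypothetical per-stripe metadata, available before any data is decoded.
#[derive(Debug, Clone)]
pub struct StripeInfo {
    pub row_count: u64,
    pub min: i64,
    pub max: i64,
}

/// Hypothetical builder: expose metadata first, accept a selection, then build.
pub struct ReaderBuilder {
    stripes: Vec<StripeInfo>,
    selected: Option<Vec<usize>>,
}

impl ReaderBuilder {
    pub fn new(stripes: Vec<StripeInfo>) -> Self {
        Self { stripes, selected: None }
    }

    /// Lets the caller (e.g. a query engine) run its own pruning logic.
    pub fn stripe_metadata(&self) -> &[StripeInfo] {
        &self.stripes
    }

    /// Restricts the read to the given stripe indexes.
    pub fn with_stripes(mut self, indexes: Vec<usize>) -> Self {
        self.selected = Some(indexes);
        self
    }

    /// Resolves the selection; with no selection, every stripe is read.
    /// Out-of-range indexes are silently dropped in this sketch.
    pub fn build(self) -> Reader {
        let count = self.stripes.len();
        let to_read: Vec<usize> = self
            .selected
            .unwrap_or_else(|| (0..count).collect())
            .into_iter()
            .filter(|&index| index < count)
            .collect();
        let rows = to_read.iter().map(|&index| self.stripes[index].row_count).sum();
        Reader { to_read, rows }
    }
}

/// Hypothetical reader that only records what it would read.
pub struct Reader {
    to_read: Vec<usize>,
    rows: u64,
}

impl Reader {
    pub fn stripes_to_read(&self) -> &[usize] {
        &self.to_read
    }

    pub fn rows_to_read(&self) -> u64 {
        self.rows
    }
}
```

With this split, the pruning decision stays with the caller: it reads `stripe_metadata()`, evaluates its own predicate, and passes the surviving indexes to `with_stripes` before calling `build`.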