Open Jefffrey opened 1 year ago
Make use of file statistics, stripe statistics, column statistics, row group indexes, and bloom filters
We need a way to expose this functionality so that users (such as DataFusion) can query large ORC files efficiently, e.g. via predicate pushdown
Take inspiration from how parquet handles exposing the necessary information/behaviour: https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/type.ParquetRecordBatchReaderBuilder.html
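
To make the predicate pushdown idea concrete, here is a rough sketch of stripe pruning from min/max statistics. Everything in it is hypothetical: `IntColumnStats`, `Predicate`, `may_match` and `select_stripes` are illustrative names only, not existing APIs in this crate, in parquet, or in DataFusion. It assumes a single integer column and ignores nulls, bloom filters, and row group indexes.

```rust
/// Hypothetical min/max statistics for one integer column in one stripe.
#[derive(Debug, Clone, Copy)]
pub struct IntColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical simple predicate on that column.
#[derive(Debug, Clone, Copy)]
pub enum Predicate {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

/// Returns `false` only when the statistics prove that no row in the
/// stripe can satisfy the predicate; `true` means "must be read".
pub fn may_match(stats: &IntColumnStats, pred: Predicate) -> bool {
    match pred {
        Predicate::Eq(v) => stats.min <= v && v <= stats.max,
        Predicate::Lt(v) => stats.min < v,
        Predicate::Gt(v) => stats.max > v,
    }
}

/// Returns the indices of the stripes that cannot be skipped.
pub fn select_stripes(stripes: &[IntColumnStats], pred: Predicate) -> Vec<usize> {
    stripes
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match(stats, pred))
        .map(|(index, _)| index)
        .collect()
}
```

A reader builder could accept something like the returned stripe indices as a selection, similar in spirit to how the parquet builder linked above lets callers restrict which row groups are read. The same conservative "may match" check could in principle be applied at the file level and the row group index level.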