datafusion-contrib / datafusion-orc

Implementation of the Apache ORC file format using the Apache Arrow in-memory format
Apache License 2.0

Support projection as boolean mask #19

Closed: Jefffrey closed this issue 6 months ago

Jefffrey commented 8 months ago

When building the reader, it should be possible to specify which columns to project (from the root struct type).

See parquet: https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/mod.rs#L159-L165

Jefffrey commented 8 months ago

Looks like we already support projection via field names:

https://github.com/datafusion-contrib/datafusion-orc/blob/218d99165fb2190764ac4345a3a13f9dbaae5135/src/arrow_reader.rs#L372

Maybe consider changing this to a boolean mask, to be similar to parquet?
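
For illustration, here is a minimal sketch of what a boolean mask over the root struct's columns could look like. It is not the datafusion-orc API: the `ProjectionMask` type and its `all`, `roots`, `from_names`, `is_included` and `project` methods are hypothetical names chosen to loosely mirror the parquet approach, and columns are represented as plain strings rather than real schema fields.

```rust
/// Hypothetical boolean projection mask over the columns of the root struct.
/// This is an illustrative sketch, not the crate's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct ProjectionMask {
    /// `None` selects every column; `Some(mask)` holds one flag per root column.
    mask: Option<Vec<bool>>,
}

impl ProjectionMask {
    /// Select all columns.
    pub fn all() -> Self {
        Self { mask: None }
    }

    /// Select root columns by index. Out-of-range indices are ignored.
    pub fn roots(num_columns: usize, indices: impl IntoIterator<Item = usize>) -> Self {
        let mut mask = vec![false; num_columns];
        for idx in indices {
            if let Some(slot) = mask.get_mut(idx) {
                *slot = true;
            }
        }
        Self { mask: Some(mask) }
    }

    /// Build the same kind of mask from field names, so that name-based
    /// projection can be layered on top of the boolean mask.
    pub fn from_names(columns: &[&str], names: &[&str]) -> Self {
        let mask = columns.iter().map(|c| names.contains(c)).collect();
        Self { mask: Some(mask) }
    }

    /// Whether the root column at `idx` is part of the projection.
    pub fn is_included(&self, idx: usize) -> bool {
        match &self.mask {
            None => true,
            Some(m) => m.get(idx).copied().unwrap_or(false),
        }
    }

    /// Apply the mask to a list of column names, preserving file order.
    pub fn project<'a>(&self, columns: &[&'a str]) -> Vec<&'a str> {
        columns
            .iter()
            .enumerate()
            .filter(|(i, _)| self.is_included(*i))
            .map(|(_, c)| *c)
            .collect()
    }
}
```

With columns `["id", "name", "score"]`, `ProjectionMask::roots(3, [0, 2])` and `ProjectionMask::from_names(&cols, &["score", "id"])` produce the same mask, and `project` returns the selected columns in file order regardless of the order the names were given in. That is one argument for the mask representation: name-based projection becomes a thin convenience layer over it.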