datafusion-contrib / datafusion-orc

Implementation of the Apache ORC file format using the Apache Arrow in-memory format
Apache License 2.0
28 stars 8 forks

DataFusion integration #63

Open Jefffrey opened 3 months ago

Jefffrey commented 3 months ago

I implemented a very shallow example integration with DataFusion:

https://github.com/datafusion-contrib/datafusion-orc/commit/8c68f472a24c7ff12401ccdb3704991f0c7d080d

Will want to flesh this out and move the code into src, so that it can support things like predicate pushdown from DataFusion's point of view.

Could split this repo into two crates: one focused on reading ORC into Arrow (arrow-orc, akin to parquet in arrow-rs), and another for the DataFusion integration code (datafusion-orc), such as a type that implements the ExecutionPlan trait.
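
To make the proposed split concrete, here is a rough, std-only sketch of how the two layers might relate. Everything in it is hypothetical: `OrcReader`, `Stripe`, `ScanPlan` and `OrcScan` are illustrative stand-ins, not types from this repo or from DataFusion (whose real trait is `ExecutionPlan`), and a single `i64` column stands in for Arrow record batches. The idea is only that the reader layer decodes stripes, while the integration layer decides which stripes to read from a pushed-down predicate.

```rust
// Hypothetical sketch only: none of these types exist in datafusion-orc or DataFusion.

// ---- Reader layer (what a separate arrow-orc crate might own) ----

/// One stripe of a single i64 column, with simplified min/max statistics.
pub struct Stripe {
    pub min: i64,
    pub max: i64,
    pub values: Vec<i64>,
}

/// Stand-in for a reader that only knows how to decode stripes.
pub struct OrcReader {
    pub stripes: Vec<Stripe>,
}

impl OrcReader {
    /// Decode only the stripes at the given indices, in order.
    pub fn read_stripes(&self, indices: &[usize]) -> Vec<i64> {
        indices
            .iter()
            .flat_map(|&i| self.stripes[i].values.iter().copied())
            .collect()
    }
}

// ---- Integration layer (what datafusion-orc might own) ----

/// Stand-in for a query-engine plan trait.
pub trait ScanPlan {
    fn execute(&self) -> Vec<i64>;
}

/// A scan carrying an optional pushed-down predicate `value > lower_bound`.
pub struct OrcScan {
    pub reader: OrcReader,
    pub lower_bound: Option<i64>,
}

impl OrcScan {
    /// Predicate pushdown: keep only stripes whose max could satisfy the predicate.
    pub fn selected_stripes(&self) -> Vec<usize> {
        self.reader
            .stripes
            .iter()
            .enumerate()
            .filter(|(_, s)| self.lower_bound.map_or(true, |b| s.max > b))
            .map(|(i, _)| i)
            .collect()
    }
}

impl ScanPlan for OrcScan {
    /// Read the surviving stripes, then apply the predicate row by row.
    fn execute(&self) -> Vec<i64> {
        self.reader
            .read_stripes(&self.selected_stripes())
            .into_iter()
            .filter(|v| self.lower_bound.map_or(true, |b| *v > b))
            .collect()
    }
}

/// Build a small three-stripe scan with the predicate `value > 10`.
pub fn example_scan() -> OrcScan {
    let stripe = |values: Vec<i64>| Stripe {
        min: *values.iter().min().unwrap(),
        max: *values.iter().max().unwrap(),
        values,
    };
    OrcScan {
        reader: OrcReader {
            stripes: vec![stripe(vec![1, 5, 9]), stripe(vec![8, 12, 15]), stripe(vec![20, 25])],
        },
        lower_bound: Some(10),
    }
}
```

With `example_scan()`, the first stripe is skipped without being decoded because its maximum does not exceed the bound. Keeping the pruning decision in the integration layer means the reader layer would not need to depend on DataFusion at all, which is the main motivation for the two-crate layout.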