datafusion-contrib / datafusion-orc

Implementation of the Apache ORC file format using the Apache Arrow in-memory format
Apache License 2.0
41 stars, 10 forks

Refactor metadata into our own classes #41

Open Jefffrey opened 10 months ago

Jefffrey commented 10 months ago

Currently, both the file metadata:

https://github.com/datafusion-contrib/datafusion-orc/blob/f19cc7b66762791aba00a32a20615cb5466b33ed/src/reader/metadata.rs#L18-L23

and the stripe metadata:

https://github.com/datafusion-contrib/datafusion-orc/blob/f19cc7b66762791aba00a32a20615cb5466b33ed/src/arrow_reader.rs#L871-L878

rely directly on the proto structs/types.

I will work on refactoring this to add our own versions of these structs/types, both as a decoupling layer between the generated proto code and the rest of the reader, and potentially to provide a nicer interface for what we actually need from the metadata.
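
As a rough illustration of the idea (a sketch only, not the planned design): the `proto` module below is a hand-written stand-in for the generated protobuf types, and `FileMetadata`, `StripeMetadata`, and `footer_offset` are hypothetical names. The point is that optional proto fields get validated once at the boundary, so the rest of the reader works with plain values.

```rust
use std::convert::TryFrom;

// Stand-in for the generated proto types (assumption: the real generated
// code exposes optional fields in a similar way).
mod proto {
    #[derive(Debug, Clone, Default)]
    pub struct StripeInformation {
        pub offset: Option<u64>,
        pub index_length: Option<u64>,
        pub data_length: Option<u64>,
        pub footer_length: Option<u64>,
        pub number_of_rows: Option<u64>,
    }

    #[derive(Debug, Clone, Default)]
    pub struct Footer {
        pub number_of_rows: Option<u64>,
        pub stripes: Vec<StripeInformation>,
    }
}

/// Hypothetical crate-owned stripe metadata with no optional fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StripeMetadata {
    pub offset: u64,
    pub index_length: u64,
    pub data_length: u64,
    pub footer_length: u64,
    pub number_of_rows: u64,
}

impl StripeMetadata {
    /// Byte offset of the stripe footer, which follows the index and data sections.
    pub fn footer_offset(&self) -> u64 {
        self.offset + self.index_length + self.data_length
    }
}

/// Hypothetical crate-owned file metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMetadata {
    pub number_of_rows: u64,
    pub stripes: Vec<StripeMetadata>,
}

impl TryFrom<&proto::StripeInformation> for StripeMetadata {
    type Error = String;

    // Reject stripes with missing fields here, at the proto boundary.
    fn try_from(p: &proto::StripeInformation) -> Result<Self, Self::Error> {
        Ok(Self {
            offset: p.offset.ok_or("stripe is missing offset")?,
            index_length: p.index_length.ok_or("stripe is missing index length")?,
            data_length: p.data_length.ok_or("stripe is missing data length")?,
            footer_length: p.footer_length.ok_or("stripe is missing footer length")?,
            number_of_rows: p.number_of_rows.ok_or("stripe is missing row count")?,
        })
    }
}

impl TryFrom<&proto::Footer> for FileMetadata {
    type Error = String;

    // Convert every stripe; the first invalid stripe aborts the conversion.
    fn try_from(p: &proto::Footer) -> Result<Self, Self::Error> {
        let stripes = p
            .stripes
            .iter()
            .map(|s| StripeMetadata::try_from(s))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self {
            number_of_rows: p.number_of_rows.ok_or("footer is missing row count")?,
            stripes,
        })
    }
}
```

With something like this, callers would do `FileMetadata::try_from(&footer)?` once after decoding the footer, and the proto types would not need to appear anywhere else in the reader.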