datafusion-contrib / datafusion-orc

Implementation of the Apache ORC file format using the Apache Arrow in-memory format
Apache License 2.0
28 stars 8 forks

Implement RLEv2 reader using generics #56

Closed Jefffrey closed 4 months ago

Jefffrey commented 5 months ago

Currently the RLEv2 reader implements unsigned and signed reading with the same base code, using a boolean to indicate whether the values are signed.

https://github.com/datafusion-contrib/datafusion-orc/blob/ebe96e070eafcb797cb33cb02aa2c935767a4825/src/reader/decode/rle_v2.rs

I want to explore using generics to enable this behaviour without the runtime check. When we construct the RLE reader we should already know whether we want signed or unsigned values, so a check during decoding shouldn't be needed.
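As a rough illustration of the idea (not the code in the linked file), signedness could be moved into a trait so the compiler picks the conversion per type. The names `RleInt`, `from_raw` and `RleReader` are hypothetical, and the sketch assumes signed values arrive zigzag-encoded as raw `u64`s that have already been unpacked from the stream:

```rust
use std::marker::PhantomData;

/// Hypothetical trait (not from datafusion-orc): maps a raw unsigned value
/// read from the stream to the integer type the caller asked for.
trait RleInt: Copy {
    fn from_raw(raw: u64) -> Self;
}

impl RleInt for u64 {
    // Unsigned values are used as-is.
    fn from_raw(raw: u64) -> Self {
        raw
    }
}

impl RleInt for i64 {
    // Signed values are assumed to be zigzag-encoded: 0, -1, 1, -2, ...
    fn from_raw(raw: u64) -> Self {
        ((raw >> 1) as i64) ^ -((raw & 1) as i64)
    }
}

/// Hypothetical reader: signedness is fixed by the type parameter at
/// construction time, so decoding has no `if signed { .. }` branch.
struct RleReader<T: RleInt> {
    raw: Vec<u64>,
    _marker: PhantomData<T>,
}

impl<T: RleInt> RleReader<T> {
    fn new(raw: Vec<u64>) -> Self {
        Self { raw, _marker: PhantomData }
    }

    /// Converts every raw value using the impl chosen for `T`.
    fn decode(&self) -> Vec<T> {
        self.raw.iter().map(|&r| T::from_raw(r)).collect()
    }
}
```

With this shape, `RleReader::<i64>::new(..)` and `RleReader::<u64>::new(..)` are monomorphized separately, so the signed/unsigned choice is resolved at compile time rather than checked per value.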

Also, I'm unsure how well the current code handles overflows and saturating adds/subs.
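For reference, a minimal sketch of what explicit overflow handling could look like when accumulating deltas onto a base value. The function `apply_deltas` is hypothetical and is not how the current reader is written; it only shows the standard-library `checked_add` behaviour, with `saturating_add` and `wrapping_add` as the alternatives to weigh:

```rust
/// Hypothetical helper (not from datafusion-orc): starts at `base` and adds
/// each delta in turn, returning `None` if any addition overflows `i64`.
fn apply_deltas(base: i64, deltas: &[i64]) -> Option<Vec<i64>> {
    let mut out = Vec::with_capacity(deltas.len() + 1);
    let mut current = base;
    out.push(current);
    for &delta in deltas {
        // `checked_add` reports overflow instead of panicking (debug builds)
        // or silently wrapping (release builds).
        current = current.checked_add(delta)?;
        out.push(current);
    }
    Some(out)
}

/// The same addition under the three explicit overflow policies.
fn overflow_policies(a: i64, b: i64) -> (Option<i64>, i64, i64) {
    (a.checked_add(b), a.saturating_add(b), a.wrapping_add(b))
}
```

Returning `None` (or an error) seems the safer default for a file reader, since an overflow most likely means corrupt input; saturating or wrapping would instead produce a value that looks valid.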

Jefffrey commented 5 months ago

I'm currently working on this