datafusion-contrib / datafusion-orc

Implementation of the Apache ORC file format using the Apache Arrow in-memory format
Apache License 2.0

Dictionary array handling #72

Open Jefffrey opened 3 months ago

Jefffrey commented 3 months ago

Spun off from discussion in https://github.com/datafusion-contrib/datafusion-orc/pull/68

Find a way to preserve the dictionary encoding in a DictionaryArray, without having to cast it to a StringArray as introduced here:

https://github.com/datafusion-contrib/datafusion-orc/blob/bb885c03cfedb4d4e16d0203b90a9789463e2fb7/src/arrow_reader/decoder/string.rs#L180-L183

Some reference from a similar Parquet issue: https://github.com/apache/arrow-rs/issues/171
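
To make the trade-off concrete, here is a minimal, crate-free sketch of what is lost when a dictionary-encoded column gets cast to plain strings. It does not use arrow-rs or datafusion-orc types: `DictColumn`, `materialize`, `encoded_string_bytes` and `materialized_string_bytes` are hypothetical names invented for illustration, and the real Arrow arrays store their data differently (contiguous buffers plus a validity bitmap).

```rust
// Hypothetical model of a dictionary-encoded string column.
// None of these types come from arrow-rs or datafusion-orc.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    /// One index per row into `values`; `None` marks a null row.
    keys: Vec<Option<u32>>,
    /// The distinct strings, each stored once.
    values: Vec<String>,
}

impl DictColumn {
    /// Expand to one owned string per row. This models the cast to a plain
    /// string array: every repeated value is copied again.
    fn materialize(&self) -> Vec<Option<String>> {
        self.keys
            .iter()
            .map(|key| key.map(|i| self.values[i as usize].clone()))
            .collect()
    }

    /// String bytes held while the column stays dictionary-encoded.
    fn encoded_string_bytes(&self) -> usize {
        self.values.iter().map(|v| v.len()).sum()
    }
}

/// String bytes held after the column has been materialized.
fn materialized_string_bytes(rows: &[Option<String>]) -> usize {
    rows.iter().flatten().map(|s| s.len()).sum()
}
```

For a column with keys `[0, 1, null, 0]` over the values `["foo", "bar"]`, the encoded form holds 6 bytes of string data, while the materialized form holds 9 because `"foo"` is copied twice. The gap grows with the number of repeated rows, which is the motivation for handing the dictionary through to the consumer rather than casting.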