apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

Support dictionary-encoded types in the IPC reader and writer #622

Open · paleolimbot opened 2 months ago

Neither the IPC reader nor the IPC writer supports dictionary encoding. The lack of writer support is a problem for the R bindings because the factor type is converted to a dictionary-encoded string by default. The lack of reader support is a problem because the IPC reader currently has to error on streams produced by other implementations, many of which will happily emit dictionary messages. It also complicates our integration testing, since we currently have to skip any file that includes a dictionary.

Delta dictionaries are probably out of scope for the initial level of support.