chmp / serde_arrow

Convert sequences of Rust objects to Arrow tables
MIT License

Bool8 #211

Closed v1gnesh closed 1 month ago

v1gnesh commented 1 month ago

Hey, I love seeing the recent updates. Hope you're doing well :handshake:

When the new Bool8 type lands, could you please see whether support for it can be added?

https://github.com/apache/arrow/pull/43234
https://github.com/apache/arrow/pull/43488
https://github.com/apache/arrow/pull/43323

chmp commented 1 month ago

Hey :)

Sure. As it's simply bools encoded as 0/1 in an Int8 array, it should be a pretty trivial change (simply add bool support to the int deserializers / serializers). The extension type itself is ignored by serde_arrow at the moment.
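To make "add bool support to the int serializers" concrete, here is a minimal sketch of the idea. `Int8Builder`, `push_bool`, and `build_bool8_column` are hypothetical names made up for illustration; they are not serde_arrow's actual internals or API.

```rust
/// Hypothetical stand-in for an Int8 column builder.
/// This is NOT serde_arrow's real builder type, only an illustration.
#[derive(Debug, Default)]
pub struct Int8Builder {
    /// One storage byte per element.
    pub values: Vec<i8>,
    /// `true` where the element is non-null.
    pub validity: Vec<bool>,
}

impl Int8Builder {
    /// The existing integer path: store the value as-is.
    pub fn push_i8(&mut self, value: i8) {
        self.values.push(value);
        self.validity.push(true);
    }

    /// The added bool path: reuse the integer path, writing 0 for false and 1 for true.
    pub fn push_bool(&mut self, value: bool) {
        self.push_i8(i8::from(value));
    }

    /// Nulls get a placeholder byte and a cleared validity flag.
    pub fn push_null(&mut self) {
        self.values.push(0);
        self.validity.push(false);
    }
}

/// Build an Int8-backed column from optional bools (hypothetical helper).
pub fn build_bool8_column(items: &[Option<bool>]) -> Int8Builder {
    let mut builder = Int8Builder::default();
    for item in items {
        match item {
            Some(value) => builder.push_bool(*value),
            None => builder.push_null(),
        }
    }
    builder
}
```

Calling `build_bool8_column(&[Some(true), Some(false), None])` in this sketch stores the bytes `1, 0, 0` with validity flags `true, true, false`. The point is only that the bool case can delegate to the existing integer path.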

At the moment I am midway through a major change of the internals of serde_arrow, but I should be able to add this change afterwards.

chmp commented 1 month ago

Details from the PR defining the extension type:

Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in the original Arrow Boolean type. Although less compact than the original representation, Bool8 may have better zero-copy compatibility with various systems that also store booleans using 1 byte.

  • Extension name: arrow.bool8.
  • The storage type of this extension is Int8 where:
    • false is denoted by the value 0.
    • true can be specified using any non-zero value. Preferably 1.
  • Extension type parameters: This type does not have any parameters.
  • Description of the serialization: Metadata is an empty string.
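The rules above can be sketched in plain Rust without any Arrow crate. The function names below are hypothetical; the metadata keys `ARROW:extension:name` and `ARROW:extension:metadata` are the standard Arrow field metadata keys for extension types, and attaching them this way is an assumption about how a consumer might tag the field, not something serde_arrow is confirmed to do.

```rust
use std::collections::HashMap;

/// Field metadata that would mark an Int8 field as `arrow.bool8`.
/// The serialized extension metadata is an empty string, per the description above.
pub fn bool8_field_metadata() -> HashMap<String, String> {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.bool8".to_string());
    metadata.insert("ARROW:extension:metadata".to_string(), String::new());
    metadata
}

/// Encode bools into Int8 storage, using the preferred value 1 for true.
pub fn encode_bool8(values: &[bool]) -> Vec<i8> {
    values.iter().map(|&v| i8::from(v)).collect()
}

/// Decode Int8 storage: 0 is false, any non-zero value is true.
pub fn decode_bool8(storage: &[i8]) -> Vec<bool> {
    storage.iter().map(|&byte| byte != 0).collect()
}
```

Note the asymmetry: encoding always writes 0 or 1, but decoding has to accept any non-zero byte (for example -1 or 42) as true, so a round trip through `decode_bool8` and `encode_bool8` normalizes the storage.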

Impls:

chmp commented 1 month ago

Implemented with #212 and #214