chmp / serde_arrow

Convert sequences of Rust objects to Arrow tables
MIT License

Consider adding new-type marker types to simplify customization #224

Open chmp opened 3 weeks ago

chmp commented 3 weeks ago

Serde does not provide a way to convey extra metadata during serialization. Special newtype structs could work around this: a newtype struct is serialized through an extra call to serialize_newtype_struct, which receives the type name, so additional metadata could be injected via that name.

Possible uses:

While the newtype could be ignored in array serialization and deserialization, it could be used during schema tracing.

This idea is motivated by this comment.

Possible API

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct S {
    #[serde(with = "serde_arrow::experimental::bool8")]
    value: bool,
}
```

```rust
// alternative
use serde::{Deserialize, Serialize};
use serde_arrow::experimental::Bool8;

#[derive(Serialize, Deserialize)]
struct S {
    value: Bool8,
}
```

Mock Impl for Marker Type

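```rust
// serde_arrow/src/experimental/bool8.rs
use std::fmt;

use serde::de::{Deserialize, Deserializer, Error, Visitor};
use serde::ser::{Serialize, Serializer};

const BOOL8_TAG: &str = "serde_arrow::experimental::bool8";

pub struct Bool8(pub bool);

impl Serialize for Bool8 {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // The tag travels as the newtype name; the payload is a plain bool.
        serializer.serialize_newtype_struct(BOOL8_TAG, &self.0)
    }
}

struct Bool8Visitor;

impl<'de> Visitor<'de> for Bool8Visitor {
    type Value = Bool8;

    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("a bool wrapped in the Bool8 newtype")
    }

    fn visit_newtype_struct<D: Deserializer<'de>>(self, deserializer: D) -> Result<Bool8, D::Error> {
        bool::deserialize(deserializer).map(Bool8)
    }

    // Fallback for deserializers that forward newtype structs to `deserialize_any`
    fn visit_bool<E: Error>(self, value: bool) -> Result<Bool8, E> {
        Ok(Bool8(value))
    }
}

impl<'de> Deserialize<'de> for Bool8 {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        deserializer.deserialize_newtype_struct(BOOL8_TAG, Bool8Visitor)
    }
}

// Helpers used by `#[serde(with = "serde_arrow::experimental::bool8")]`
pub fn serialize<S: Serializer>(value: &bool, serializer: S) -> Result<S::Ok, S::Error> {
    Bool8(*value).serialize(serializer)
}

pub fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<bool, D::Error> {
    Ok(Bool8::deserialize(deserializer)?.0)
}
```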
v1gnesh commented 3 weeks ago

The alternative API is nicer, as attribute annotations on fields can get "noisy" very quickly. The newtype hides this away behind a one-time definition. Thank you for thinking this through!