chmp / serde_arrow

Convert sequences of Rust objects to Arrow tables
MIT License

DateTime<Utc> is mapped to i64 and loses type information in Parquet #187

Closed: Veiasai closed this issue 3 months ago

Veiasai commented 3 months ago

With the field declared as

    #[serde(with = "ts_microseconds")]
    pub expiry: DateTime<Utc>,

and the schema traced via

    let fields = Vec::<FieldRef>::from_type::<MyStruct>(
        TracingOptions::default().allow_null_fields(true),
    )
    .unwrap();

the data type of expiry is Int64 in Arrow.

Is it possible to get DataType::Timestamp(TimeUnit::Microsecond, None)?

chmp commented 3 months ago

Hi @Veiasai,

unfortunately, serde_arrow only sees an i64 without any other information, as this is the only info Serde passes along. I am afraid you would need to manually overwrite the type. E.g., via

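// `fields` is a Vec<FieldRef> (FieldRef = Arc<Field>), so the replacement is wrapped in an Arc
for field in &mut fields {
  if field.name() == "expiry" {
    *field = Arc::new(Field::new(
      field.name(),
      DataType::Timestamp(TimeUnit::Microsecond, None),
      field.is_nullable(),
    ));
  }
}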
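To illustrate why the tracer ends up with Int64, and what the manual override does, here is a minimal sketch using only the standard library. All types in it (`Serializer`, `Tracer`, `Field`, `DataType`, `MyStruct`, `override_data_type`) are hypothetical stand-ins written for this example; they are not the Serde, serde_arrow, or Arrow APIs. The point is only that a serializer that receives a bare i64 has nothing from which to infer "timestamp", so the type has to be patched afterwards.

```rust
// Stand-in types for illustration only; NOT the serde / serde_arrow / arrow APIs.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    TimestampMicros,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Minimal stand-in for a Serde-style serializer: it is only told the
/// primitive kind of each value, never the original Rust type.
trait Serializer {
    fn serialize_i64(&mut self, name: &str, value: i64);
}

/// A tracer that records one field per primitive value it sees.
#[derive(Default)]
struct Tracer {
    fields: Vec<Field>,
}

impl Serializer for Tracer {
    fn serialize_i64(&mut self, name: &str, _value: i64) {
        // Only "this is an i64" arrives here; nothing marks it as a timestamp.
        self.fields.push(Field {
            name: name.to_string(),
            data_type: DataType::Int64,
            nullable: false,
        });
    }
}

/// Stand-in for a struct whose timestamp is serialized as microseconds
/// since the epoch (the role `ts_microseconds` plays in the issue).
struct MyStruct {
    expiry_micros: i64,
}

impl MyStruct {
    fn serialize<S: Serializer>(&self, serializer: &mut S) {
        serializer.serialize_i64("expiry", self.expiry_micros);
    }
}

/// Replace the data type of every field called `name`, keeping its name and
/// nullability. Returns true if at least one field was changed.
fn override_data_type(fields: &mut [Field], name: &str, data_type: DataType) -> bool {
    let mut changed = false;
    for field in fields.iter_mut().filter(|f| f.name == name) {
        field.data_type = data_type.clone();
        changed = true;
    }
    changed
}

/// Trace the stand-in struct, then patch the traced schema by field name.
fn trace_and_override() -> Vec<Field> {
    let mut tracer = Tracer::default();
    MyStruct { expiry_micros: 1_700_000_000_000_000 }.serialize(&mut tracer);
    let mut fields = tracer.fields;
    override_data_type(&mut fields, "expiry", DataType::TimestampMicros);
    fields
}
```

Returning a bool from the override helper is a design choice in this sketch: it makes a misspelled field name visible instead of silently leaving the schema unchanged.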

I am also thinking about adding an option to override individual fields during tracing, e.g., via

let fields = Vec::<FieldRef>::from_type::<MyStruct>(
    TracingOptions::default()
        .allow_null_fields(true)
        // proposed option, not an existing method: override one field by its path
        .overwrite_field("$.expiry", json!({"name": "expiry", "data_type": "Timestamp(Microsecond, None)"})),
)
.unwrap();

But I'm not really sure about the API.

Veiasai commented 3 months ago

Hey @chmp. I see. Thanks!