chmp / serde_arrow

Convert sequences of Rust objects to Arrow tables
MIT License

Optional values are not handled correctly in tracing #216

Closed: JohnEmhoff closed this issue 2 weeks ago

JohnEmhoff commented 1 month ago

Hello! It seems that serde_arrow is not able to handle optional fields in some cases. Here's an example:

use arrow::datatypes::FieldRef;
use serde_arrow::schema::{SchemaLike, TracingOptions};

fn main() {
    let data = r#"[{"flavor": "delicious"}, {"flavor": null}]"#;
    let v: serde_json::Value = serde_json::from_str(data).unwrap();
    // The next line fails
    let fields = Vec::<FieldRef>::from_samples(&v, TracingOptions::default()).expect("Failed to infer schema");
}

The error message is Error: mismatched types, previous Some(LargeUtf8), current Null.

This example works in 0.11.2 or thereabouts, but fails in 0.11.6.

chmp commented 1 month ago

Hi @JohnEmhoff,

Thanks a lot for the report and the detailed sample. You're right. When I rewrote the tracing logic to use Serde directly, I introduced some bugs for Null values. I will have to figure out how to fix this.
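For illustration only, the behavior expected from tracing can be sketched in plain Rust: a null sample should mark a field as nullable instead of clashing with a type seen earlier. This is a toy model under stated assumptions, not serde_arrow's actual tracer. `TracedField`, `observe`, and `trace` are hypothetical names, and data types are represented as plain strings.

```rust
/// Toy stand-in for a traced field (hypothetical, not serde_arrow's type).
#[derive(Debug, Clone, PartialEq, Default)]
struct TracedField {
    /// Name of the concrete type seen so far, if any.
    data_type: Option<&'static str>,
    /// Set once a null sample has been observed.
    nullable: bool,
}

impl TracedField {
    /// Record one sample: `None` stands for a null value,
    /// `Some(name)` for a value of the named type.
    fn observe(&mut self, sample: Option<&'static str>) -> Result<(), String> {
        match (self.data_type, sample) {
            // A null never conflicts with a known type; it only marks the field nullable.
            (_, None) => self.nullable = true,
            // First concrete value: adopt its type.
            (None, Some(current)) => self.data_type = Some(current),
            // Same type as before: nothing to update.
            (Some(previous), Some(current)) if previous == current => {}
            // Two different concrete types are a genuine mismatch.
            (Some(previous), Some(current)) => {
                return Err(format!(
                    "mismatched types, previous {previous}, current {current}"
                ));
            }
        }
        Ok(())
    }
}

/// Fold a sequence of samples into a single traced field.
fn trace(samples: &[Option<&'static str>]) -> Result<TracedField, String> {
    let mut field = TracedField::default();
    for sample in samples {
        field.observe(*sample)?;
    }
    Ok(field)
}
```

In this sketch, `trace(&[Some("LargeUtf8"), None])` yields a nullable `LargeUtf8` field regardless of the order in which the value and the null arrive, which mirrors the two records in the report above.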

JohnEmhoff commented 1 month ago

Ok, thanks! For now we'll just pin to 0.11.2.

chmp commented 1 month ago

Just FYI: If I traced the root cause correctly, the issue should be isolated to 0.11.6.

chmp commented 2 weeks ago

I released 0.11.7 with the fix and yanked 0.11.6.