apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

field_to_json() in arrow_integration_test/field.rs does not serialize field metadata #6700

Open pshampanier opened 2 weeks ago

pshampanier commented 2 weeks ago

Describe the bug

When calling arrow_integration_test::schema_to_json(schema), field-level metadata is not serialized.

To Reproduce

#[cfg(test)]
mod tests {
    use arrow_integration_test::schema_to_json;
    use arrow_schema::{DataType, Field, Schema};
    use std::collections::HashMap;

    #[test]
    fn test_schema_to_json() {
        let metadata = HashMap::from([("key1".to_string(), "value1".to_string())]);
        let fields = vec![Field::new("a", DataType::Int32, true).with_metadata(metadata.clone())];
        let schema = Schema::new(fields).with_metadata(metadata.clone());
        let json = schema_to_json(&schema);
        assert_eq!(
            serde_json::to_string_pretty(&json).unwrap(),
            serde_json::to_string_pretty(&serde_json::json!({
                "fields": [
                    {
                        "name": "a",
                        "nullable": true,
                        "type": {
                            "bitWidth": 32,
                            "isSigned": true,
                            "name": "int"
                        },
                        "children": [],
                        "metadata": {
                            "key1": "value1"
                        },
                    },
                ],
                "metadata": {
                    "key1": "value1"
                },
            }))
            .unwrap()
        );
    }
}

Expected behavior

Expected:

{
  "fields": [
    {
      "children": [],
      "metadata": {
        "key1": "value1"
      },
      "name": "a",
      "nullable": true,
      "type": {
        "bitWidth": 32,
        "isSigned": true,
        "name": "int"
      }
    }
  ],
  "metadata": {
    "key1": "value1"
  }
}

Found:

{
  "fields": [
    {
      "children": [],
      "name": "a",
      "nullable": true,
      "type": {
        "bitWidth": 32,
        "isSigned": true,
        "name": "int"
      }
    }
  ],
  "metadata": {
    "key1": "value1"
  }
}

The metadata key is present at the schema level but missing for fields.

Additional context

Tested with arrow version = "53.1.0"

alamb commented 2 weeks ago

Thanks for the report -- so that sounds like field metadata is being lost somewhere?

pshampanier commented 1 week ago

Not lost, just omitted at serialization:

https://github.com/apache/arrow-rs/blob/0e9abcd69eedb4080f74e0631ca3cf065cf6553e/arrow-integration-test/src/field.rs#L266-L296

Line 292 should be:

 "children": children,
 "metadata": serde_json::to_value(field.metadata()).unwrap()

Just like in: https://github.com/apache/arrow-rs/blob/0e9abcd69eedb4080f74e0631ca3cf065cf6553e/arrow-integration-test/src/schema.rs#L24-L29

I tested it, and it's working fine.
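
The shape of the proposed change can be illustrated without the arrow crates. This is a minimal, std-only sketch: `MiniField` and `mini_field_to_json` are hypothetical stand-ins, not the arrow_integration_test API, and the hand-built JSON does no string escaping. It only shows the field object emitting a "metadata" key next to "children", mirroring what the schema level already does.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `arrow_schema::Field` (not the real type).
struct MiniField {
    name: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

/// Hypothetical stand-in for `field_to_json`.
/// Builds the JSON object by hand; no string escaping is performed,
/// so this is only an illustration for simple keys and values.
fn mini_field_to_json(field: &MiniField) -> String {
    // BTreeMap iterates in key order, so the output is deterministic.
    let metadata = field
        .metadata
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", k, v))
        .collect::<Vec<_>>()
        .join(",");

    // The "metadata" entry is the part missing from the current output.
    format!(
        "{{\"name\":\"{}\",\"nullable\":{},\"children\":[],\"metadata\":{{{}}}}}",
        field.name, field.nullable, metadata
    )
}
```

Calling `mini_field_to_json` on a field named "a" with metadata `key1 = value1` yields an object that carries the metadata entry, where the current `field_to_json` output stops at "children".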

alamb commented 1 week ago

Would you be willing to create a PR to fix this issue, @pshampanier?

Thank you