flavray / avro-rs

Avro client library implementation in Rust
MIT License
169 stars 95 forks source link

Record serialization is sensitive to order of fields in struct #47

Open collosi opened 6 years ago

collosi commented 6 years ago

I'm not sure this is not intended behavior, but I just got tripped up by this so I thought I'd point it out. If you declare a schema like this:

{
  "type": "record",
  "name": "my_record",
  "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "string"},
  ]
}

and then try to de/serialize it into a rust struct of this type:

#[derive(Deserialize, Serialize)]
pub struct MyRecord {
  b: String,
  a: i64,
}

You will get an error, "value does not match schema". I'm not sure that should be an error, seems like there's an obvious way to translate from one to another.

The operation that is erroring for me is:

to_avro_datum(&record_scheme, to_value(record)?)?

flavray commented 6 years ago

Hmm good point, thanks for raising that. I feel like it should still be an error, as for these two different schemas:

"fields": [
  {"name": "a", "type": "long"},
  {"name": "b", "type": "string"},
]

and

"fields": [
  {"name": "b", "type": "string"},
  {"name": "a", "type": "long"},
]

Avro will encode them differently, and schema resolution will not succeed if writing from one and reading from the other. 🙂


Another limitation is that we currently do not rely on the schema to transform a Serialize-able struct into a Value. This could (probably) be changed if it is really needed, but would make the code a bit less straightforward.

collosi commented 6 years ago

Yeah, I understand the point about two differing schema, but I think it may be confusing for people coming from the Rust community where I don't think order of fields on a struct ever matters. I don't know much about other implementations of Serde serializers, but if, for example, many other implementations where able to handle this case, it might be a sticking point.

Either way, I don't think its critical, once you understand what's going on, it's easy to handle. Maybe a note in the documentation somewhere?

samschlegel commented 5 years ago

Avro fields are resolved by name not order, so reordering field order is completely legal according to the Schema Resolution rules. I haven't looked into avro-rs at all so I'm not sure how much work this would be, but this is how it's done in Java for reference. We generate a lot of our Avro schemas, so this is a crucial feature in practice.

EDIT: Also here are their parsing and resolution docs https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/io/parsing/doc-files/parsing.html

Pritesh-Patel commented 5 years ago

I just ran into this - was pretty confusing, if the plan is to stick this - it would be helpful if it was mentioned somewhere.

codehearts commented 4 years ago

I've changed the record validation loop to this just to remove the ordering constraint, although this implementation isn't the cleanest. This issue makes it difficult to enforce the presence of fields with a common Event struct when pre-existing schemas may not be in the same order, which is something I've been dealing with

            use std::collections::HashMap;
            if fields.len() == record_fields.len() {
                let data_fields: HashMap<String, Value> = record_fields.iter().cloned().collect();

                fields
                        .iter()
                        .try_for_each(|field| {
                            if let Some(value) = data_fields.get(&field.name) {
                                validate(value, &field.schema)
                            } else {
                                Err(Error::from(format!(
                                    "Expected field named {} but could not find it",
                                    field.name
                                )))
                            }
                        })
// …
mirosval commented 3 years ago

It appears that this is also a problem when you want to use #[serde(flatten)] as the Struct that gets spliced will mess up the order of the parent struct, fascinating.

martin-g commented 2 years ago

A PR with a fix at https://github.com/apache/avro/pull/1650

ctb commented 1 year ago

Looks like https://github.com/apache/avro/pull/1650 was merged - should this be closed?