flavray / avro-rs

Avro client library implementation in Rust
MIT License

Converting Avro JSON to Avro Binary #161

Closed aaronaaeng closed 3 years ago

aaronaaeng commented 3 years ago

I'm writing a system to take Avro JSON and a schema and serialize the data to binary. Both of these will have already been validated by other systems; all I need to do is serialize them. I saw #154 which seems to be the same question but I am still struggling with it. Right now I'm trying to get an incredibly small example working.

let data = r#"{
    "stringVal": "this is a value"
}"#;

let schema = r#"{
    "name":"test",
    "type":"record",
    "fields":[
        {
            "name":"stringVal",
            "type":"string"
        }
    ]
}"#;

let schema = Schema::parse_str(schema).unwrap();
let value = avro_rs::to_value(data).unwrap();
let out = avro_rs::to_avro_datum(&schema, value);

When I run this, out is just Err(Validation) with no other info. That seems to imply a schema/data validation problem, but I can't figure out how to fix it. Is there something else I should be doing?

apohrebniak commented 3 years ago

@aaronaaeng Hi. The data variable is just a &str, which implements serde's ser::Serialize trait, so it is serialized into an avro_rs::types::Value::String. That value is then validated against avro_rs::types::Value::Record, which is the type your schema describes, and the validation fails. What you can do is deserialize your string into a serde_json::Value and then pass that to avro_rs::to_value, or use https://github.com/flavray/avro-rs/blob/master/src/types.rs#L231

poros commented 3 years ago

Thanks for answering @apohrebniak ! I hope this clears the issue. If not, feel free to comment again.

aaronaaeng commented 3 years ago

Sorry I never responded to this one. I got it figured out but just wanted to briefly describe what I did in case anyone else comes across this.

@apohrebniak's answer would work in a lot of cases but didn't work for mine. The avro_rs::to_value method couldn't properly handle nested records which caused a whole host of errors. In the end, I made a copy of the data/schema validation method used within the library and changed it to build avro_rs::types::Value from serde_json::Value.

Basically, I turned something like this

pub fn validate(&self, schema: &Schema) -> bool {
    match (self, schema) {
                                  ⋮
        (&Value::Long(_), &Schema::Long) => true,
                                  ⋮
    }
}

into this

fn json_to_avro(json: serde_json::Value, schema: &Schema) -> avro_rs::types::Value {
    match (json, schema) {
                                  ⋮
        (serde_json::Value::Number(n), &Schema::Long) => avro_rs::types::Value::Long(n.as_i64().unwrap()),
                                  ⋮
    }
}

for each possible case. From there, passing those avro values into avro_rs::to_avro_datum worked with no problems.

Using that method still has all of the same safety caveats discussed in the docs, but for my use case it worked like a charm.
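
For anyone who wants to see the shape of the schema-directed conversion described above, including the nested-record case, here is a minimal sketch. It is not the code from this thread and it does not use avro-rs or serde_json: Json, Schema and Avro are hypothetical stand-in enums, trimmed to a few variants so the example compiles with the standard library alone. It also returns None on a mismatch rather than unwrapping, which is an assumption about how one might want to handle bad input.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for serde_json::Value, avro_rs::Schema and
// avro_rs::types::Value. The real types have many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Number(i64),
    String(String),
    Object(HashMap<String, Json>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Null,
    Long,
    String,
    // Fields are kept in declaration order, because Avro's binary encoding
    // writes record fields in schema order.
    Record(Vec<(String, Schema)>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Avro {
    Null,
    Long(i64),
    String(String),
    Record(Vec<(String, Avro)>),
}

/// Build an Avro value from JSON, letting the schema decide the target variant.
/// Returns None when the JSON does not fit the schema (wrong type or a
/// missing record field) instead of panicking.
pub fn json_to_avro(json: &Json, schema: &Schema) -> Option<Avro> {
    match (json, schema) {
        (Json::Null, Schema::Null) => Some(Avro::Null),
        (Json::Number(n), Schema::Long) => Some(Avro::Long(*n)),
        (Json::String(s), Schema::String) => Some(Avro::String(s.clone())),
        // Nested records fall out of the recursion: each field is converted
        // against its own field schema, in schema order.
        (Json::Object(map), Schema::Record(fields)) => {
            let mut out = Vec::with_capacity(fields.len());
            for (name, field_schema) in fields {
                let value = json_to_avro(map.get(name)?, field_schema)?;
                out.push((name.clone(), value));
            }
            Some(Avro::Record(out))
        }
        // Every other pairing is a schema/data mismatch.
        _ => None,
    }
}
```

With the real crates, the same match would pair serde_json::Value variants with avro_rs::Schema variants, and the result would be handed to avro_rs::to_avro_datum as described above.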