flavray / avro-rs

Avro client library implementation in Rust
MIT License

Append to an existing file? #160

Open joshua-cooper opened 4 years ago

joshua-cooper commented 4 years ago

Is it possible to append to an existing file using this crate?

Writer always writes a header, so I don't think it can be used for this.

poros commented 4 years ago

I honestly never tried, but I think that the Avro reader should be able to pick up a new header just fine. Have you given it a try already?

joshua-cooper commented 4 years ago

Correct me if I'm wrong, but to append using the reader I would need to read the entire file into memory first. Since the files I'm working with can be arbitrarily large, that won't be possible.

I think there needs to be a way to opt out of writing the header to get around this. Perhaps it's possible with the lower-level parts of the crate, but I haven't had any luck so far.

JuliDi commented 3 years ago

Did you find any solution for this?

I am looking for a way to append additional fields/data to a file, preferably without reading the whole file first and then writing the whole file again.

Something like this, using the README example:

First schema:

    let raw_schema = r#"
        {
            "type": "record",
            "name": "test",
            "doc": "just for testing purposes",
            "fields": [
                {"name": "a", "type": "long", "default": 42},
                {"name": "b", "type": "string"}
            ]
        }
    "#;

Updated schema:

    let raw_schema = r#"
        {
            "type": "record",
            "name": "test",
            "doc": "just for testing purposes",
            "fields": [
                {"name": "a", "type": "long", "default": 42},
                {"name": "b", "type": "string"},
                {"name": "c", "type": "long", "default": 43}
            ]
        }
    "#;

And now I would like to read (with the updated schema) an Avro file that has been created with the first schema and append the field "c" to that file. Is this possible with avro-rs?

Opening the file and just doing something like

        let mut record = Record::new(writer.schema()).unwrap();
        record.put("c", 33i64);

        // schema validation happens here
        writer.append(record).unwrap();

does not work for me (Validation error). If this was (in theory) the right approach, I could provide a complete example of what I have tried.
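For reference, the Avro specification's schema resolution rules say that a reader-schema field which is missing from the writer schema is filled from the reader's default, and that a missing field with no default is an error. So records written with the first schema may be readable with the updated one without touching the file. The following is only a toy model of that rule using the standard library; `Value`, `ReaderField`, `resolve` and `updated_schema` are hypothetical names, not avro-rs API, and real decoding works on the binary encoding rather than on a map.

```rust
use std::collections::HashMap;

/// Toy value type covering only the two types used in the schemas above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Long(i64),
    Str(String),
}

/// One field of the reader (updated) schema: its name and optional default.
struct ReaderField {
    name: &'static str,
    default: Option<Value>,
}

/// Resolve a record written with an older schema against the reader schema:
/// take the written value if present, else the reader's default, else fail.
fn resolve(
    written: &HashMap<&str, Value>,
    reader: &[ReaderField],
) -> Result<Vec<(&'static str, Value)>, String> {
    reader
        .iter()
        .map(|f| {
            written
                .get(f.name)
                .cloned()
                .or_else(|| f.default.clone())
                .map(|v| (f.name, v))
                .ok_or_else(|| format!("field `{}` has no value and no default", f.name))
        })
        .collect()
}

/// The updated schema from above, reduced to field names and defaults.
fn updated_schema() -> Vec<ReaderField> {
    vec![
        ReaderField { name: "a", default: Some(Value::Long(42)) },
        ReaderField { name: "b", default: None },
        ReaderField { name: "c", default: Some(Value::Long(43)) },
    ]
}
```

In this model, a record written as a = 27, b = "foo" resolves to a = 27, b = "foo", c = 43, while a record lacking "b" is rejected because "b" has no default.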

joshua-cooper commented 3 years ago

> Did you find any solution for this?

Not yet unfortunately.

poros commented 3 years ago

Have you tried the to_avro_datum (https://docs.rs/avro-rs/0.11.0/avro_rs/fn.to_avro_datum.html) and from_avro_datum (https://docs.rs/avro-rs/0.11.0/avro_rs/fn.from_avro_datum.html) functions, by any chance? They skip some of the validation and header handling that Writer does, so they might work better in a "seek in the file, then read" kind of scenario.
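To make the "seek" idea concrete, here is a rough sketch of appending one data block to an existing container file without reading its records. It relies on the layout in the Avro specification: a header (magic bytes, metadata map, 16-byte sync marker) followed by blocks of record count, byte size, record data and the same sync marker. It uses only the standard library and assumes an uncompressed ("null" codec) file. All function names are hypothetical, and `encode_record` hand-encodes a record for the first schema where real code would presumably use the bytes returned by to_avro_datum.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

fn bad(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Encode an Avro `long` as a zigzag varint.
fn encode_long(n: i64, out: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        out.push((z as u8 & 0x7f) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Decode a zigzag varint `long`.
fn decode_long<R: Read>(r: &mut R) -> io::Result<i64> {
    let (mut z, mut shift) = (0u64, 0u32);
    loop {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        z |= u64::from(b[0] & 0x7f) << shift;
        if b[0] & 0x80 == 0 {
            return Ok((z >> 1) as i64 ^ -((z & 1) as i64));
        }
        shift += 7;
        if shift > 63 {
            return Err(bad("varint too long"));
        }
    }
}

/// Write length-prefixed bytes (the encoding of Avro `bytes` and `string`).
fn put_bytes(bytes: &[u8], out: &mut Vec<u8>) {
    encode_long(bytes.len() as i64, out);
    out.extend_from_slice(bytes);
}

/// Read length-prefixed bytes.
fn read_bytes<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let len = decode_long(r)?;
    if len < 0 {
        return Err(bad("negative length"));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Parse only the header and return the file's sync marker.
fn read_sync_marker<R: Read>(r: &mut R) -> io::Result<[u8; 16]> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"Obj\x01" {
        return Err(bad("not an Avro container file"));
    }
    loop {
        let mut count = decode_long(r)?;
        if count == 0 {
            break; // end of the metadata map
        }
        if count < 0 {
            decode_long(r)?; // a negative count is followed by a byte size
            count = -count;
        }
        for _ in 0..count {
            let key = read_bytes(r)?;
            let value = read_bytes(r)?;
            if &key[..] == b"avro.codec" && &value[..] != b"null" {
                return Err(bad("compressed files are not handled in this sketch"));
            }
        }
    }
    let mut sync = [0u8; 16];
    r.read_exact(&mut sync)?;
    Ok(sync)
}

/// Append one block of already-encoded records, reusing the header's sync marker.
fn append_block<F: Read + Write + Seek>(file: &mut F, records: &[Vec<u8>]) -> io::Result<()> {
    file.seek(SeekFrom::Start(0))?;
    let sync = read_sync_marker(file)?;
    let mut block = Vec::new();
    encode_long(records.len() as i64, &mut block);
    encode_long(records.iter().map(|r| r.len()).sum::<usize>() as i64, &mut block);
    for r in records {
        block.extend_from_slice(r);
    }
    block.extend_from_slice(&sync);
    file.seek(SeekFrom::End(0))?;
    file.write_all(&block)
}

/// Hand-encoded record for the first schema: long "a", then string "b".
fn encode_record(a: i64, b: &str) -> Vec<u8> {
    let mut out = Vec::new();
    encode_long(a, &mut out);
    put_bytes(b.as_bytes(), &mut out);
    out
}

/// Header-only in-memory file, for demonstration (real sync markers are random).
fn demo_header() -> Vec<u8> {
    let mut f = b"Obj\x01".to_vec();
    encode_long(1, &mut f); // one metadata entry
    put_bytes(b"avro.schema", &mut f);
    put_bytes(br#"{"type":"record","name":"test","fields":[{"name":"a","type":"long"},{"name":"b","type":"string"}]}"#, &mut f);
    encode_long(0, &mut f); // end of the metadata map
    f.extend_from_slice(&[7u8; 16]);
    f
}

/// Append one record to the in-memory file and return the resulting bytes.
fn demo() -> io::Result<Vec<u8>> {
    let mut file = Cursor::new(demo_header());
    append_block(&mut file, &[encode_record(27, "foo")])?;
    Ok(file.into_inner())
}
```

With a real file, the `Cursor` would be replaced by a handle opened through `std::fs::OpenOptions` with both `read(true)` and `write(true)`. Only the header is read, so memory use does not grow with file size. The new block has to end with the sync marker from the existing header, otherwise readers will reject it.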

JuliDi commented 3 years ago

Thanks, I'll give that a try!