Ten0 / serde_avro_fast

An idiomatic implementation of serde/avro (de)serialization
GNU Lesser General Public License v3.0

Object container file encoding serialization support #4

Closed by droundy 10 months ago

droundy commented 11 months ago

The documentation I can find involves (raw data, no headers…), but it would be nice to have a clear top-level example showing how to create a proper Avro file (or a Vec that could be written to a file) and how to read one (including confirmation that the header matches the schema).

Ten0 commented 11 months ago

IIUC you're talking about object container file encoding.

If that's indeed the case, it is documented here: https://docs.rs/serde_avro_fast/0.3.2/serde_avro_fast/object_container_file_encoding/index.html
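For readers unfamiliar with the format, here is a minimal sketch of what "confirming that the header matches the schema" means at the byte level. It uses only the standard library and follows the Avro specification's header layout (magic bytes, metadata map, 16-byte sync marker). It is not this crate's API: `Header`, `read_header`, `schema_matches` and the other names are hypothetical, and comparing the raw schema JSON byte for byte is a simplifying assumption (a real check would compare parsed schemas).

```rust
use std::collections::HashMap;
use std::io::{self, Read};

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Decodes an Avro `long`: variable-length base-128, then zigzag.
fn read_long(r: &mut impl Read) -> io::Result<i64> {
    let mut z: u64 = 0;
    let mut shift = 0;
    loop {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        z |= u64::from(b[0] & 0x7f) << shift;
        if b[0] & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return Err(invalid("varint too long"));
        }
    }
    Ok(((z >> 1) as i64) ^ -((z & 1) as i64))
}

/// Decodes Avro `bytes` (also used for `string`): a length, then the payload.
/// Sketch only: the length is trusted, a real reader should bound it.
fn read_bytes(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let len = read_long(r)?;
    if len < 0 {
        return Err(invalid("negative length"));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical type holding the parsed file header.
struct Header {
    metadata: HashMap<String, Vec<u8>>,
    sync: [u8; 16],
}

fn read_header(r: &mut impl Read) -> io::Result<Header> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"Obj\x01" {
        return Err(invalid("not an Avro object container file"));
    }
    // The metadata is an Avro map: blocks of key/value pairs, ended by a 0 count.
    let mut metadata = HashMap::new();
    loop {
        let mut count = read_long(r)?;
        if count == 0 {
            break;
        }
        if count < 0 {
            // A negative count is followed by the block's size in bytes.
            let _byte_size = read_long(r)?;
            count = count.checked_neg().ok_or_else(|| invalid("bad block count"))?;
        }
        for _ in 0..count {
            let key = String::from_utf8(read_bytes(r)?)
                .map_err(|_| invalid("metadata key is not UTF-8"))?;
            metadata.insert(key, read_bytes(r)?);
        }
    }
    let mut sync = [0u8; 16];
    r.read_exact(&mut sync)?;
    Ok(Header { metadata, sync })
}

/// Naive check: the embedded schema JSON equals the expected JSON byte for byte.
fn schema_matches(header: &Header, expected_schema_json: &str) -> bool {
    header.metadata.get("avro.schema").map(Vec::as_slice)
        == Some(expected_schema_json.as_bytes())
}
```

Calling `read_header(&mut file_bytes.as_slice())` and then `schema_matches(&header, my_schema_json)` would be one way to reject a file written with a different schema before decoding any record.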

However, I haven't implemented serialization to object container files yet (only deserialization atm), notably because it involves a few opinionated choices with regard to block sizes.

I should probably have a look at the sizes chosen by other implementations and make it possible for the user to choose it.
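To illustrate what a user-chosen block size could look like, here is a rough standard-library-only sketch of a writer that buffers already-encoded records and emits a block once the buffer reaches a configurable threshold. The layout (magic, metadata map, sync marker, then blocks of record count, byte size, data and sync marker) follows the Avro specification. `BlockWriter`, `push_record`, `block_size` and `demo` are hypothetical names, not this crate's API, and the sketch assumes the `null` codec and a caller-supplied sync marker (which should be random in a real file).

```rust
use std::io::{self, Write};

/// Encodes an Avro `long`: zigzag, then variable-length base-128.
fn write_long<W: Write>(w: &mut W, n: i64) -> io::Result<()> {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    loop {
        let byte = (z & 0x7f) as u8;
        z >>= 7;
        if z == 0 {
            return w.write_all(&[byte]);
        }
        w.write_all(&[byte | 0x80])?;
    }
}

/// Encodes Avro `bytes` (also used for `string`): a length, then the payload.
fn write_bytes<W: Write>(w: &mut W, b: &[u8]) -> io::Result<()> {
    write_long(w, b.len() as i64)?;
    w.write_all(b)
}

/// Hypothetical writer: flushes a block once `buf` holds `block_size` bytes.
struct BlockWriter<W: Write> {
    out: W,
    sync: [u8; 16],
    block_size: usize,
    buf: Vec<u8>,
    count: i64,
}

impl<W: Write> BlockWriter<W> {
    fn new(mut out: W, schema_json: &str, sync: [u8; 16], block_size: usize) -> io::Result<Self> {
        out.write_all(b"Obj\x01")?;
        write_long(&mut out, 2)?; // one metadata map block with two entries
        write_bytes(&mut out, b"avro.schema")?;
        write_bytes(&mut out, schema_json.as_bytes())?;
        write_bytes(&mut out, b"avro.codec")?;
        write_bytes(&mut out, b"null")?;
        write_long(&mut out, 0)?; // end of the metadata map
        out.write_all(&sync)?;
        Ok(Self { out, sync, block_size, buf: Vec::new(), count: 0 })
    }

    /// Appends one record that the caller has already Avro-encoded.
    fn push_record(&mut self, encoded: &[u8]) -> io::Result<()> {
        self.buf.extend_from_slice(encoded);
        self.count += 1;
        if self.buf.len() >= self.block_size {
            self.flush_block()?;
        }
        Ok(())
    }

    /// Writes the pending block: record count, byte size, data, sync marker.
    fn flush_block(&mut self) -> io::Result<()> {
        if self.count == 0 {
            return Ok(());
        }
        write_long(&mut self.out, self.count)?;
        write_long(&mut self.out, self.buf.len() as i64)?;
        self.out.write_all(&self.buf)?;
        self.out.write_all(&self.sync)?;
        self.buf.clear();
        self.count = 0;
        Ok(())
    }

    /// Flushes any partial block and returns the underlying writer.
    fn finish(mut self) -> io::Result<W> {
        self.flush_block()?;
        Ok(self.out)
    }
}

/// Writes three `long` records with a deliberately tiny 2-byte block size.
fn demo() -> io::Result<Vec<u8>> {
    // All-zero sync marker is a placeholder for the sketch only.
    let mut w = BlockWriter::new(Vec::new(), r#""long""#, [0u8; 16], 2)?;
    for n in [1i64, 2, 3] {
        let mut rec = Vec::new();
        write_long(&mut rec, n)?;
        w.push_record(&rec)?;
    }
    w.finish()
}
```

With this threshold, `demo` produces two blocks: one holding the first two records and a final partial block holding the third. A larger `block_size` gives fewer, bigger blocks, which is the trade-off being weighed here.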

Another related potential improvement would be to update the reader so that if blocks are not(/never) too large, it doesn't use stream decompression but instead decompresses block by block. (This is a compromise between how much time it takes to obtain the first record of the block, and how much total time it takes to process the file.)

Is that indeed object container file encoding that you're talking about? (If yes I may implement it soon.)

Ten0 commented 11 months ago

So I've been working on this and I'll be releasing a new version with writer support soon. (And a fix in the reader which would only read the first block.)

Ten0 commented 10 months ago

I have published object container file Writer support in v0.4.0. 🚀

Documentation is here for Writer and here for write_all.

Please let me know if anything is unclear 😊