avro-kotlin / avro4k

Avro format support for Kotlin
Apache License 2.0

Support avro single object encoding #141

Open jangalinski opened 1 year ago

jangalinski commented 1 year ago

Currently, the README states that single object encoding is not supported.

Following up on the discussion in #138, I believe this should be possible to implement.

From the Specification:

Single Avro objects are encoded as follows:

  1. A two-byte marker, C3 01, to show that the message is Avro and uses this single-record format (version 1).
  2. The 8-byte little-endian CRC-64-AVRO fingerprint of the object’s schema.
  3. The Avro object encoded using Avro’s binary encoding.

Binary encoding is already supported, and the fingerprint of the schema should be accessible from

SchemaNormalization.parsingFingerprint64(Avro.default.schema(MySerializableDataClass.serializer()))

so serialization should come down to concatenating bytes.
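As a rough sketch of that concatenation, using only the JDK: `frameSingleObject` is a hypothetical helper name, not an avro4k API. It assumes the fingerprint has already been computed (for example with the `SchemaNormalization.parsingFingerprint64` call above) and that `payload` is the binary-encoded object.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper, not part of avro4k: wraps an already binary-encoded
// Avro payload in the single object framing described by the specification.
fun frameSingleObject(fingerprint: Long, payload: ByteArray): ByteArray =
    ByteBuffer.allocate(2 + 8 + payload.size)
        .order(ByteOrder.LITTLE_ENDIAN) // the fingerprint is written little-endian
        .put(0xC3.toByte())             // marker byte 1
        .put(0x01.toByte())             // marker byte 2 (format version 1)
        .putLong(fingerprint)           // 8-byte CRC-64-AVRO schema fingerprint
        .put(payload)                   // Avro binary encoding of the object
        .array()
```

The result is always 10 bytes longer than the plain binary encoding.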

For deserialization, we need to take the single-object-encoded byte array apart: check the two marker bytes, extract the fingerprint, look up the writer schema in a SchemaStore, decode the remaining binary payload to a GenericRecord, and then convert that record to the target data class.
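The first steps of that (marker check and fingerprint extraction) could look roughly like this, using only the JDK. `SingleObjectParts` and `splitSingleObject` are hypothetical names for illustration, not avro4k APIs. The schema lookup and the actual decoding are left out.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical holder for the two parts that remain after the marker is removed.
class SingleObjectParts(val fingerprint: Long, val payload: ByteArray)

// Hypothetical helper, not part of avro4k: validates the marker and splits
// the input into the writer schema fingerprint and the binary payload.
fun splitSingleObject(bytes: ByteArray): SingleObjectParts {
    require(bytes.size >= 10) { "Too short for single object encoding: ${bytes.size} bytes" }
    require(bytes[0] == 0xC3.toByte() && bytes[1] == 0x01.toByte()) {
        "Missing single object marker C3 01"
    }
    // Bytes 2..9 hold the little-endian CRC-64-AVRO fingerprint.
    val fingerprint = ByteBuffer.wrap(bytes, 2, 8)
        .order(ByteOrder.LITTLE_ENDIAN)
        .getLong()
    // Everything after the 10-byte header is the Avro binary payload.
    return SingleObjectParts(fingerprint, bytes.copyOfRange(10, bytes.size))
}
```

The extracted fingerprint would then be passed to the schema lookup (for example `SchemaStore.findByFingerprint` in Apache Avro), and the payload decoded with the existing binary decoding using the writer schema found there.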

jangalinski commented 1 year ago

I will try to provide this feature.

jangalinski commented 1 year ago

See #142.

Chuckame commented 4 months ago

Hello, after some work on revamping the Avro API in #186 (WIP), there is a new, easier, and promising way of implementing single object encoding (named "serialization delegate" for the moment). The Avro input and output streams would be removed. Don't hesitate to comment on it, as you worked hard to integrate the single object format into the current version of avro4k.

Thanks for being patient, and sorry again for the delay.