Open Chuckame opened 10 months ago
@Chuckame, thanks for the comprehensive issue for this really important topic! I also had this in mind and think this is the way to go.
I think it would be cool to get rid of the avro dependency and create an extension library that maps our types to avro types. This would open the doors for multiplatform of the core library while still supporting users that rely on avro types.
Just as a little heads up, I've created a new branch https://github.com/avro-kotlin/avro4k/tree/way-to-multiplatform, which implements a PoC that directly encodes to avro binary without using the avro library. So far, everything looks very promising.
I've created a small benchmark that compares the speed of Avro4k (old), Avro4kDirect (new), and Jackson. Here are the results:
Benchmark Mode Cnt Score Error Units
Avro4kBenchmark.clients thrpt 3 29557,032 ± 3002,497 ops/s
Avro4kBenchmark.users thrpt 3 290324,243 ± 23988,233 ops/s
Avro4kDirectBenchmark.clients thrpt 3 583662,180 ± 68148,244 ops/s
Avro4kDirectBenchmark.users thrpt 3 425400,322 ± 29352,531 ops/s
JacksonBenchmark.clients thrpt 3 239120,408 ± 22238,596 ops/s
JacksonBenchmark.users thrpt 3 304450,006 ± 63307,328 ops/s
I will publish the code for the benchmark soon.
Open stuff for the branch:
So cool! And wait for my refacto to be merged, where it should improve a lot of little things 👼 And mostly the encoder/decoder architecture that was simplified
Can you create the PR as Draft to easily open discussions ? Done !
The sources of the benchmark used for the table above can now be found at https://github.com/avro-kotlin/kotlin-avro-benchmark.
Can you add a little of polymorphism? With using the sealed root type as the serializer descriptor for avro4k? E.g Avro.decode(RootType.serializer(), childInstance)
that way it have to lookup the good descriptor for the implementation during decoding. I think Jackson do the same, where you can force the root type to be used as the root serializer
I don't actually get what you mean, sry. Can you provide a PR?
@thake Just a global comment on the recent work on avro4k, but we should talk about the changes we want to do inside the codebase and synchronise. Because as I can see, I reworked all the codebase as already said, but you are also moving a lot of stuff (tests, classes). I'm just afraid about rebases and duplicate work. Are you available in next few days to meet (discord, zoom, google meet, ..) to clarify and maybe make some roadmap ?
encodeToByteArray
and decodeFromByteArray
so we are now able of implementing direct binary implementation without changing the API
Currently, the library is just interfacing as it can to the official apache avro library using GenericRecord and other GenericData stuff.
On avro4k codebase simplicity, it seems easy while it's not really since the codebase is doing a lot of adaptation to let the GenericData happy with the generic stuff we generated.
On performance side, we are checking, converting and adapting data to fit GenericData stuff, and then this adapted stuff is also tried to be re-adapted and checked, to be then serialized to binary. Also, all is runtime specific while kotlinx serialization is mainly prepared at build time (except contextual).
On user side, each user that just want to do avro format (and not directly use the generated GenericRecords) will have to call the apache avro library to serialize it to binary or json, or to use the libraries helping this stuff.
On compatibility and spec-fitting, we are like covering all the avro spec without knowing how it works. We just put the record from
Avro.toRecord()
into the apache avro lib, and 🎉 it works. On tests, we are currently not in testing the avro format compliance, but the generated GenericRecords with another generic record.And the last axis, kotlin multi platform. Since a big part of the codebase and the mechanisms is highly linked to the apache avro Java lib, the multiplaform dream seems complicated to reach.
Now, how to improve that? Let kotlinx serialization do his best and why it has been created : easily encode whatever the kotlin object. Have a look to the avro
Encoder
methods : https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/Encoder.javaAll the methods fits exactly to kotlinx encoders, and the way of calling them fits perfectly with kotlinx workflow :
The other advantage is that there is absolutely no primitive autoboxing, also no more callbacks hell thanks to direct use of avro Encoder as output.
@thake, any comment on this? You are in this project since a long time. This will be a big refactoring, while it could be game changer.
Plan for making the changes: