djkoloski / rust_serialization_benchmark

Benchmarks for rust serialization frameworks

Zero-copy serialization frameworks (capnproto, flatbuffers) are misrepresented #84

Closed aljazerzen closed 3 days ago

aljazerzen commented 6 days ago

The core selling idea of "zero-copy frameworks" (capnproto, flatbuffers) is that there is "no serialization to be done".

This benchmark works with plain Rust types and benches how fast they can be converted into serialized representation.

"zero-copy frameworks" work with some special (generated) type definitions that are fancy pointers into a buffer. That buffer is the serialized representation of the types, so serializing would take literally 0qs, always.


I'm not saying "zero-copy frameworks should have higher scores somehow".

I'm saying "if you don't need plain Rust types, there is an option to have infinitely faster serialization".
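
For readers unfamiliar with the "fancy pointers into a buffer" idea, here is a minimal sketch. It assumes a made-up fixed layout (a `u32` id followed by an `f64` score), not the real capnproto or flatbuffers wire format, and `PlayerView` and `write_player` are hypothetical names used only for illustration. The view owns no data: each getter decodes bytes straight from the borrowed buffer.

```rust
/// Hypothetical layout (an assumption for this sketch, not a real wire format):
/// bytes 0..4  = id    (u32, little-endian)
/// bytes 4..12 = score (f64, little-endian)
const PLAYER_LEN: usize = 12;

/// A "fancy pointer": borrows the serialized bytes instead of copying them.
struct PlayerView<'a> {
    buf: &'a [u8],
}

impl<'a> PlayerView<'a> {
    /// Wrap a buffer after checking it is long enough to hold one record.
    fn new(buf: &'a [u8]) -> Option<Self> {
        if buf.len() >= PLAYER_LEN {
            Some(Self { buf })
        } else {
            None
        }
    }

    /// Decode the id on demand from the underlying bytes.
    fn id(&self) -> u32 {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&self.buf[0..4]);
        u32::from_le_bytes(raw)
    }

    /// Decode the score on demand from the underlying bytes.
    fn score(&self) -> f64 {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&self.buf[4..12]);
        f64::from_le_bytes(raw)
    }
}

/// The buffer still has to be filled by someone; this is that step.
fn write_player(id: u32, score: f64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(PLAYER_LEN);
    buf.extend_from_slice(&id.to_le_bytes());
    buf.extend_from_slice(&score.to_le_bytes());
    buf
}

fn main() {
    let buf = write_player(7, 1.5);
    let view = PlayerView::new(&buf).expect("buffer too short");
    println!("id={} score={}", view.id(), view.score());
}
```

Once the buffer exists, sending it needs no further conversion, which is the property described above. Filling the buffer in the first place is still real work.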

aljazerzen commented 6 days ago

(I want to leave this note here for people coming to compare serialization, please don't close the issue.)

(Please do hide this comment when you read it.)

And also: maintainers of this repo, thank you for your work!

mumbleskates commented 4 days ago

if there is no serialization step as you argue, there is nothing to bench; this is not an interesting assertion, imo.

if the (opinionated and intrusive) zero-serialization types are to be used as you say, they would already be used directly. however, if data is not already of these buffer types, it needs to be converted into them; building this buffer and its relative pointers and so on is effectively the transmuted work of serialization, and it has to happen at some point (since data in the world is typically not already in this zero-copy form). This work may be amortized out among all the stores as the value is built up incrementally during the program, but it is not free! Note that the "serialization" cost of this populating operation on flatbuffers is significantly higher than many other serialization techniques which work differently.

benching the cost of the bookkeeping involved in constructing the zero-copy buffer is completely fair.
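
to make the bookkeeping concrete, here is a rough sketch of a builder for a list of strings. the layout (a count, then a table of end offsets, then the string bytes) is invented for this example and is not what flatbuffers or capnproto actually emit; `build_names` and `name_at` are hypothetical helpers. the point is only that every stored value costs a copy plus an offset write, even though reading later needs no decode pass.

```rust
/// Hypothetical layout (an assumption for this sketch, not a real wire format):
/// [count: u32][end offset of each string: u32 * count][string bytes...]
/// Offsets are relative to the start of the string bytes.
fn build_names(names: &[&str]) -> Vec<u8> {
    let header_len = 4 + 4 * names.len();
    // Reserve the header up front; the offset slots are patched in below.
    let mut buf = vec![0u8; header_len];
    buf[0..4].copy_from_slice(&(names.len() as u32).to_le_bytes());

    for (i, name) in names.iter().enumerate() {
        // The payload copy itself.
        buf.extend_from_slice(name.as_bytes());
        // Bookkeeping: record where this string ends so readers can find it.
        let end = (buf.len() - header_len) as u32;
        let slot = 4 + 4 * i;
        buf[slot..slot + 4].copy_from_slice(&end.to_le_bytes());
    }
    buf
}

/// Read a little-endian u32 at byte position `at`.
fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(raw)
}

/// Borrow the i-th string directly out of the buffer.
/// No validation here: this panics on malformed input or an out-of-range index.
fn name_at(buf: &[u8], i: usize) -> &str {
    let count = read_u32(buf, 0) as usize;
    let header_len = 4 + 4 * count;
    let start = if i == 0 { 0 } else { read_u32(buf, 4 + 4 * (i - 1)) as usize };
    let end = read_u32(buf, 4 + 4 * i) as usize;
    std::str::from_utf8(&buf[header_len + start..header_len + end]).unwrap()
}

fn main() {
    let buf = build_names(&["ann", "bob"]);
    println!("{} bytes, second name = {}", buf.len(), name_at(&buf, 1));
}
```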

mumbleskates commented 4 days ago

i think overall if someone is going to choose a zero-copy framework, they're going to probably have some idea of the reasons to do it given the extra friction it involves

djkoloski commented 3 days ago

> The core selling idea of "zero-copy frameworks" (capnproto, flatbuffers) is that there is "no serialization to be done".

I think this is the fundamental misunderstanding. The "zero-copy" feature of "zero-copy frameworks" is not skipping serialization, it's skipping deserialization. Per the terminology in the README:

> All tests benchmark the following properties (time or size):
>
>   • Serialize: serialize data into a buffer
>   • Deserialize: deserializes a buffer into a normal rust object ...
>
> Zero-copy deserialization libraries have an additional set of benchmarks:
>
>   • Access: accesses a buffer as structured data
>   • Read: runs through a buffer and reads fields out of it
>   • Update: updates a buffer as structured data

Use these definitions.
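
As a rough illustration of how the five terms differ, here is a sketch over a toy two-field struct. The 8-byte layout and every name in it (`Point`, `PointView`, `access`, `update_x`, and so on) are assumptions made up for this example; none of it is the benchmark's actual code or any framework's API.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Hypothetical layout: x at bytes 0..4, y at bytes 4..8, both little-endian.
const POINT_LEN: usize = 8;

/// Read a little-endian i32 at byte position `at`.
fn read_i32(buf: &[u8], at: usize) -> i32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[at..at + 4]);
    i32::from_le_bytes(raw)
}

/// Serialize: turn data into a buffer.
fn serialize(p: &Point) -> Vec<u8> {
    let mut buf = Vec::with_capacity(POINT_LEN);
    buf.extend_from_slice(&p.x.to_le_bytes());
    buf.extend_from_slice(&p.y.to_le_bytes());
    buf
}

/// Deserialize: turn a buffer back into a normal, owned Rust object.
fn deserialize(buf: &[u8]) -> Option<Point> {
    if buf.len() < POINT_LEN {
        return None;
    }
    Some(Point { x: read_i32(buf, 0), y: read_i32(buf, 4) })
}

/// A borrowed view over the serialized bytes.
struct PointView<'a>(&'a [u8]);

/// Access: check the buffer and hand back a structured view, without copying.
fn access(buf: &[u8]) -> Option<PointView<'_>> {
    if buf.len() < POINT_LEN {
        None
    } else {
        Some(PointView(buf))
    }
}

/// Read: pull individual fields out through the view.
impl PointView<'_> {
    fn x(&self) -> i32 {
        read_i32(self.0, 0)
    }
    fn y(&self) -> i32 {
        read_i32(self.0, 4)
    }
}

/// Update: change a field in place inside the buffer.
fn update_x(buf: &mut [u8], x: i32) -> bool {
    if buf.len() < POINT_LEN {
        return false;
    }
    buf[0..4].copy_from_slice(&x.to_le_bytes());
    true
}

fn main() {
    let mut buf = serialize(&Point { x: 3, y: -4 });
    println!("deserialized: {:?}", deserialize(&buf));

    let view = access(&buf).expect("buffer too short");
    println!("read through view: x={} y={}", view.x(), view.y());

    update_x(&mut buf, 10);
    println!("after update: {:?}", deserialize(&buf));
}
```

Even in this toy version, access and read are not free: the length check and the per-field decoding are small but measurable, and those are the costs the access and read benchmarks are meant to capture.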

> This benchmark works with plain Rust types and benches how fast they can be converted into serialized representation.
>
> "zero-copy frameworks" work with some special (generated) type definitions that are fancy pointers into a buffer. That buffer is the serialized representation of the types, so serializing would take literally 0qs, always.

Zero-copy frameworks can either provide serialization/encode logic (rkyv, abomonation, alkahest somewhat), or omit serialization/encode logic (flatbuffers, capnproto). So rkyv, abomonation, and alkahest are definitely reasonable to benchmark serialization performance for. Flatbuffers and capnproto give you the tools to write serialization code but leave it up to you to write it. These benchmarks contain manually-written serialization code for those frameworks and intend to accurately portray the cost of serializing using them.

Deserializing is different because there are no utilities provided, and so benchmarking would end up being a benchmark of our own code. That's why serialization benchmarks are included for all zero-copy frameworks but deserialization is not.

> I'm not saying "zero-copy frameworks should have higher scores somehow".
>
> I'm saying "if you don't need plain Rust types, there is an option to have infinitely faster serialization".

I don't agree with this statement. We never benchmark the cost of creating the Rust data structures that get serialized, and serialization always takes time whether it's zero-copy or not. You could argue that zero-copy deserialization (ZCD) provides infinitely faster deserialization, but even that's somewhat misleading because there's always a cost to access the data (admittedly, capnp makes this claim nonetheless). That's why we provide access and read benchmarks as well.

I am going to close this issue because I think that all of the relevant points have been addressed and I don't see any further avenues for action.