djkoloski / rust_serialization_benchmark

Benchmarks for rust serialization frameworks
527 stars 48 forks source link

Bincode deserialization could be faster #25

Open ajeetdsouza opened 2 years ago

ajeetdsouza commented 2 years ago

I noticed that the structs you're using with serde-based deserializers actually have an owned String in them. However, bincode doesn't need to copy strings (I'm not aware about the others).

I'm actually using this in zoxide right now. Here's my struct definitions:

#[derive(Debug, Deserialize, Serialize)]
pub struct DirList<'a>(#[serde(borrow)] Vec<Dir<'a>>);

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct Dir<'a> {
    #[serde(borrow)]
    pub path: Cow<'a, str>,
    pub rank: Rank,
    pub last_accessed: Epoch,
}

Since the necessary bytes are already in the Vec<u8> that you're trying to deserialize, bincode will just give you a pointer to it instead of copying a whole new String into memory, which is much faster. You can verify this by checking that the deserialized struct actually contains a Cow::Borrowed. You might be able to use a regular &str too, although I'm not sure how bincode would handle different endianness in that case.

P.S. thanks for your fantastic work on rkyv and these benchmarks! Not only is rkyv's performance amazing, it's ideas like structver that would make this a serious contender for anyone wanting a serialization library. I'm currently using a poor man's structver myself, so the idea of rkyv handling this for me sounds absolutely great.

djkoloski commented 2 years ago

Thanks for the suggestion and the kind words. I think that adding zero-copy benchmarks for serde-based crates is a good idea, though some work will have to go into making a set of structs that are compatible. The deserialize benchmark is meant to benchmark the cost of creating a completely owned structure, which is why this would have to go in a new benchmark. This is a bit unintuitive, so I'll add some clarification to the deserialize benchmark description.

The deserialize benchmark bans all zero-copy features so that a fair comparison can be made between the zero-copy frameworks (e.g. rkyv, abomonation) and the more traditional serialization frameworks (e.g. bincode, serde-json). This is so that users who plan on completely deserializing their objects have a benchmark for all of the available options.

Two questions that I have are:

ajeetdsouza commented 2 years ago

How much ZCD should serde-like libraries do? Just strings?

As much as possible. In bincode's case, I don't think they have anything other than strings (str/OsStr/Path/[u8]). For your benchmarks, I think the natural way would be to add another entry for bincode (borrowed) in the first table itself with an asterisk explaining the situation. Bincode doesn't really fit in the ZCD table, because it's generating a native Rust struct, so there's no concept of access / update here.

Should the deserialize benchmark be measured by deserializing directly into an owned structure, or by deserializing into a borrowed structure and then cloning?

If there's a substantial difference, I think I'd want to see both. It would help people writing code in performance-sensitive contexts to know what to expect in each case.