dropbox / pb-jelly

A protobuf code generation framework for the Rust language developed at Dropbox.
Apache License 2.0

[WIP] Zero Copy String Deserialization #88

Open ParkMyCar opened 3 years ago

ParkMyCar commented 3 years ago

I thought this was a pretty interesting task: adding zero-copy deserialization for strings! It's still a work in progress, but basically I created a type, StrBytes, which is a wrapper around a Bytes struct. On creation we assert that the contents are valid UTF-8, which they should be, because per the protobuf spec strings are encoded as valid UTF-8.
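To make the idea concrete, here is a minimal sketch of what a StrBytes-style wrapper could look like. It is not the code from this PR: SharedBytes is a hypothetical std-only stand-in for the Bytes struct (an Arc-backed buffer plus a range), the method names from_utf8 and as_str are assumptions, and the constructor returns a Result instead of asserting.

```rust
use std::ops::Range;
use std::str::Utf8Error;
use std::sync::Arc;

/// Hypothetical stand-in for a `Bytes`-like type: a cheaply cloneable,
/// immutable view into a shared buffer. Only here to keep the sketch
/// free of external crates.
#[derive(Clone, Debug)]
pub struct SharedBytes {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    /// Copies `data` once into a shared allocation (think: the received message).
    pub fn new(data: &[u8]) -> Self {
        SharedBytes { buf: Arc::from(data), range: 0..data.len() }
    }

    /// Returns a sub-view that shares the same allocation; no bytes are copied.
    pub fn slice(&self, range: Range<usize>) -> Self {
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        assert!(start <= end && end <= self.range.end, "slice out of bounds");
        SharedBytes { buf: Arc::clone(&self.buf), range: start..end }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Sketch of a `StrBytes`-like wrapper (names and API are assumptions).
/// Invariant: the wrapped bytes are always valid UTF-8.
#[derive(Clone, Debug)]
pub struct StrBytes {
    bytes: SharedBytes,
}

impl StrBytes {
    /// Validates once, on creation. This is a linear scan over the bytes,
    /// but it does not copy them.
    pub fn from_utf8(bytes: SharedBytes) -> Result<Self, Utf8Error> {
        std::str::from_utf8(bytes.as_slice())?;
        Ok(StrBytes { bytes })
    }

    /// Borrows the contents as `&str` without re-validating.
    pub fn as_str(&self) -> &str {
        // SAFETY: `from_utf8` checked the bytes, and the buffer is immutable.
        unsafe { std::str::from_utf8_unchecked(self.bytes.as_slice()) }
    }
}
```

With this shape, deserializing a string field amounts to slicing the message buffer and validating the slice; the resulting &str points into the original allocation rather than into a fresh String.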

Benchmarks (2014 MacBook Pro with an i7)

test bench::benches::bench_deserialize_string                               ... bench:     106,025 ns/iter (+/- 12,029)
test bench::benches::bench_deserialize_vec_bytes                            ... bench:     259,522 ns/iter (+/- 20,860)
test bench::benches::bench_deserialize_zero_copy_bytes                      ... bench:          98 ns/iter (+/- 14)
test bench::benches::bench_deserialize_zero_copy_string                     ... bench:      42,821 ns/iter (+/- 15,625)
test bench::prost::bench_deserialize_prost_bytes                            ... bench:     266,399 ns/iter (+/- 66,497)
test bench::prost::bench_deserialize_prost_string                           ... bench:     107,524 ns/iter (+/- 8,994)
test bench::rust_protobuf::bench_deserialize_rust_protobuf_zero_copy_bytes  ... bench:          49 ns/iter (+/- 10)
test bench::rust_protobuf::bench_deserialize_rust_protobuf_zero_copy_string ... bench:      44,083 ns/iter (+/- 12,499)

Note: zero-copy strings are not as fast as zero-copy bytes because of the extra UTF-8 validation step.

ParkMyCar commented 3 years ago

Because strings encoded in a proto message should be valid UTF-8, I added a feature flag, zero_copy_string_no_utf8_check, that skips UTF-8 validation when using zero-copy strings. With this flag enabled, we get performance similar to zero-copy bytes:

test bench::benches::bench_deserialize_zero_copy_bytes                      ... bench:          98 ns/iter (+/- 4)
test bench::benches::bench_deserialize_zero_copy_string                     ... bench:         101 ns/iter (+/- 2)
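For illustration, a flag like this could gate the validation roughly as follows. This is only a sketch under assumptions: field_as_str is a hypothetical helper, not a function from this PR, and only the feature name zero_copy_string_no_utf8_check comes from the comment above.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: views the raw bytes of a string field as `&str`
/// without copying them.
pub fn field_as_str(raw: &[u8]) -> Result<&str, Utf8Error> {
    if cfg!(feature = "zero_copy_string_no_utf8_check") {
        // Validation skipped: constant-time, but we are trusting the encoder.
        // SAFETY: the caller must guarantee `raw` is valid UTF-8;
        // if it is not, this is undefined behavior.
        Ok(unsafe { std::str::from_utf8_unchecked(raw) })
    } else {
        // Default path: a linear scan over the bytes, still no copy.
        std::str::from_utf8(raw)
    }
}
```

The trade-off is that with the check disabled, a malformed or malicious message can produce a &str that violates Rust's UTF-8 invariant, so skipping validation only seems reasonable when the producer of the message is trusted.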