TimelyDataflow / abomonation

A mortifying serialization library for Rust
MIT License

Support Write trait #7

frankmcsherry closed 6 years ago

frankmcsherry commented 6 years ago

This PR is a major version bump, and is a breaking change in a bunch of ways. The most significant is that encode and entomb are both generic with respect to a W: Write, and return an ::std::io::Result<()> rather than nothing at all. The second most significant is that embalm doesn't exist any more, and that was the method that cleaned up your pointers to avoid leaking info about your memory addresses.
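To make the shape of that change concrete, here is a minimal sketch of an encoder that is generic over a W: Write and propagates errors as an ::std::io::Result<()>. The function name encode_u64 is hypothetical and is not part of abomonation; it only copies the in-memory bytes of a u64, which is the simplest case of the approach described above.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a `Write`-generic encoder (not abomonation's
/// real `encode`): copies the in-memory bytes of a `u64` into any writer
/// and hands I/O errors back to the caller instead of swallowing them.
fn encode_u64<W: Write>(value: u64, writer: &mut W) -> io::Result<()> {
    // Native-endian bytes, i.e. the same bytes the value has in memory.
    writer.write_all(&value.to_ne_bytes())
}
```

The same function can be handed a Vec<u8>, a &mut [u8], a File, or a TcpStream, since all of them implement Write.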

These two were fundamentally in conflict, and the second was also in conflict with performance (as we will see). I'm open to thoughts on their reconciliation; embalm is not conceptually complicated, it just requires write access to the writer's bytes after the memcpy.

Most of this PR is simplification and result propagation. There are some performance wins in the benchmarks, presumably due to not having to return to the data to overwrite addresses. The benchmarks should be unaffected on the decode side, as no code was changed there. Other than not erasing addresses, the formats should be the same too (equivalent to the in-memory representation).

Here are the benchmark changes in bench.rs:

| benchmark | old | new |
|---|---|---|
| empty_enc | 3428 MB/s | 24000 MB/s |
| string10_enc | 4360 MB/s | 6320 MB/s |
| string20_enc | 6050 MB/s | 9042 MB/s |
| u64_enc | 58685 MB/s | 67900 MB/s |
| u8_u64_enc | 60411 MB/s | 57454 MB/s |
| vec_u_s_enc | 5639 MB/s | 8408 MB/s |
| vec_u_vn_s_enc | 5543 MB/s | 10378 MB/s |

And the changes in serde.rs:

| benchmark | old | new |
|---|---|---|
| bench_serialize | 6469 MB/s | 10274 MB/s |

These are pretty decent improvements where there were pointers to update, and the only "regression" is u8_u64_enc, which I have to imagine is just down to randomness.

At the same time, we should now be able to encode into types other than Vec<u8>, including but not limited to &mut [u8], File, TcpStream, and other friends. Note that &mut [u8] will error if it isn't large enough to hold the results, and so you should use the new AbomonationSize trait to be sure that it is before calling encode.
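The &mut [u8] caveat can be illustrated with the standard library alone. In this sketch, write_unchecked and write_checked are hypothetical helpers, and the plain length comparison stands in for whatever size the AbomonationSize trait would report (its actual API is not shown here and is not assumed).

```rust
use std::io::{self, Write};

/// Writes `payload` into `dest` with no size check. If `dest` is too small,
/// the standard `Write` impl for `&mut [u8]` reports an error.
fn write_unchecked(payload: &[u8], mut dest: &mut [u8]) -> io::Result<()> {
    dest.write_all(payload)
}

/// Hypothetical guarded variant: compare sizes first, then write.
/// The length comparison is a placeholder for a real size query.
fn write_checked(payload: &[u8], mut dest: &mut [u8]) -> io::Result<()> {
    if dest.len() < payload.len() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "destination slice is smaller than the encoded payload",
        ));
    }
    dest.write_all(payload)
}
```

With a two-byte destination and a four-byte payload, write_unchecked fails with ErrorKind::WriteZero once the slice is full, whereas write_checked rejects the call before any bytes are copied.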

frankmcsherry commented 6 years ago

For example, one candidate reconciliation would be to write a method

fn embalm<T: Abomonation>(bytes: &mut [u8])

which interprets bytes as a &T and performs all of the appropriate address sanitization, retaining that ability for people who are excited about this, but making it be part of a second pass for people who are less excited about it.
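As a rough illustration of what such a second pass might do, the sketch below zeroes a single address field in an already-encoded buffer. Everything about it is an assumption made for the example: the name scrub_address is hypothetical, and the layout (one pointer-sized address at the start of the buffer, followed by other data) is invented and is not abomonation's actual format. A real embalm<T> would need to walk T's structure to find every address.

```rust
use std::mem::size_of;

/// Assumed size of the address field at the front of the buffer.
/// This layout is hypothetical, chosen only for the example.
const ADDR_BYTES: usize = size_of::<usize>();

/// Overwrites the leading address field of `bytes` with zeros and leaves
/// the remaining bytes untouched. Returns `false` if the buffer is too
/// short to contain an address field at all.
fn scrub_address(bytes: &mut [u8]) -> bool {
    match bytes.get_mut(..ADDR_BYTES) {
        Some(addr) => {
            addr.fill(0);
            true
        }
        None => false,
    }
}
```

Because the pass only needs a &mut [u8], it can run after encoding into any writer whose bytes are later available in memory, which keeps the cost away from callers who do not need sanitization.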

frankmcsherry commented 6 years ago

I think the ability to put in embalm afterwards is a good solution. I'm less certain I know how to sanitize things with all the new "stash discriminators in special fields" logic going on, so happy to have that demoted as a core feature.