Benchmark including I/O

elbaro commented 2 years ago

Adding a benchmark including I/O time would be useful. Usually serialization/deserialization involves network or disk I/O, and the medium affects the overall performance.

For example, it is unclear from the table in README.md whether rkyv/speedy are faster than prost in real scenarios because the disk read is faster with protobuf. Probably rkyv/speedy are faster than protobuf over a LAN. Are they faster on an SSD? We do not know.

These additions would help readers quickly choose a library:

HDD with low IOPS or RPM + Serialization/Deserialization
HDD with high IOPS or RPM + Serialization/Deserialization
SSD + Serialization/Deserialization
Network 10/100/1000 Mbps + Serialization/Deserialization

djkoloski commented 2 years ago

I think this would be nice to have, but I'm unsure how feasible/useful it would be to provide these numbers. The IO time should depend solely on the IO speed of the device for reads and writes, since an apples-to-apples comparison would:

Serialize all the bytes then write them to media with a buffered writer
Read all the bytes from media with a buffered reader then deserialize them
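
The two steps above can be sketched with the Rust standard library alone. This is only an illustration under stated assumptions, not code from the benchmark suite: `encode` and `decode` are hypothetical stand-ins for a real library's serialize and deserialize calls, and the file name `roundtrip.bin` is made up for the example.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a library's serializer: little-endian u32s.
fn encode(values: &[u32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Hypothetical stand-in for the matching deserializer.
fn decode(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Serialize all the bytes first, then write them through a buffered writer.
fn write_all_buffered(path: &Path, values: &[u32]) -> io::Result<Duration> {
    let start = Instant::now();
    let bytes = encode(values);
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(&bytes)?;
    writer.flush()?; // hands the data to the OS, not necessarily to the physical medium
    Ok(start.elapsed())
}

/// Read all the bytes through a buffered reader, then deserialize them.
fn read_all_buffered(path: &Path) -> io::Result<(Vec<u32>, Duration)> {
    let start = Instant::now();
    let mut reader = BufReader::new(File::open(path)?);
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    let values = decode(&bytes);
    Ok((values, start.elapsed()))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("roundtrip.bin");
    let data: Vec<u32> = (0..250_000).collect();
    let write_time = write_all_buffered(&path, &data)?;
    let (restored, read_time) = read_all_buffered(&path)?;
    assert_eq!(data, restored);
    println!("write: {write_time:?}, read: {read_time:?}");
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Note that a read issued right after the write is likely to be served from the OS page cache, so the timings from a run like this say more about memory than about the device unless the cache is bypassed or dropped first.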

Benchmarking an mmap to acquire the bytes could be useful for some users, but I would definitely be concerned about the portability of those results across hardware. Even HDDs can have a lot of variability in performance across manufacturers, especially with regard to random-access reads.

Additionally, there's currently no dedicated hardware for benchmarking and I do not have some of the suggested configurations. Perhaps some of these (like Network) could be emulated?

If you have any suggestions on how to approach these problems, I'd be glad to hear them!

elbaro commented 2 years ago

I meant to suggest providing very approximate numbers like this.

For example, consider the log benchmark:

Format / Lib	Serialize	Deserialize	Size	Zlib
rkyv	306.63 us	3.2056 ms* 3.9919 ms*	1011488	269353

Using this very rough number, "Read 1,000,000 bytes sequentially from disk: 825,000 ns", and ignoring disk-seek time and other factors, reading 1011488 bytes from an HDD takes 1011488/1000000 * 825 us ≈ 834 us, which already dominates the unverified zero-copy numbers (834 us >> 16.632 us). This tells us that the choice of zero-copy deserialization between abomonation, flatbuffers, rkyv, and alkahest makes little difference on an HDD.
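
The same back-of-the-envelope arithmetic can be written as a few lines of Rust. This is a sketch using only the figures quoted in this comment; the function and constant names are made up for illustration, and the model deliberately ignores seek time, caching, and everything else.

```rust
/// Rough figure quoted above: 825,000 ns to read 1,000,000 bytes sequentially.
const DISK_NS_PER_MILLION_BYTES: f64 = 825_000.0;

/// Estimated sequential read time in microseconds, ignoring seek time.
fn estimated_read_us(size_bytes: u64) -> f64 {
    size_bytes as f64 / 1_000_000.0 * DISK_NS_PER_MILLION_BYTES / 1_000.0
}

fn main() {
    let rkyv_log_size = 1_011_488; // "Size" column of the rkyv row, in bytes
    let rkyv_read_us = 16.632; // rkyv zero-copy "Read" time without validation
    let io_us = estimated_read_us(rkyv_log_size);
    println!("estimated disk read: {io_us:.1} us");
    println!("ratio to zero-copy read: {:.0}x", io_us / rkyv_read_us);
}
```

Swapping in a different per-megabyte figure for another device only requires changing the constant.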

Zero-copy deserialization speed

Format / Lib	Access	Read	Update
abomonation	36.589 us*	57.773 us*	‡
capnp	146.66 ns*	496.42 us*	‡
flatbuffers	2.9546 ns* 2.0092 ms*	137.99 us* 2.1892 ms*	‡
rkyv	1.3871 ns* 756.30 us*	16.632 us* 776.72 us*	66.600 us
alkahest	2.0442 ns*	81.230 us*	‡

I agree that many details (HDD RPM, manufacturer, mmap, buffered or not, usage pattern, ...) affect the result, but I still find it useful to be able to eyeball the order of magnitude of the result. There are several options:

Provide a script with IO that people can run on their own hardware.
Pick specific hardware and a usage pattern, and state them clearly, e.g. 'Seagate 5400rpm mmap sequential read 10MB ... with warmup'.

...or add a warning that HDD latency dominates some numbers while network/SSD latencies are negligible. I just realized that network/SSD may be fast enough to be ignored.

djkoloski commented 2 years ago

Those are good suggestions. I think that, of the options available, the first would probably be the gold standard and the second would be good for people passing by. If you really want to know how each library will perform on your hardware, you'll need to run them in the proper environment. How about these concrete actions:

[ ] Add benchmarks that read/write the serialized bytes for each library with a user-specified source/destination. These results would be omitted from the published results, but will allow users to run their own tests. This might be a bit more difficult than it sounds.
[ ] Add some links to external sources that can provide HDD/SSD/Net read/write speeds. UserBenchmark seems to have an abundance of quality numbers. Along with the link you provided, I think those would be suitable.
[ ] Add a table that lists the serialized size for each library along with estimates of the read/write times for each of HDD, SSD, and network. This will mostly just be data processing. In the interest of keeping the main markdown file readable (this would add 6 columns), it's worth considering moving these into a separate markdown file so that those interested can get an idea of roughly what numbers to expect. Maybe discoverability could be increased by linking or adding collapsed sections instead? I'm open to suggestions on this.
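
As a rough idea of the data processing the last item involves, here is a hedged Rust sketch that turns a serialized size into estimated transfer times. It is not the project's formatting tool. The `Medium` type and function names are hypothetical, the disk throughput is derived from the rough figure quoted earlier in this thread, and the network rows use the nominal 10/100/1000 Mbps link speeds from the original request with no allowance for latency or protocol overhead. No SSD figure appears in this thread, so that row is left out and would be added the same way.

```rust
/// A storage or transport medium with an assumed sustained throughput.
struct Medium {
    name: &'static str,
    bytes_per_sec: f64,
}

/// Estimated transfer time in microseconds: size divided by throughput, nothing else.
fn transfer_us(size_bytes: u64, medium: &Medium) -> f64 {
    size_bytes as f64 / medium.bytes_per_sec * 1e6
}

/// Convert a nominal link speed in megabits per second to bytes per second.
fn mbps_to_bytes_per_sec(mbps: f64) -> f64 {
    mbps * 1e6 / 8.0
}

fn main() {
    let media = [
        // Assumption taken from this thread: 1,000,000 bytes in 825,000 ns.
        Medium { name: "disk (thread figure)", bytes_per_sec: 1_000_000.0 / 0.000825 },
        Medium { name: "network 10 Mbps", bytes_per_sec: mbps_to_bytes_per_sec(10.0) },
        Medium { name: "network 100 Mbps", bytes_per_sec: mbps_to_bytes_per_sec(100.0) },
        Medium { name: "network 1000 Mbps", bytes_per_sec: mbps_to_bytes_per_sec(1000.0) },
    ];
    // (library, serialized size in bytes); only the rkyv log row is quoted in this thread.
    let sizes = [("rkyv", 1_011_488u64)];

    println!("{:<8} {:<24} {:>14}", "lib", "medium", "estimate (us)");
    for (lib, size) in sizes {
        for medium in &media {
            println!("{:<8} {:<24} {:>14.1}", lib, medium.name, transfer_us(size, medium));
        }
    }
}
```

Real throughput numbers from an external source could replace the `bytes_per_sec` values without touching the rest of the sketch.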

As part of this, I'll probably also need to clean up the formatting tool and get it into version control.

djkoloski / rust_serialization_benchmark

Benchmark including I/O #22
