jeffgrunewald closed this issue 5 years ago
Results:

```
*** &Xip.base/1 ***
1.1 sec    65K iterations    16.83 μs/op
*** &Xip.zlib/1 ***
1.3 sec    32K iterations    40.09 μs/op
*** &Xip.snappy/1 ***
1.3 sec    65K iterations    20.42 μs/op
*** &Xip.protobuf/1 ***
1.0 sec    32K iterations    30.75 μs/op
*** &Xip.lz4/1 ***
1.2 sec    65K iterations    19.26 μs/op

[byte_size: [base: 400, snappy: 370, zlib: 293, protobuf: 290, lz4: 367]]
```
Here we see that zlib is by far the slowest: it achieves essentially the same compression as protobuf (293 vs. 290 bytes) while taking roughly 10 μs longer per operation. Zlib's gzip format, however, is more portable.
Snappy performs well overall, effectively tied with lz4 for the fastest speed. However, its compression of a single message (370 vs. 400 bytes) isn't much better than the base JSON.
Protobuf is a clear winner for speed and compression ratio, but it ties our messages to a specific schema that we have to distribute and version. Some additional performance might be gained by making our SmartCity structs protobuf-encodable directly, saving the extra step of destructuring them before restructuring them into the Protobuf-generated structs.
Lz4 is generally regarded as a superior compression scheme to snappy, but the Erlang NIF implementation only barely edges snappy out here.
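For intuition, the size side of this tradeoff is easy to reproduce with any zlib binding. A quick illustrative sketch in Python (the payload below is a made-up stand-in of roughly the same size class, not an actual SmartCity message, and this is not the Xip benchmark itself):

```python
import json
import zlib

# Hypothetical message standing in for a SmartCity struct serialized to JSON.
message = json.dumps({
    "id": "abc-123",
    "timestamp": "2019-06-01T12:00:00Z",
    "payload": {"speed": 42.5, "heading": 180, "lane": 2,
                "coordinates": [39.9612, -82.9988]},
    "metadata": {"source": "connected-vehicle", "version": "1.0"},
}).encode()

# Default-ish compression level; zlib trades CPU time for ratio as this rises.
compressed = zlib.compress(message, level=6)
print(f"raw: {len(message)} bytes, zlib: {len(compressed)} bytes")

# Round-trip check: compression must be lossless.
assert zlib.decompress(compressed) == message
```

Even on a single small JSON document, the repeated keys and structural characters give deflate something to work with, which matches the modest single-message savings measured above; the ratio improves considerably once messages are batched.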
Methodology: https://github.com/SmartColumbusOS/Xip
Questions:
- Can we improve resource utilization, and reduce the risk large datasets pose to the system, by compressing messages when they're written to Kafka topics?

Tech Note:

ACs:
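One option worth noting for the Kafka question above: the standard Kafka producer clients already support transparent compression via the `compression.type` setting, which would avoid hand-rolling per-message compression in application code. A minimal producer-properties sketch:

```properties
# Standard Kafka producer setting; valid values are none, gzip, snappy, lz4, zstd.
# Compression is applied per record batch, which typically compresses far better
# than the single ~400-byte messages measured above.
compression.type=snappy
```

Because the broker stores and re-serves the compressed batches, this also reduces disk and network usage downstream, not just producer-side payload size.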