UrbanOS-Public / smartcitiesdata

The core microservices of UrbanOS, organized as an umbrella project with component documentation
Apache License 2.0

Should we compress messages to and from Kafka? #186

Closed: jeffgrunewald closed this issue 5 years ago

jeffgrunewald commented 5 years ago

Can we improve resource utilization and reduce the risk that large datasets pose to the system by compressing messages when they are written to Kafka topics?
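One option worth noting: Kafka clients can apply compression at the producer level without touching application code. A minimal sketch assuming the brod client (the client id, broker endpoint, and topic name below are placeholders, not our actual config):

```elixir
# Hedged sketch: enable snappy compression on a brod producer.
# Client id, broker endpoint, and topic are placeholders.
:ok = :brod.start_client([{'localhost', 9092}], :example_client)

:ok =
  :brod.start_producer(:example_client, "example-topic", [
    # brod also accepts :gzip or :no_compression here
    {:compression, :snappy}
  ])
```

With producer-side compression, the broker and consumers decompress transparently, so this is an alternative to compressing individual message payloads ourselves.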

Questions:

Tech Note:

ACs

LtChae commented 5 years ago

Results:

*** &Xip.base/1 ***
1.1 sec    65K iterations   16.83 μs/op

*** &Xip.zlib/1 ***
1.3 sec    32K iterations   40.09 μs/op

*** &Xip.snappy/1 ***
1.3 sec    65K iterations   20.42 μs/op

*** &Xip.protobuf/1 ***
1.0 sec    32K iterations   30.75 μs/op

*** &Xip.lz4/1 ***
1.2 sec    65K iterations   19.26 μs/op

[byte_size: [base: 400, snappy: 370, zlib: 293, protobuf: 290, lz4: 367]]

Here we see that zlib is by far the slowest: it only achieves roughly the same compression as protobuf while taking about 10 μs/op longer. Zlib's gzip format, however, is more portable.
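For context, the zlib path amounts to gzip-compressing the encoded JSON with the module that ships in Erlang/OTP; the payload below is illustrative:

```elixir
# :zlib.gzip/1 and :zlib.gunzip/1 are built into Erlang/OTP, so no extra
# dependency is needed and the output is readable by any gzip-aware consumer.
json = Jason.encode!(%{dataset_id: "example", payload: %{temp: 72}})
compressed = :zlib.gzip(json)
^json = :zlib.gunzip(compressed)
```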

Snappy performs well overall, tied with lz4 for the fastest speed. However, its compression of a single message isn't much better than the uncompressed JSON.

Protobuf is a clear winner for speed and compression ratio, but it ties our messages to a specific schema that we have to distribute and version. Some additional performance might be gained by making our SmartCity structs protobuf(able) structs, saving the extra step of deconstructing them before restructuring them into Protobuf-generated structs.
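As a rough illustration of what a protobuf(able) struct could look like with the Elixir protobuf library (the module and field names here are hypothetical, not the real SmartCity schema):

```elixir
# Hypothetical message definition; fields are illustrative only and do not
# reflect the actual SmartCity.Data struct or its schema versioning.
defmodule SmartCity.Data.Proto do
  use Protobuf, syntax: :proto3

  field :dataset_id, 1, type: :string
  field :payload, 2, type: :string
end

# Encoding would then go straight from struct to wire bytes:
#   SmartCity.Data.Proto.new(dataset_id: "example", payload: json)
#   |> SmartCity.Data.Proto.encode()
```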

Lz4 is generally considered a superior compression scheme to snappy, but the Erlang NIF implementation only barely edges snappy out here.
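For completeness, the snappy path boils down to a single NIF round trip; a sketch assuming the snappyer library (the same NIF brod uses), with the lz4 NIF following an analogous compress/decompress pattern:

```elixir
# Hedged sketch: snappy compression via the snappyer NIF; the benchmark's
# actual library choice may differ.
json = Jason.encode!(%{dataset_id: "example", payload: %{temp: 72}})
{:ok, compressed} = :snappyer.compress(json)
{:ok, ^json} = :snappyer.decompress(compressed)
```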

LtChae commented 5 years ago

Methodology: https://github.com/SmartColumbusOS/Xip
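The harness in that repo defines the five functions above; as a rough sketch of how such a comparison could be reproduced with Benchee (the payload and job names below are assumptions, not the original code):

```elixir
# Hedged reproduction sketch using Benchee; the real harness lives in the Xip repo
# and this payload is a stand-in, not the dataset behind the numbers above.
message = Jason.encode!(%{id: "sample", payload: List.duplicate(%{temp: 72}, 10)})

Benchee.run(%{
  "base" => fn -> message end,
  "zlib" => fn -> :zlib.gzip(message) end
  # snappy, lz4, and protobuf jobs would call their respective libraries the same way
})

IO.inspect(byte_size: [base: byte_size(message), zlib: byte_size(:zlib.gzip(message))])
```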