snej opened 6 years ago
KV-Engine did a comparison of Snappy, LZ4, and zstd recently for per-document compression (I'm on mobile so I don't have the doc link to hand), but we found much lower compression ratios than the published numbers when compressing "typical" documents, i.e. JSON on the order of 1 KB in size.
Much of the compression comes from having a good-sized corpus, which isn't the case when compressing just one doc at a time. We decided to stick with Snappy (as it's already used in parts of the stack and is more easily available), but LZ4 was the winner otherwise: zstd didn't compress that much more than LZ4 and was a lot more expensive.
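To illustrate the corpus-size effect, here is a minimal sketch using the `github.com/golang/snappy` package (one common Go wrapper; the sample document is invented). A single small JSON doc gives the compressor almost nothing to match against, while a corpus of similar docs compresses far better:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// One small JSON document (hypothetical sample): mostly unique bytes,
	// so an LZ-family compressor has almost no prior matches to reference.
	doc := []byte(`{"_id":"user::1001","type":"profile","name":"Alice Nguyen",` +
		`"email":"alice@example.com","age":34,"tags":["admin","beta"],` +
		`"address":{"city":"Portland","zip":"97201"}}`)
	one := snappy.Encode(nil, doc)
	fmt.Printf("single doc: %d -> %d bytes\n", len(doc), len(one))

	// A corpus of 100 similar documents: the repeated keys and structure
	// give the compressor plenty of back-references, so the overall ratio
	// improves dramatically even though each doc barely compresses alone.
	var corpus bytes.Buffer
	for i := 0; i < 100; i++ {
		fmt.Fprintf(&corpus, `{"_id":"user::%04d","type":"profile","name":"Alice Nguyen",`+
			`"email":"alice@example.com","age":34,"tags":["admin","beta"],`+
			`"address":{"city":"Portland","zip":"97201"}}`, i)
	}
	many := snappy.Encode(nil, corpus.Bytes())
	fmt.Printf("100-doc corpus: %d -> %d bytes\n", corpus.Len(), len(many))
}
```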
BLIP is now using inter-message compression — it uses a single compression context for the entire stream and runs each frame through it — so we do get [almost] the same compression as if it were all one big document.
(I say "[almost]" because we do have to flush the compressor at the end of every message, and that adds some overhead; it looks like zlib has to rewrite some Huffman encoding state at the start of every block.)
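As a rough sketch of that stream-level approach (not BLIP's actual implementation), Go's `compress/flate` can share one deflate context across all messages, with `Flush()` marking each message boundary; the flush is the per-message overhead mentioned above:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"log"
)

// streamCompressor shares one deflate context across every message on a
// stream, so later messages can back-reference bytes from earlier ones.
type streamCompressor struct {
	buf bytes.Buffer
	w   *flate.Writer
}

func newStreamCompressor() (*streamCompressor, error) {
	c := &streamCompressor{}
	w, err := flate.NewWriter(&c.buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	c.w = w
	return c, nil
}

// compressMessage appends one message to the shared stream and returns the
// bytes produced for just that message. Flush (a zlib-style sync flush) is
// the per-message overhead: it emits a marker so the receiver can decode up
// to the boundary, but keeps the history window intact for later messages.
func (c *streamCompressor) compressMessage(msg []byte) ([]byte, error) {
	if _, err := c.w.Write(msg); err != nil {
		return nil, err
	}
	if err := c.w.Flush(); err != nil {
		return nil, err
	}
	frame := append([]byte(nil), c.buf.Bytes()...)
	c.buf.Reset()
	return frame, nil
}

func main() {
	c, err := newStreamCompressor()
	if err != nil {
		log.Fatal(err)
	}
	doc := []byte(`{"type":"user","name":"alice","active":true,"score":42}`)
	for i := 1; i <= 3; i++ {
		frame, err := c.compressMessage(doc)
		if err != nil {
			log.Fatal(err)
		}
		// The second and third frames come out much smaller than the first,
		// because the shared window already contains the same content.
		fmt.Printf("message %d: %d compressed bytes\n", i, len(frame))
	}
}
```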
Cool; that'll certainly give different considerations.
BTW, I thought this was a really nice way of looking at the tradeoffs between the different algorithms: http://fastcompression.blogspot.co.uk/p/compression-benchmark.html
Alternate, newer compression algorithms we can consider:
Of these, Zstandard seems the most attractive. According to the comparison table on its home page, its speed is 3x that of zlib — almost as good as Snappy — while the compression ratio is even better than zlib. And a Go wrapper package is available.
lz4 is the speed champion but compresses about the same as Snappy (i.e. not great).
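For example, assuming the `github.com/klauspost/compress/zstd` package (one of several Go wrappers; the comment above doesn't name which one was meant), a per-message round trip would look roughly like this:

```go
package main

import (
	"fmt"
	"log"

	"github.com/klauspost/compress/zstd"
)

func main() {
	data := []byte(`{"channels":["public"],"type":"doc","rev":"1-abc"}`)

	// A reusable encoder; passing a nil io.Writer is fine when only the
	// block-oriented EncodeAll API is used.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer enc.Close()
	compressed := enc.EncodeAll(data, nil)

	// Matching decoder for the receiving side.
	dec, err := zstd.NewReader(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer dec.Close()
	out, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d -> %d bytes, roundtrip ok: %v\n",
		len(data), len(compressed), string(out) == string(data))
}
```

Note that this is per-block compression; getting the stream-wide context described earlier would mean keeping a long-lived encoder/decoder pair per connection instead.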