snej opened 6 years ago
KV-Engine did a comparison of Snappy, LZ4, and zstd recently for per-document compression (I'm on mobile so I don't have the doc link to hand), but we found much lower compression ratios than the published numbers when compressing "typical" documents, i.e. JSON on the order of 1 KB in size.
Much of the compression comes from having a good-sized corpus, which isn't the case when compressing just one doc at a time. We decided to stick with Snappy (as it's already used in parts of the stack and is more easily available), but LZ4 was the winner otherwise: zstd didn't compress that much more than LZ4 and was a lot more expensive.
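To illustrate the corpus-size effect, here is a minimal sketch using the `github.com/golang/snappy` package (one common Go wrapper; the sample document is invented). A single small JSON doc gives the compressor almost nothing to match against, while a corpus of similar docs compresses far better:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// One small JSON document (hypothetical sample): mostly unique bytes,
	// so an LZ-family compressor has almost no prior matches to reference.
	doc := []byte(`{"_id":"user::1001","type":"profile","name":"Alice Nguyen",` +
		`"email":"alice@example.com","age":34,"tags":["admin","beta"],` +
		`"address":{"city":"Portland","zip":"97201"}}`)
	one := snappy.Encode(nil, doc)
	fmt.Printf("single doc: %d -> %d bytes\n", len(doc), len(one))

	// A corpus of 100 similar documents: the repeated keys and structure
	// give the compressor plenty of back-references, so the overall ratio
	// improves dramatically even though each doc barely compresses alone.
	var corpus bytes.Buffer
	for i := 0; i < 100; i++ {
		fmt.Fprintf(&corpus, `{"_id":"user::%04d","type":"profile","name":"Alice Nguyen",`+
			`"email":"alice@example.com","age":34,"tags":["admin","beta"],`+
			`"address":{"city":"Portland","zip":"97201"}}`, i)
	}
	many := snappy.Encode(nil, corpus.Bytes())
	fmt.Printf("100-doc corpus: %d -> %d bytes\n", corpus.Len(), len(many))
}
```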
BLIP is now using inter-message compression — it uses a single compression context for the entire stream and runs each frame through it — so we do get [almost] the same compression as if it were all one big document.
(I say "[almost]" because we do have to flush the compressor at the end of every message, and that adds some overhead; it looks like zlib has to rewrite some Huffman encoding state at the start of every block.)
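As a rough sketch of that stream-level approach (not BLIP's actual implementation), Go's `compress/flate` can share one deflate context across all messages, with `Flush()` marking each message boundary; the flush is the per-message overhead mentioned above:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"log"
)

// streamCompressor shares one deflate context across every message on a
// stream, so later messages can back-reference bytes from earlier ones.
type streamCompressor struct {
	buf bytes.Buffer
	w   *flate.Writer
}

func newStreamCompressor() (*streamCompressor, error) {
	c := &streamCompressor{}
	w, err := flate.NewWriter(&c.buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	c.w = w
	return c, nil
}

// compressMessage appends one message to the shared stream and returns the
// bytes produced for just that message. Flush (a zlib-style sync flush) is
// the per-message overhead: it emits a marker so the receiver can decode up
// to the boundary, but keeps the history window intact for later messages.
func (c *streamCompressor) compressMessage(msg []byte) ([]byte, error) {
	if _, err := c.w.Write(msg); err != nil {
		return nil, err
	}
	if err := c.w.Flush(); err != nil {
		return nil, err
	}
	frame := append([]byte(nil), c.buf.Bytes()...)
	c.buf.Reset()
	return frame, nil
}

func main() {
	c, err := newStreamCompressor()
	if err != nil {
		log.Fatal(err)
	}
	doc := []byte(`{"type":"user","name":"alice","active":true,"score":42}`)
	for i := 1; i <= 3; i++ {
		frame, err := c.compressMessage(doc)
		if err != nil {
			log.Fatal(err)
		}
		// The second and third frames come out much smaller than the first,
		// because the shared window already contains the same content.
		fmt.Printf("message %d: %d compressed bytes\n", i, len(frame))
	}
}
```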
Cool; that'll certainly give different considerations.
BTW, I thought this was a really nice way of looking at the tradeoffs between the different algorithms: http://fastcompression.blogspot.co.uk/p/compression-benchmark.html
Alternate, newer compression algorithms we can consider:
Of these, Zstandard seems the most attractive. According to the comparison table on its home page, its speed is 3x that of zlib — almost as good as Snappy — while the compression ratio is even better than zlib. And a Go wrapper package is available.
lz4 is the speed champion but compresses about the same as Snappy (i.e. not great).
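For example, assuming the `github.com/klauspost/compress/zstd` package (one of several Go wrappers; the comment above doesn't name which one was meant), a per-message round trip would look roughly like this:

```go
package main

import (
	"fmt"
	"log"

	"github.com/klauspost/compress/zstd"
)

func main() {
	data := []byte(`{"channels":["public"],"type":"doc","rev":"1-abc"}`)

	// A reusable encoder; passing a nil io.Writer is fine when only the
	// block-oriented EncodeAll API is used.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer enc.Close()
	compressed := enc.EncodeAll(data, nil)

	// Matching decoder for the receiving side.
	dec, err := zstd.NewReader(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer dec.Close()
	out, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d -> %d bytes, roundtrip ok: %v\n",
		len(data), len(compressed), string(out) == string(data))
}
```

Note that this is per-block compression; getting the stream-wide context described earlier would mean keeping a long-lived encoder/decoder pair per connection instead.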