PSeitz / lz4_flex

Fastest pure Rust implementation of LZ4 compression/decompression.
MIT License

Include wider set of comparison points #10

Closed by 101arrowz 3 years ago

101arrowz commented 3 years ago

Hello! I'm the author of fflate, the JS library you use as a comparison point in your WASM build of lz4_flex. I wanted to ask that you include a wider set of comparisons than just redundant JSON data, to offer a fairer comparison of Zlib vs LZ4 and WASM vs JavaScript.

On an algorithmic level, I agree that LZ4 blows Zlib out of the water in terms of performance, but a big part of the reason is that LZ4 skips Huffman coding, and your tests only include data that doesn't benefit much from it. The JSON you use in the comparison is highly compressible because it has many redundant keys and values; specifically, it favors LZ references over Huffman coding (the other part of the DEFLATE algorithm, which is effectively the Zlib algorithm minus a few headers). If you were to use a large text file, say the raw text of Moby Dick, Huffman coding becomes much more important. From my tests, the WASM lz4_flex build takes 7ms to reach a 0.62 compression ratio, while fflate takes 30ms to reach 0.46. The gap grows for data without many references: on this Wikipedia title dump, lz4_flex yields a ratio of 0.41 in 1465ms, while fflate yields 0.28 in 5329ms.
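To see why text benefits from an entropy-coding stage, one rough heuristic is the Shannon entropy of the byte distribution: it estimates how many bits per byte an ideal entropy coder (like DEFLATE's Huffman stage) could still squeeze out even after LZ matching, which LZ4 leaves on the table since it has no entropy stage. This is an illustrative sketch, not code from either library; the function name and sample string are my own:

```rust
// Illustrative sketch: order-0 Shannon entropy of a byte slice, in bits/byte.
// Plain English text typically measures well below 8 bits/byte, which is
// roughly the headroom an entropy coder can exploit and LZ4 cannot.
fn shannon_entropy_bits_per_byte(data: &[u8]) -> f64 {
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    // Hypothetical sample; any large English text behaves similarly.
    let text = b"Call me Ishmael. Some years ago, never mind how long precisely...";
    println!("{:.2} bits/byte", shannon_entropy_bits_per_byte(text));
}
```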

Also, you compare the performance of pure JavaScript and WASM through the demo of lz4js vs the WASM build of lz4_flex, but I believe the awful performance of lz4js is due to a lack of optimization, not to JavaScript itself being slow. Proof of this is that my Zlib implementation was able to compress and decompress faster than lz4js on a larger text sample, despite the Zlib algorithm being slower.

Zlib shouldn't be faster than LZ4

In addition, comparing fflate with wasm-flate and Node.js's C++-based Zlib implementation, fflate is faster than wasm-flate and within 20% of native Zlib, so I have reason to believe that a more optimized LZ4 library in JavaScript would perform much better.

For these reasons, I recommend mentioning that LZ4 focuses on extreme performance at the expense of worse compression ratios, and that the lz4js library is particularly unoptimized. In addition, a wider variety of test data would give context for why Zlib exists at all, since at the moment LZ4 appears far superior in speed while Zlib appears to offer only negligible improvements in compression ratio.

Regardless of those concerns, lz4_flex is impressive and definitely a step up from prior implementations. Congratulations on creating such a fast library, and thank you for giving it for free to the open source community :)

PSeitz commented 3 years ago

Hi, you are right. I should add a test where entropy encoders can play to their strengths. It wasn't my intention to skew the results; I just wanted to provide some comparison, and JSON is just very common. I will add the test files to the benches folder.

Also, there was a measurement bug in the compression speed for fflate: I used the compressed size instead of the uncompressed size in the calculation. Sorry about that.
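For context, compression throughput is conventionally reported over the uncompressed input, so dividing by the compressed output understates speed by exactly the compression ratio. A minimal sketch of the distinction, with a made-up function name and made-up numbers (not taken from the actual benchmark code):

```rust
// Sketch of the throughput fix: speed should be computed from the
// *uncompressed* input size, not the compressed output size.
fn throughput_mib_s(bytes_processed: usize, elapsed_secs: f64) -> f64 {
    (bytes_processed as f64 / (1024.0 * 1024.0)) / elapsed_secs
}

fn main() {
    // Hypothetical run: 10 MiB of input shrinks to 4 MiB in 0.5 s.
    let input_len = 10 * 1024 * 1024;
    let compressed_len = 4 * 1024 * 1024;
    let elapsed = 0.5;

    let correct = throughput_mib_s(input_len, elapsed); // based on input size
    let buggy = throughput_mib_s(compressed_len, elapsed); // based on output size

    println!("correct: {correct} MiB/s, buggy: {buggy} MiB/s");
    // The buggy version understates speed by the compression ratio (here 0.4).
    assert!(buggy < correct);
}
```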

> In addition, comparing fflate with wasm-flate and Node.js's C++-based Zlib implementation, fflate is faster than wasm-flate and within 20% of native Zlib, so I have reason to believe that a more optimized LZ4 library in JavaScript would perform much better.

Hmm, I think if the native version is not at least 2x faster, there is something wrong, either with the implementation (missing optimizations) or with the comparison.

The wasm implementation here is around 4x slower than the native one: `lz4_flex unsafe 947 MiB/s 5017 MiB/s 0.2270`

I have another JS vs wasm benchmark here, where native is around 8x faster than JS and wasm around 4x faster: https://github.com/PSeitz/wana_kana_rust/tree/master/bench_compare

PSeitz commented 3 years ago

https://pseitz.github.io/lz4-wasm/ now also includes two examples with plain text, 65k and 1k.

I think there should be an example with 1MB of JSON and text, but the 10MB dickens file is too much to load when opening the webpage.

101arrowz commented 3 years ago

Thanks!

> Hmm, I think if the native version is not at least 2x faster, there is something wrong, either with the implementation (missing optimizations) or with the comparison.

V8 is more optimized than you may think; try running the benchmark in fflate if you'd like. My implementation has many optimizations over the original Zlib, and JS is only marginally slower once the V8 JIT optimizes the hot spots, so I'm not much slower than native overall. If as much effort went into optimizing WASM as has gone into JavaScript, no doubt WASM would be faster (or at least more memory/power efficient), but at the moment fflate outperforms WASM Zlib by 80% in my tests. Even when I ported my algorithmic improvements to Rust, the result was only 40% faster than JS at best (even though the Rust port outperformed all existing Zlib implementations, running 15% faster than flate2).

The idea that "JS is slow" stems from bad programming habits the language allows, if not encourages, and from the fact that it appeals to beginners and "agile" developers, many of whom will write O(2^N) implementations and not care because it's good enough; it is not necessarily poor performance of the engine. If you try hard enough, JS can be pretty fast.