PSeitz / lz4_flex

Fastest pure Rust implementation of LZ4 compression/decompression.
MIT License

Better documentation about what a "dictionary" is. #41

Open inodentry opened 2 years ago

inodentry commented 2 years ago

Looking at the documentation, I see that there are functions for compressing/decompressing "using an external dictionary". The dictionary is just a slice of bytes. What should those bytes be? How does the algorithm use them?

I'm not super well versed in data compression theory and am trying to learn...

My guess is that, since the algorithm works by finding back-references to previously encountered data, the dictionary is just a bunch of bytes that are treated as if they had come before the start of the actual data to compress. That would give the algorithm something to refer to while it is still at the start of the input and hasn't encountered much "real" data yet.

Is my guess/assumption correct?

If so, it would be nice if something about this were added to the documentation...

PSeitz commented 2 years ago

Sorry for the late reply, but yes that's exactly how the dictionary works.
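
To make the confirmed behavior concrete, here is a minimal sketch of a toy back-reference compressor in which the dictionary is simply treated as data that came before the input. It is not lz4_flex's implementation and does not produce the LZ4 format: the `Token` type, the `toy_compress` and `toy_decompress` functions, and the brute-force match search are all hypothetical and exist only to illustrate the idea.

```rust
/// One step of the toy compressed stream (hypothetical, not the LZ4 format).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    /// A byte copied through unchanged.
    Literal(u8),
    /// "Go back `offset` bytes and copy `len` bytes from there."
    Match { offset: usize, len: usize },
}

/// Shortest match worth encoding (LZ4 also uses a minimum match length of 4).
const MIN_MATCH: usize = 4;

fn toy_compress(dict: &[u8], input: &[u8]) -> Vec<Token> {
    // Pretend the dictionary was seen immediately before the input.
    let mut window = dict.to_vec();
    window.extend_from_slice(input);

    let mut tokens = Vec::new();
    let mut pos = dict.len(); // only the input part is encoded
    while pos < window.len() {
        // Brute-force search for the longest earlier match.
        // Matches may start inside the dictionary.
        let mut best_len = 0;
        let mut best_offset = 0;
        for start in 0..pos {
            let mut len = 0;
            while pos + len < window.len() && window[start + len] == window[pos + len] {
                len += 1;
            }
            if len > best_len {
                best_len = len;
                best_offset = pos - start;
            }
        }
        if best_len >= MIN_MATCH {
            tokens.push(Token::Match { offset: best_offset, len: best_len });
            pos += best_len;
        } else {
            tokens.push(Token::Literal(window[pos]));
            pos += 1;
        }
    }
    tokens
}

fn toy_decompress(dict: &[u8], tokens: &[Token]) -> Vec<u8> {
    // The decoder must start from the same dictionary so that
    // back-references into it resolve to the same bytes.
    let mut out = dict.to_vec();
    for token in tokens {
        match *token {
            Token::Literal(byte) => out.push(byte),
            Token::Match { offset, len } => {
                let start = out.len() - offset;
                // Copy byte by byte so overlapping matches work.
                for i in 0..len {
                    let byte = out[start + i];
                    out.push(byte);
                }
            }
        }
    }
    // Drop the dictionary prefix; only the decoded input is returned.
    out.split_off(dict.len())
}
```

In this sketch, compressing `b"hello world"` with an empty dictionary yields 11 literals, while compressing it with the dictionary `b"hello world"` yields a single match that points back into the dictionary. Two practical consequences follow: the decompressor needs exactly the same dictionary bytes as the compressor, and a dictionary helps most when it contains byte sequences that are likely to appear in the data, which matters mainly for small inputs.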