lz4hc compression support

Hi,

generally yes, I would really like to have it in lz4_flex. I did some experiments some time ago, but they are outdated now.

I think conceptually it should be quite simple, instead of having one entry in the hashmap to search for duplicates back in the stream we would have multiple buckets for one hash and they would all get checked to find the best one. There may be other strategies, but I think this is the main one, for better compression in lz4.

Multiple Positions per Hash in the Hashmap

e.g. imagine this byte sequence

[1, 2, 3, 4, 98, 1, 2, 3, 4, 97, 1, 2, 3, 4, 98]

While walking over the data, we put hash of every 4 bytes and its position into the hashmap.

1, 2, 3, 4 -> (hash 999, position 0) 2, 3, 4, 96 -> (hash 555, position 1)

... when we arrive a the second entry, currently it would overwrite the old entry 1, 2, 3, 4 -> (hash 999, position 5)

If we would keep multiple entries for the hash, the last sequence 1, 2, 3, 4, 98, could compare both entries and see that the first is a longer match.

Currently the hashmap is simply a Vec, and the hash is right shifted to fit in the Vec bounds. It should be possible to double (or quadruple) the Vec and then have multiple posititons for one hash.

PSeitz / lz4_flex

lz4hc compression support #21

Multiple Positions per Hash in the Hashmap