101arrowz / fzstd

High performance Zstandard decompression in a pure JavaScript, 8kB package
MIT License
218 stars 11 forks source link

Remove maximum backreference distance limitation #1

Closed fabiospampinato closed 3 years ago

fabiospampinato commented 3 years ago

Would it be possible to remove the current 2^25 backreference limitation?

I don't think that's too much of a problem for me personally, but it's a bit of a "gotcha" that the library has, which I think eventually should be removed to move it closer to perfection.

101arrowz commented 3 years ago

This limit was chosen because 25 bits is the maximum number of useable bits that can consistently be read into a single number using the standard JS bit-reading algorithm. There are other algorithms I could employ, but they will incur a pretty severe performance penalty.

The main reason I found this acceptable was that the Zstandard specification makes a suggestion to all encoders to output at most an 8MB backreference, and to all decoders to support at least an 8MB backreference. Indeed, unless specific options are used (e.g. ultra mode or --long) the backreference distance is capped at 8MB in the standard zstd CLI tool. Since fzstd supports 32MB backreferences, I thought it a reasonable sacrifice to maintain acceptable performance.

I probably won't change this unless there is sufficient demand.

fabiospampinato commented 3 years ago

I see, makes sense.

Just out of curiosity: would using BigInts for this incur in a major performance penalty too? I'm thinking perhaps whether longer backreferences are needed or not could be detected and a decision could be made on whether to use regular numbers of BigIns from that?

101arrowz commented 3 years ago

BigInts aren't necessary because it is still possible to fit 32 bits into a single number; it just won't fit into an Smi (30-bit signed integer) in V8, causing performance to suffer. In fact, BigInts are even worse than non-Smi numbers for performance: they can have a 10-100x performance penalty.

101arrowz commented 3 years ago

Closing until there's more demand for a higher backreference distance.