101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License
2.21k stars 77 forks

Dictionary support #162

Closed danielgtaylor closed 1 year ago

danielgtaylor commented 1 year ago

What can't you do right now?

I can't seem to find any options to enable compression using a pre-filled dictionary like with zlib and pako. If you have a good idea of the possible contents being compressed you can get a much better compression ratio by using a dictionary on both the deflate and inflate operations (as long as inflate has access to the same dictionary). This works best for small datasets (a couple hundred bytes to a few KB) without much repetition in them.

As for the use case, I have a stateless static site for which I'm generating short URLs containing structured data. Many of the fields' contents are known beforehand or can be built from a set of a few hundred names/words. The structured data is optimized to a few hundred KB, serialized into a binary format, and then deflated using a dictionary. Pako adds over 40 kB to my browser bundle, so I would love to use fflate instead.

An optimal solution

It would be nice to add an optional parameter to pass such a dictionary.

(How) is this done by other libraries?

Here's an example from Pako:

import { deflateRaw } from 'pako';

const marshalled = JSON.stringify({
  some: ["long", "data", "that's", "guessable"],
  without: "repetition",
  verified: true,
});

console.log("Marshalled:\n", marshalled);

const deflated = deflateRaw(marshalled, { level: 9 });
console.log(
  "Normal:\n",
  btoa(String.fromCharCode(...deflated)).replace(/=*$/, "")
);

const deflatedDict = deflateRaw(marshalled, {
  level: 9,
  dictionary: "somelongdatathat'sguessablewithoutrepetitionverifiedtrue",
});
console.log(
  "Dictionary:\n",
  btoa(String.fromCharCode(...deflatedDict)).replace(/=*$/, "")
);

Output:

Marshalled:
 {"some":["long","data","that's","guessable"],"without":"repetition","verified":true}
Normal:
 DcoxDoAgDEbhu/yLCyfgKsahhgpNkBpadDDeXab3De+F6cmIK6q2jIBETjNeyBebyIPNaK+MLeARLzocEZ0vdnHRNp+buxzCCdH74O8H
Dictionary:
 q1YqBupUsopWAulW0lECGQCkIGYAGXBjlGJ1lKBGKVkpIUwDqoEZqGQFMrIWAA

For this silly example, the output compressed with a dictionary is about 60% of the size of the output compressed without one (62 vs. 104 base64 characters).
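The Pako example above only shows the deflate half. As a rough sketch of the full round trip, the snippet below repeats it with Node's built-in `zlib` module (an assumption made so it runs without extra packages; it is not pako's or fflate's API) and passes the same dictionary to inflate.

```javascript
// Sketch only: uses Node's built-in zlib, not pako or fflate.
const zlib = require("zlib");

const marshalled = JSON.stringify({
  some: ["long", "data", "that's", "guessable"],
  without: "repetition",
  verified: true,
});
const input = Buffer.from(marshalled, "utf8");

// The dictionary must be byte-for-byte identical on both sides.
const dictionary = Buffer.from(
  "somelongdatathat'sguessablewithoutrepetitionverifiedtrue",
  "utf8"
);

// Raw deflate without and with the preset dictionary.
const plain = zlib.deflateRawSync(input, { level: 9 });
const withDict = zlib.deflateRawSync(input, { level: 9, dictionary });

// Inflate needs the same dictionary to resolve back-references into it.
const restored = zlib
  .inflateRawSync(withDict, { dictionary })
  .toString("utf8");

console.log("Without dictionary:", plain.length, "bytes");
console.log("With dictionary:", withDict.length, "bytes");
console.log("Round trip OK:", restored === marshalled);
```

A raw deflate stream carries no record of which dictionary was used, so both sides have to agree on it out of band, for example by shipping it with the site's code.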

Some great docs about the feature:

101arrowz commented 1 year ago

I'll look into this. I know there's a standard dictionary spec for the Zlib format so I might try to piggyback off of that to maximize compatibility.
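For context, the zlib format (RFC 1950) marks a preset dictionary with the FDICT bit in the header's flag byte and follows the header with a DICTID field holding the Adler-32 checksum of the dictionary. The sketch below checks this with Node's built-in `zlib`; the sample strings and the `adler32` helper are illustrative, and it says nothing about how fflate implements the feature.

```javascript
// Sketch only: inspects a zlib-format stream produced by Node's zlib.
const zlib = require("zlib");

// Adler-32 checksum as defined in RFC 1950 (illustrative helper).
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

const dictionary = Buffer.from("somelongdatathat'sguessable", "utf8");
const payload = Buffer.from("some long data", "utf8");

// zlib-wrapped deflate (not raw) with a preset dictionary.
const stream = zlib.deflateSync(payload, { dictionary });

// Byte 1 is FLG; bit 5 (0x20) is FDICT. Bytes 2..5 are DICTID, big-endian.
const fdict = (stream[1] & 0x20) !== 0;
const dictId = stream.readUInt32BE(2);

console.log("FDICT set:", fdict);
console.log("DICTID matches Adler-32:", dictId === adler32(dictionary));

// The decoder can use DICTID to confirm it was given the right dictionary.
const restored = zlib.inflateSync(stream, { dictionary });
console.log("Round trip OK:", restored.equals(payload));
```

Unlike raw deflate, the wrapped format lets a decoder detect a missing or mismatched dictionary instead of silently producing wrong output.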

101arrowz commented 1 year ago

I have a prototype of this feature working locally and will release it after more extensive testing.

I'm still implementing the decompression half though; that may take a few more days.

danielgtaylor commented 1 year ago

Awesome, I'm happy to hear that! I forgot to update here that I also did a tiny prototype for inflate with a dictionary, but found uzip's code a little simpler given I'm not a deflate expert. I couldn't figure out the deflate side unfortunately. Looking forward to switching to fflate instead!

101arrowz commented 1 year ago

Nice! That code should actually help me implement the feature in fflate, thanks for sharing!

101arrowz commented 1 year ago

Added in v0.8.0, thanks for your patience!