binast / binjs-ref

Reference implementation for the JavaScript Binary AST format
https://binast.github.io/binjs-ref/binjs/index.html

Investigate compressing string references with bitpacker #291

Closed: Yoric closed this issue 5 years ago

Yoric commented 5 years ago

See https://crates.io/crates/bitpacking.
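For context, here is a minimal sketch of what fixed-width bit packing does to a block of string references. It is not the `bitpacking` crate's implementation (which, as far as I understand, works on fixed-size blocks with SIMD), and the names `num_bits`, `pack` and `unpack` are hypothetical helpers written only for this illustration.

```rust
/// Number of bits needed to represent the largest value in `block`.
fn num_bits(block: &[u32]) -> u8 {
    let max = block.iter().copied().max().unwrap_or(0);
    (32 - max.leading_zeros()) as u8
}

/// Pack every value of `block` into `width` bits (least significant bits first).
/// Assumes every value fits in `width` bits, e.g. `width = num_bits(block)`.
fn pack(block: &[u32], width: u8) -> Vec<u8> {
    let mut out = Vec::with_capacity((block.len() * width as usize + 7) / 8);
    let mut acc: u64 = 0; // bits waiting to be written out
    let mut filled: u32 = 0; // how many bits of `acc` are in use
    for &value in block {
        acc |= (value as u64) << filled;
        filled += width as u32;
        while filled >= 8 {
            out.push(acc as u8);
            acc >>= 8;
            filled -= 8;
        }
    }
    if filled > 0 {
        out.push(acc as u8); // final, partially filled byte
    }
    out
}

/// Inverse of `pack`: read `count` values of `width` bits each.
fn unpack(bytes: &[u8], width: u8, count: usize) -> Vec<u32> {
    let mask: u64 = if width == 0 { 0 } else { (1u64 << width) - 1 };
    let mut out = Vec::with_capacity(count);
    let mut acc: u64 = 0;
    let mut filled: u32 = 0;
    let mut input = bytes.iter();
    for _ in 0..count {
        while filled < width as u32 {
            let byte = *input.next().expect("truncated input");
            acc |= (byte as u64) << filled;
            filled += 8;
        }
        out.push((acc & mask) as u32);
        acc >>= width;
        filled -= width as u32;
    }
    out
}
```

With this sketch, eight references that all fit in 3 bits pack into 3 bytes instead of 32, and `unpack(&pack(&block, w), w, block.len())` gives back the original block. Note that packed values no longer sit on byte boundaries.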

Yoric commented 5 years ago

Protocol:

# Generate dictionary for Facebook.
$ cargo run --bin binjs_generate_prediction_tables -- --in tests/data/facebook/single/*.js --out /tmp/binjs/facebook/dict/

# Encode and generate streams with a custom build that uses `BitPacker4x`.
$ cargo run --bin binjs_encode -- --in tests/data/facebook/single/*.js --out /tmp/binjs/facebook/ advanced entropy --split-streams

# Compare compression with brotli
$ cargo run --example investigate_streams -- /tmp/binjs/facebook/

Using https://github.com/Yoric/binjs-ref/tree/entropy-0.4-bitpacker.

Result:

File,                                     raw (b),  brotli (b),    lzma (b),
js,                                      43134534,     8016723,     8300975,
binjs,                                    9999879,     9963507,    10087205,
floats.content,                            466192,      251488,      250457,
identifier_names.content,                 3067168,     2482068,     2424498,
list_lengths.content,                     1057008,      718786,      764708,
property_keys.content,                    2072992,     1932783,     1917003,
string_literals.content,                  2366688,     1917712,     1870428,
unsigned_longs.content,                    203776,      157756,      173512,
main.entropy,                             1610749,     1600642,     1629942,
floats.prelude,                             13817,       10849,       13092,
identifier_names.prelude,                    2907,        1878,        2418,
identifier_names_len.prelude,                1012,         251,         806,
list_lengths.prelude,                         222,         498,        1818,
property_keys.prelude,                     930522,      255228,      291136,
property_keys_len.prelude,                  47918,       33494,       42749,
string_literals.prelude,                  1354161,      456487,      522092,
string_literals_len.prelude,                73044,       55172,       67582,
unsigned_longs.prelude,                         6,          14,          52,

Yoric commented 5 years ago

If I read this correctly, we have the following.

Entropy 0.3.x (from https://github.com/binast/binjs-ref/issues/250):

Stream,                      before brotli (b), after brotli (b)
identifier_names.content,    4911852,           1213747
property_keys.content,       2785813,           1348177
string_literals.content,     3100358,           1495086

Bitpacker (see above):

Stream,                      before brotli (b), after brotli (b)
identifier_names.content,    3067168,           2482068
property_keys.content,       2072992,           1932783
string_literals.content,     2366688,           1917712

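Bitpacking shrinks the raw streams, but the packed output barely compresses any further. After brotli, the three streams total 6332563 bytes with bitpacker versus 4057010 bytes with Entropy 0.3.x, i.e. about 56% larger, and identifier_names.content roughly doubles. So bitpacker+brotli is much worse than brotli alone.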

Verdict: No Good.
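The comparison can be rechecked with a few lines of Rust. This is only a sketch: the byte counts are copied from the two tables above, and `STREAMS`, `totals` and `overhead_percent` are names made up for this example, not part of binjs-ref.

```rust
/// (stream, bytes after brotli with Entropy 0.3.x, bytes after brotli with bitpacker)
const STREAMS: [(&str, u64, u64); 3] = [
    ("identifier_names.content", 1_213_747, 2_482_068),
    ("property_keys.content", 1_348_177, 1_932_783),
    ("string_literals.content", 1_495_086, 1_917_712),
];

/// Sum of the post-brotli sizes: (Entropy 0.3.x, bitpacker).
fn totals() -> (u64, u64) {
    STREAMS
        .iter()
        .fold((0, 0), |(base, packed), &(_, b, p)| (base + b, packed + p))
}

/// How much larger `candidate` is than `baseline`, in percent.
fn overhead_percent(baseline: u64, candidate: u64) -> f64 {
    (candidate as f64 / baseline as f64 - 1.0) * 100.0
}
```

Calling `overhead_percent` on the pair returned by `totals()` gives the overall regression; calling it on each entry of `STREAMS` gives the per-stream figures.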