binast / binjs-ref

Reference implementation for the JavaScript Binary AST format
https://binast.github.io/binjs-ref/binjs/index.html

Compression criteria #227

Open Yoric opened 5 years ago

Yoric commented 5 years ago

We need compression criteria to determine what is good enough. This will help us:

Assigning @dominiccooney, as discussed.

dominiccooney commented 5 years ago

I did some model development, based on these assumptions:

For now I have been estimating parameters this way:

Some obvious ways to elaborate the model:

Here are some takeaways from even this basic model:

Yoric commented 5 years ago

> Focus on producing better compression than Brotli when hitting a dictionary of any size.

I have difficulties parsing this sentence. Does this mean that we shouldn't care about dictionary size yet, just producing better compression than Brotli?

On the upside, there are a number of levers that we haven't used yet to improve compression.

> "Success/defeat" may be very site specific and depend on site-specific factors like repeat visit rate and code churn.

Good point.

dominiccooney commented 5 years ago

Sorry for the slow reply, I missed this earlier.

> > Focus on producing better compression than Brotli when hitting a dictionary of any size.
>
> I have difficulties parsing this sentence. Does this mean that we shouldn't care about dictionary size yet, just producing better compression than Brotli?

For now, yes.

Ideally, dictionary + data would be smaller than Brotli. If that is not possible, the data alone has to be smaller than Brotli; then we have a chance to amortize the extra dictionary cost over time. (Even that situation is not necessarily feasible, depending on how often people make repeat visits and how dictionaries age; however, the situation where the data, without the dictionary, is bigger than Brotli is definitely infeasible.)
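
A minimal sketch of the amortization argument above, under simplifying assumptions that are not from the discussion itself: every visit downloads a payload of the same size, the dictionary is downloaded once, and it never ages. The function name `break_even_visits` and all byte counts are hypothetical and only illustrate the arithmetic.

```python
from typing import Optional


def break_even_visits(dictionary_bytes: int, data_bytes: int, brotli_bytes: int) -> Optional[int]:
    """Smallest visit count n with dictionary + n * data <= n * brotli.

    Returns None when the per-visit data is not smaller than Brotli,
    since the dictionary cost can then never be recovered.
    """
    saving_per_visit = brotli_bytes - data_bytes
    if saving_per_visit <= 0:
        # Data without the dictionary is no smaller than Brotli: infeasible.
        return None
    # Integer ceiling of dictionary_bytes / saving_per_visit, at least one visit.
    return max(1, -(-dictionary_bytes // saving_per_visit))


# Hypothetical sizes, not measurements.
print(break_even_visits(100_000, 80_000, 100_000))   # saves 20 kB per visit
print(break_even_visits(100_000, 110_000, 100_000))  # data alone exceeds Brotli
```

With these made-up numbers the first call breaks even on the fifth visit, and the second never does. A more realistic model would also account for repeat visit rate and for how quickly a dictionary goes stale as code churns.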