Yoric opened this issue 5 years ago
I did some model development, based on these assumptions:
For now I have been estimating parameters this way: the size of `binjs_encode advanced entropy -i X` etc., with a dictionary built from exactly X (excluding the dictionary size, naturally), divided by the size of `brotli -q 11 -w 20 X`.
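To make that measurement concrete, here is a minimal Python sketch of the ratio. It assumes you have already measured the size of the `binjs_encode` output with the dictionary file excluded (how to obtain that number depends on the output layout, so it is left as an input here); the Brotli side shells out to the real `brotli` CLI with the flags above.

```python
import subprocess

def brotli_size(path: str) -> int:
    """Size in bytes of `brotli -q 11 -w 20` output for `path`,
    compressed to stdout so no .br file is written."""
    result = subprocess.run(
        ["brotli", "-q", "11", "-w", "20", "-c", path],
        capture_output=True, check=True,
    )
    return len(result.stdout)

def compression_ratio(binjs_payload_bytes: int, source_path: str) -> float:
    """Ratio described above: BinAST payload size (dictionary
    excluded) divided by the Brotli-compressed size of the same
    source. A value below 1.0 means we beat Brotli on the payload."""
    return binjs_payload_bytes / brotli_size(source_path)
```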
Some obvious ways to elaborate the model:
Here are some takeaways from even this basic model:
> Focus on producing better compression than Brotli when hitting a dictionary of any size.
I have difficulty parsing this sentence. Does this mean that we shouldn't care about dictionary size yet, just about producing better compression than Brotli?
On the upside, there are a number of levers that we haven't used yet to improve compression.
"Success/defeat" may be very site specific and depend on site-specific factors like repeat visit rate and code churn.
Good point.
Sorry for the slow reply, I missed this earlier.
> > Focus on producing better compression than Brotli when hitting a dictionary of any size.
>
> I have difficulty parsing this sentence. Does this mean that we shouldn't care about dictionary size yet, just about producing better compression than Brotli?
For now, yes.
It would be ideal if dictionaries + data were smaller than Brotli. If that is not possible, the data alone has to be smaller than Brotli; then we have a chance to amortize the extra dictionary cost over time. (Whether that works out depends on how often people repeat visits and how dictionaries age; however, the situation where the data, even without the dictionary, is bigger than Brotli is definitely infeasible.)
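To make the amortization argument concrete, here's a rough back-of-the-envelope sketch. The numbers in it are purely illustrative, not measurements; the break-even point depends on exactly the site-specific factors (repeat visit rate, dictionary churn) mentioned above.

```python
import math

def break_even_visits(dict_bytes: int, binjs_bytes: int, brotli_bytes: int) -> float:
    """Number of visits after which dictionary + BinAST data is
    cheaper than shipping Brotli every time.

    Total transfer after n visits:
        BinAST:  dict_bytes + n * binjs_bytes
        Brotli:  n * brotli_bytes
    Break-even: n = dict_bytes / (brotli_bytes - binjs_bytes).
    """
    if binjs_bytes >= brotli_bytes:
        # Data alone is not smaller than Brotli: the dictionary
        # cost never amortizes (the infeasible case above).
        return math.inf
    return dict_bytes / (brotli_bytes - binjs_bytes)

# Illustrative only: a 100 KB dictionary amortizes after 20 visits
# if each visit saves 5 KB relative to Brotli.
assert break_even_visits(100_000, 45_000, 50_000) == 20.0
```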
We need compression criteria to determine what is good enough. This will help us:
Assigning @dominiccooney, as discussed.