Open Yoric opened 5 years ago
One way to do this would be to:
1/ somehow patch our webidl to expand Expression
and add @@LiteralExpression
, @@ArithmeticExpression
;
2/ have Shift=>ast conversion analyze the values it converts and, if useful, produce instances of @@LiteralExpression
or @@ArithmeticExpression
instead of the more general variants;
3/ similarly, convert back from @@LiteralExpression
or @@ArithmeticExpression
into their respective general variants when converting ast=>Shift.
Perhaps we could somehow inject a node
interface BinASTExpressionWithProbabilityTable {
expression: Expression;
table: string;
}
into Expression
, and then rely on heuristics to decide when to replace an expression foo
with BinASTExpressionWithProbabilityTable { foo, table }
.
For the time being, I'm going to assume that we can switch the probability tables, but not the content dictionaries.
On the Facebook test set, I see a very, very slight improvement (less than 1/1000).
Testing with https://github.com/binast/binjs-ref/pull/372
File | raw | brotli | ratio vs. master |
---|---|---|---|
js | 43134534 | 8016723 | 1 |
binjs | 8930118 | 8860979 | 1.00058085996408 |
floats.content | 336910 | 110130 | 1 |
floats.prelude | 126094 | 72487 | 1 |
identifier_names.content | 2524124 | 997583 | 1 |
identifier_names.prelude | 86304 | 52927 | 1 |
identifier_names_len.prelude | 26637 | 15806 | 1 |
list_lengths.content | 1985830 | 550497 | 1 |
list_lengths.prelude | 10284 | 12132 | 1 |
main.entropy | 1598083 | 1590968 | 0.993700399861591 |
property_keys.content | 1630992 | 986205 | 1 |
property_keys.prelude | 3150015 | 1010811 | 1 |
property_keys_len.prelude | 220936 | 144541 | 1 |
string_literals.content | 1813042 | 889629 | 1.01520359326537 |
string_literals.prelude | 6207076 | 1977802 | 1.00089320189855 |
string_literals_len.prelude | 380927 | 250932 | 1.00120096875486 |
unsigned_longs.content | 449125 | 91089 | 1 |
unsigned_longs.prelude | 4489 | 6337 | 1 |
Note It's not clear to me why there is any impact on string_literals*.prelude
.
~0.7% improvement on the main stream, ~1.5% regression on string_literals_content
. Overall, a slight regression. I suspect that we can improve it, though.
JavaScript has subsets that we can probably encode more efficiently, either with different dictionaries or different strategies.
For instance, if we know by earlier analysis that an expression is a JSON-style literal declaration, we could shift, for this subtree, to a dictionary that assigns non-0 probabilities only to literals. This would make compression more efficient on that subtree.
Similarly, if we know by earlier analysis that an expression is an arithmetic expression, we could shift to an arithmetic expression dictionary.