Open tyrak opened 7 years ago
Good idea!
Internally, there could be a cost function: string => number. For UTF-8 and UTF-16, the cost of теѕт would be 8:
UTF-8 : D1 82 D0 B5 D1 95 D1 82
UTF-16: 04 42 04 35 04 55 04 42
For ASCII, it would be 24:
\u0442\u0435\u0455\u0442
123456789012345678901234
        10        20  24
Babili could potentially run the script through all of the encodings to determine which is shortest.
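To make the cost-function idea concrete, here is a rough sketch. The names utf8Cost, utf16Cost, asciiCost and cheapestEncoding are made up for illustration and are not part of babili; the sketch only counts the bytes of the string contents, ignoring quotes and other escapes.

```javascript
// Hypothetical cost functions of type string => number (not a babili API).

// Bytes needed when the literal is written out as UTF-8.
const utf8Cost = (s) => new TextEncoder().encode(s).length;

// Bytes needed as UTF-16: two bytes per code unit.
const utf16Cost = (s) => s.length * 2;

// Bytes needed in pure ASCII: every non-ASCII code unit becomes a
// six-character \uXXXX escape, everything else stays one byte.
const asciiCost = (s) => {
  let cost = 0;
  for (let i = 0; i < s.length; i++) {
    cost += s.charCodeAt(i) > 0x7e ? 6 : 1;
  }
  return cost;
};

const costs = { utf8: utf8Cost, utf16: utf16Cost, ascii: asciiCost };

// Try every allowed encoding and keep the cheapest one.
// On a tie, the encoding listed first wins.
function cheapestEncoding(s, allowed = Object.keys(costs)) {
  let best = null;
  for (const name of allowed) {
    const cost = costs[name](s);
    if (best === null || cost < best.cost) best = { name, cost };
  }
  return best;
}

console.log(utf8Cost("теѕт"), utf16Cost("теѕт"), asciiCost("теѕт"));
console.log(cheapestEncoding("теѕт"));
```

A real implementation would presumably sum the cost over the whole script rather than per literal, since the output file can only have one encoding.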
How do I disable this string-literal-mangling feature? It's bloating my data-heavy files by a significant amount.
This file balloons up monstrously when run through babel-minify: https://github.com/TehShrike/majority-text-family-35-revelation/blob/master/revelation.json
Any chance this will be added before v1.0? Escaping UTF-8 chars in strings significantly increases the bundle size.
This seems to be something babel-core is doing, rather than babel-minify. If using the 'minify' preset, it sets minified: true by default.
Passing
{"presets": ["minify", "env"], "minified": false}
instead of
{"presets": ["minify", "env"]}
seems to be a workaround, although it results in more spaces in the output.
It shouldn't matter what the output of babel-core is. If babel-minify sees in its input a string literal like "\u0442\u0435\u0455\u0442", and if it was told in its configuration that utf-8 output is acceptable, it should simply output "теѕт".
I understand that you are only suggesting a workaround, I just wanted to clear up any possible misconception that this may not be a babel-minify bug.
Actually, minify makes the code larger for UTF-8 files. For now, you can post-process the output like this:
code = code.replace(/\\u([0-9a-f]{4})/gi, (m, g) => String.fromCharCode(parseInt(g, 16)))
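The one-line replacement above is applied to the whole output, so it also rewrites an escaped backslash followed by a literal uXXXX, and it decodes characters that may be safer left escaped. A slightly more defensive sketch is below; unescapeUnicode is a made-up helper name, not something babel-minify provides, and it is still a textual post-processing hack rather than a proper fix.

```javascript
// Hypothetical post-processing helper (not part of babel-minify):
// turn \uXXXX escapes in generated code back into the characters themselves.
function unescapeUnicode(code) {
  // Capture the whole run of backslashes before "uXXXX". Only an odd run
  // length means the last backslash really starts a unicode escape.
  return code.replace(/(\\+)u([0-9a-fA-F]{4})/g, (match, slashes, hex) => {
    if (slashes.length % 2 === 0) return match; // escaped backslash + literal "uXXXX"
    const cp = parseInt(hex, 16);
    // Conservatively keep these escaped: ASCII and C1 control range
    // (includes quotes, backslash, newlines), U+2028/U+2029, and surrogates.
    const keepEscaped =
      cp < 0xa0 || cp === 0x2028 || cp === 0x2029 || (cp >= 0xd800 && cp <= 0xdfff);
    if (keepEscaped) return match;
    // Drop the one backslash that belonged to the escape, keep the rest.
    return slashes.slice(1) + String.fromCharCode(cp);
  });
}

console.log(unescapeUnicode('"\\u0442\\u0435\\u0455\\u0442"'));
```

Because surrogates stay escaped, characters outside the BMP are left as they are; that is a deliberate simplification in this sketch.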
This seems to be something babel-core is doing, rather than babel-minify.
Your workaround helps; however, babel-minify also does this even when babel-core does not, at least when it is used as part of babel-minify-webpack-plugin, where minification becomes a separate stage...
I was experimenting with babili and found that the minified code it produces is significantly larger than closure (100 KiB vs 120 KiB). As it turns out, the problem is caused by the way babili handles (unicode) string literals.
Suppose that the code contains the string "теѕт" (all Cyrillic characters). Then babili converts it to "\u0442\u0435\u0455\u0442". OTOH, closure with the --charset utf8 option leaves the string in the original form. In fact, with that flag, closure converts "\u0442\u0435\u0455\u0442" to "теѕт".
So I propose to introduce to babili an option similar to closure's --charset. Of course, this should use a conservative setting by default (e.g. ascii), because otherwise the minified script would need to be loaded with charset="..." in the <script> tag.
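For illustration, the proposed option might behave roughly like the sketch below when printing a string literal. Both the function name printStringLiteral and the charset parameter are assumptions modeled on closure's --charset flag; nothing like this exists in babili as far as this issue shows.

```javascript
// Hypothetical sketch of a charset-aware string literal printer.
// "charset" is an assumed option: "ascii" (conservative default) or "utf8".
function printStringLiteral(value, charset = "ascii") {
  let out = '"';
  for (let i = 0; i < value.length; i++) {
    const ch = value[i];
    const code = value.charCodeAt(i);
    if (ch === '"' || ch === "\\") {
      out += "\\" + ch; // quote and backslash always need escaping
    } else if (code === 0x0a) {
      out += "\\n";
    } else if (code === 0x0d) {
      out += "\\r";
    } else if (
      code < 0x20 || code === 0x7f ||           // control characters
      code === 0x2028 || code === 0x2029 ||     // line/paragraph separators
      (code >= 0xd800 && code <= 0xdfff) ||     // surrogates (kept escaped for simplicity)
      (charset === "ascii" && code > 0x7e)      // non-ASCII only in ascii mode
    ) {
      out += "\\u" + code.toString(16).padStart(4, "0");
    } else {
      out += ch; // printable ASCII, or any other character when charset is utf8
    }
  }
  return out + '"';
}

console.log(printStringLiteral("теѕт"));         // ascii default: escaped form
console.log(printStringLiteral("теѕт", "utf8")); // utf8: characters kept as-is
```

With the default the output stays safe to load without a charset declaration, and only an explicit utf8 setting produces the shorter form.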