Do not escape characters in string literals when they are supported by the specified encoding

babel / minify

:scissors: An ES6+ aware minifier based on the Babel toolchain (beta)

https://babeljs.io/repl

MIT License


Do not escape characters in string literals when they are supported by the specified encoding #619

Open tyrak opened 7 years ago

tyrak commented 7 years ago

I was experimenting with babili and found that the minified code it produces is significantly larger than closure's (120 KiB for babili vs 100 KiB for closure). As it turns out, the difference is caused by the way babili handles (Unicode) string literals.

Suppose that the code contains the string "теѕт" (all Cyrillic characters). Then babili converts it to "\u0442\u0435\u0455\u0442". OTOH, closure with the --charset utf8 option leaves the string in the original form. In fact, with that flag, closure converts "\u0442\u0435\u0455\u0442" to "теѕт".

So I propose adding an option to babili similar to closure's --charset. Of course, this should use a conservative setting by default (e.g. ASCII), because otherwise the minified script would need to be loaded with charset="..." in the <script> tag.
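
To make the proposal concrete, here is a minimal sketch of how a charset setting could change the way a string literal is printed. The function name `printStringLiteral` and the `charset` values are hypothetical; nothing like this exists in babili today, and a real implementation would live in the code generator rather than in a standalone helper.

```javascript
// Hypothetical sketch: `printStringLiteral` and its `charset` parameter are
// made up for illustration and are not part of babili or Babel.
function printStringLiteral(value, charset = "ascii") {
  // JSON.stringify adds the quotes and escapes ", \ and control characters.
  const quoted = JSON.stringify(value);
  // With UTF-8 output only the line/paragraph separators are escaped
  // (older engines reject them raw); with ASCII output every non-ASCII
  // UTF-16 code unit becomes a \uXXXX escape.
  const pattern = charset === "utf8" ? /[\u2028\u2029]/g : /[\u0080-\uffff]/g;
  return quoted.replace(
    pattern,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

console.log(printStringLiteral("теѕт"));         // escaped form
console.log(printStringLiteral("теѕт", "utf8")); // literal form
```

The default stays on the conservative ASCII path, so nothing changes unless UTF-8 output is requested explicitly.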

j-f1 commented 7 years ago

Good idea!

Internally, there could be a cost function: string => number. For UTF-8 and UTF-16, the cost of теѕт would be 8:

UTF-8 : D1 82 D0 B5 D1 95 D1 82
UTF-16: 04 42 04 35 04 55 04 42

For ASCII, it would be 24:

\u0442\u0435\u0455\u0442
123456789012345678901234
        10        20  24

Babili could potentially run the script through all of the encodings to determine which is shortest.
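
A rough sketch of such a cost function, assuming Node.js (for `Buffer`) and counting only the characters between the quotes. The names `literalCost` and `cheapestCharset` are hypothetical, and the ASCII branch assumes every non-ASCII UTF-16 code unit is written as a six-character \uXXXX escape.

```javascript
// Hypothetical helpers, not part of babili. Cost = bytes needed for the
// contents of a string literal when the file is written in `charset`.
function literalCost(str, charset) {
  switch (charset) {
    case "utf8":
      return Buffer.byteLength(str, "utf8");
    case "utf16":
      return str.length * 2; // two bytes per UTF-16 code unit
    case "ascii": {
      let cost = 0;
      for (let i = 0; i < str.length; i++) {
        // ASCII characters cost one byte; anything else costs a \uXXXX escape.
        cost += str.charCodeAt(i) < 0x80 ? 1 : 6;
      }
      return cost;
    }
    default:
      throw new Error(`Unknown charset: ${charset}`);
  }
}

// Pick the cheapest charset; ties go to the one listed first.
function cheapestCharset(str, charsets = ["ascii", "utf8", "utf16"]) {
  let best = charsets[0];
  for (const charset of charsets) {
    if (literalCost(str, charset) < literalCost(str, best)) best = charset;
  }
  return best;
}

console.log(literalCost("теѕт", "utf8"));  // 8
console.log(literalCost("теѕт", "utf16")); // 8
console.log(literalCost("теѕт", "ascii")); // 24
```

Since a charset applies to the whole script, a real comparison would sum the cost over the entire output (UTF-16 doubles the size of all the ASCII code too), not over one literal at a time.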

TehShrike commented 7 years ago

How do I disable this string-literal-mangling feature? It's bloating my data-heavy files by a significant amount.

TehShrike commented 7 years ago

This file balloons up monstrously when run through babel-minify: https://github.com/TehShrike/majority-text-family-35-revelation/blob/master/revelation.json

dmythro commented 6 years ago

Any chance this will be added before v1.0? Escaping UTF-8 characters in strings significantly increases the bundle size.

Cyp commented 6 years ago

This seems to be something babel-core is doing, rather than babel-minify. If using the 'minify' preset, it sets minified: true by default.

Passing

{"presets": ["minify", "env"], "minified": false}

instead of

{"presets": ["minify", "env"]}

seems to be a workaround, although it results in more spaces in the output.

tyrak commented 6 years ago

It shouldn't matter what the output of babel-core is. If babel-minify sees a string literal like "\u0442\u0435\u0455\u0442" in its input, and its configuration says that UTF-8 output is acceptable, it should simply output "теѕт".

I understand that you are only suggesting a workaround, I just wanted to clear up any possible misconception that this may not be a babel-minify bug.

rahbari commented 6 years ago

Actually, minify makes the code larger for UTF-8 files. For now you can post-process the output like this: code = code.replace(/\\u([0-9a-f]{4})/gi, (m, g) => String.fromCharCode(parseInt(g, 16)))
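
The one-liner above rewrites every escape it finds, including ones that have to stay escaped (a line break, a quote, or an escaped backslash that merely happens to be followed by u). A more conservative sketch of the same post-processing step follows; `unescapeUnicode` is a hypothetical helper name, and like the one-liner it works on the raw output text rather than on parsed string literals, so treat it as a stopgap.

```javascript
// Hypothetical helper, not part of babel-minify: turn \uXXXX escapes in
// already-minified output back into literal characters where that is safe.
function unescapeUnicode(code) {
  // Match either an escaped backslash (kept as-is, so "\\u0442" is not
  // mistaken for an escape) or a \uXXXX escape.
  return code.replace(/\\(?:\\|u([0-9a-fA-F]{4}))/g, (match, hex) => {
    if (hex === undefined) return match; // escaped backslash
    const unit = parseInt(hex, 16);
    const isAscii = unit < 0x80; // keeps \u000a, \u0022, \u005c escaped
    const isLineSeparator = unit === 0x2028 || unit === 0x2029;
    const isSurrogate = unit >= 0xd800 && unit <= 0xdfff;
    if (isAscii || isLineSeparator || isSurrogate) return match;
    return String.fromCharCode(unit);
  });
}

console.log(unescapeUnicode('"\\u0442\\u0435\\u0455\\u0442"')); // "теѕт"
```

Surrogate halves are left escaped here for simplicity, so characters outside the Basic Multilingual Plane keep their escaped form. The resulting file must then be served as UTF-8.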

Mihara commented 6 years ago

> This seems to be something babel-core is doing, rather than babel-minify.

Your workaround helps; however, babel-minify also does this even when babel-core does not. At least, that is the case when it is used as part of babel-minify-webpack-plugin, where minification becomes a separate stage...