Sort variable names by frequency

Rich-Harris commented 7 years ago

I was wondering how uglify-es was able to compete with Butternut by only mangling variable names. For example, Preact:

minifier	minified (bytes)	zipped (bytes)	time
butternut	8152	3398	20ms
uglify-es	8074	3355	167ms
uglify-es (mangle only)	8456	3381	45ms

Butternut's unzipped output is closer to that of Uglify with default settings than to Uglify with just mangling, yet in this case the zipped mangle-only output is actually smaller than Butternut's (normally it's the other way around, but it's often close).

Turns out that Uglify is doing something fiendishly clever: it computes the frequency of the characters that end up in the output, and hands out the most common ones as variable names first. So instead of a, b, c etc. you might have o, a, n and so on.
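To make the idea concrete, here is a minimal sketch of frequency-ordered naming. It is not Uglify's actual code: `charsByFrequency` and `assignNames` are hypothetical helpers, the candidate alphabet is an assumption, and only one-character names are handled.

```javascript
// Characters that are valid as a one-character identifier (assumed alphabet).
const CANDIDATES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_";

// Count how often each candidate character appears in `code`, then return
// the candidates sorted from most to least frequent.
function charsByFrequency(code) {
  const counts = new Map();
  for (const ch of CANDIDATES) counts.set(ch, 0);
  for (const ch of code) {
    if (counts.has(ch)) counts.set(ch, counts.get(ch) + 1);
  }
  // Array.prototype.sort is stable, so ties keep their alphabet order.
  return [...counts.keys()].sort((a, b) => counts.get(b) - counts.get(a));
}

// Hand out one-character names, most frequent character first.
function assignNames(code, variableCount) {
  return charsByFrequency(code).slice(0, variableCount);
}

console.log(assignNames("return function(){return null}", 3));
```

The intuition is that names built from characters gzip already sees a lot of (the letters in `function`, `return` and so on) cost fewer bits than names built from rare characters.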

We could possibly do something similar. Since Butternut isn't 'generating' code, perhaps the way to do it would be to replace variable names with placeholders built from some cryptic Unicode (e.g. instead of a, b, c we do ⊂0⊃, ⊂1⊃, ⊂2⊃), then at the end of the process tally up all the characters that aren't enclosed in ⊂...⊃ and swap the placeholders for the final variable names with a regex.

Obviously we'd need to ensure we were using characters that weren't in the code in the first place.
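A rough sketch of that placeholder pass, under a few assumptions: `assertDelimitersUnused` and `finalizeNames` are hypothetical names (nothing like this exists in Butternut), placeholder index N simply gets the Nth most frequent character, and collisions with globals or with names that were left unmangled are ignored.

```javascript
// Hypothetical delimiters; they must not occur anywhere in the input.
const OPEN = "\u2282"; // ⊂
const CLOSE = "\u2283"; // ⊃
const PLACEHOLDER = new RegExp(`${OPEN}(\\d+)${CLOSE}`, "g");
const CANDIDATES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_";

// Run this on the original source before any placeholder is written.
function assertDelimitersUnused(source) {
  if (source.includes(OPEN) || source.includes(CLOSE)) {
    throw new Error("placeholder delimiters already appear in the source");
  }
}

// Replace each placeholder with a one-character name, chosen by how often
// that character appears in the code outside the placeholders.
function finalizeNames(codeWithPlaceholders) {
  const counts = new Map([...CANDIDATES].map((ch) => [ch, 0]));
  // Tally only the characters that are not enclosed in ⊂...⊃.
  for (const ch of codeWithPlaceholders.replace(PLACEHOLDER, "")) {
    if (counts.has(ch)) counts.set(ch, counts.get(ch) + 1);
  }
  // Stable sort: ties keep their alphabet order.
  const ranked = [...CANDIDATES].sort((a, b) => counts.get(b) - counts.get(a));
  return codeWithPlaceholders.replace(PLACEHOLDER, (match, index) => {
    const name = ranked[Number(index)];
    if (name === undefined) {
      throw new Error(`no one-character name left for ${match}`);
    }
    return name;
  });
}

const input = `function ${OPEN}0${CLOSE}(${OPEN}1${CLOSE}){return ${OPEN}1${CLOSE}+1}`;
console.log(finalizeNames(input));
```

A real version would also need multi-character names once the alphabet runs out, and would have to skip any name that shadows an identifier the scope still refers to.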

kzc commented 7 years ago

Just curious - which version of node was used to produce those timings?

My results are very different with the latest butternut (27639ce506848565c3bf42d061aed42519c2f425).

node 6.9.0:

$ /usr/bin/time node690 bin/squash test/fixture/input/preact.js | wc -c
        0.22 real         0.23 user         0.02 sys
    8151

$ /usr/bin/time node690 node_modules/uglify-es/bin/uglifyjs -cm -- test/fixture/input/preact.js | wc -c
        0.46 real         0.54 user         0.03 sys
    8075

$ /usr/bin/time node690 node_modules/uglify-es/bin/uglifyjs -m -- test/fixture/input/preact.js | wc -c
        0.26 real         0.28 user         0.02 sys
    8457

node 7.7.3:

$ /usr/bin/time node773 bin/squash test/fixture/input/preact.js | wc -c
        0.19 real         0.21 user         0.02 sys
    8151

$ /usr/bin/time node773 node_modules/uglify-es/bin/uglifyjs -cm -- test/fixture/input/preact.js | wc -c
        0.46 real         0.51 user         0.02 sys
    8075

$ /usr/bin/time node773 node_modules/uglify-es/bin/uglifyjs -m -- test/fixture/input/preact.js | wc -c
        0.25 real         0.26 user         0.02 sys
    8457

Rich-Harris commented 7 years ago

7.8.0. This is using Benchmark.js which is probably somewhat misleading as everything gets warmed up, whereas minifiers usually run cold. I intend to replace the current benchmarks with more realistic ones that run each minifier once in a fresh process.
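For illustration, a cold run could be timed along these lines. This is only a sketch, not the benchmark that ended up in the repo: `timeColdRun` is a hypothetical helper, and the measured time includes Node's own startup, which is the point of running cold.

```javascript
const { spawnSync } = require("child_process");

// Run `node <args>` once in a fresh process and report the wall-clock time
// and the size of whatever it wrote to stdout.
function timeColdRun(args, input) {
  const start = process.hrtime.bigint();
  const result = spawnSync(process.execPath, args, { input, encoding: "utf8" });
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(result.stderr || `exit status ${result.status}`);
  }
  return { elapsedMs, bytes: Buffer.byteLength(result.stdout) };
}

// Example with a trivial script; a minifier CLI and its input file would be
// passed as `args` in the same way.
console.log(timeColdRun(["-e", "process.stdout.write('abc')"]));
```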

kzc commented 7 years ago

My timings from the command line are pretty cold. :-)

Here are the Preact numbers from the previous bench in https://github.com/Rich-Harris/butternut/pull/44

preact.js (20.5 kB) without sourcemap:
  ✓ babili             :  8.41 kB /  3.47 kB in 377ms
  ✓ butternut          :  8.15 kB /   3.4 kB in 29ms
  ✓ closure            :  7.89 kB /  3.35 kB in 2.4s
  ✓ uglify             :  8.07 kB /  3.35 kB in 119ms
  ✓ uglify-mangle-only :  8.46 kB /  3.38 kB in 31ms
  ✓ uglify-es          :  8.07 kB /  3.35 kB in 144ms

Edit: same machine used for timings seen in https://github.com/Rich-Harris/butternut/issues/110#issuecomment-301896527

kzc commented 7 years ago

@Rich-Harris You'll get a kick out of this:

Running the Closure-produced three.min.js through Uglify saves an additional 1744 bytes uncompressed, 2860 gzipped

https://github.com/mrdoob/three.js/issues/11003

Props to @mishoo on the Uglify mangling algorithm.

Rich-Harris commented 7 years ago

That's wild! Is that mostly due to the mangling, or is it hard to know?

kzc commented 7 years ago

Hmm. I thought it was primarily due to mangling, but that's not the case.

Rather than building it this time, I grabbed the file from a CDN:

original size, not gzipped:

$ wc -c three.min.js
  510005 three.min.js

original gzipped:

$ cat three.min.js | gzip | wc -c
  129119

uglified with compress=false, mangle=false, gzipped:

$ cat three.min.js | bin/uglifyjs | gzip | wc -c
  127298

Very strange! 1821 bytes were saved without compress and without mangle - just from eliminating whitespace and perhaps making numbers more compact? That doesn't make much sense - unless there's a ton of copyright comments that were stripped.

Let's call 127298 the baseline.

uglified with compress=false, mangle=true, gzipped:

$ cat three.min.js | bin/uglifyjs -m | gzip | wc -c
  126549

mangle savings: 749 bytes from the baseline

uglified with compress=true, mangle=false, gzipped:

$ cat three.min.js | bin/uglifyjs -c | gzip | wc -c
  127243

compress savings: 55 bytes from the baseline

uglified with compress=true, mangle=true, gzipped:

$ cat three.min.js | bin/uglifyjs -cm | gzip | wc -c
  126369

compress+mangle savings: 929 bytes from the baseline

So mangle did have some effect, but not to the extent I thought.

Rich-Harris / butternut

Sort variable names by frequency #110