Open RReverser opened 4 years ago
Semi-relevant: I've noticed we also have uglify-js
in third_party
, which is an older version of terser
. I don't suppose we need both of these either?
uglify-js
seems to be used only in tools/js-optimizer.js
, which, in turn, doesn't seem to be used anywhere aside from one test? Am I missing external usages or can it and uglify-js be both removed? cc @sbc100 @kripken
Would be good to experiment with!
When I implemented the initial acorn integration I wrote this:
For outputting I experimented with astring, which is small and nice, and escodegen, which looks very robust, but neither could output compact-enough JS to not regress our JS code sizes.
Perhaps things have changed since then, worth checking.
Semi-relevant: I've noticed we also have uglify-js in third_party
tools/js-optimizer.js
has a bunch of optimization passes that still use uglify. Hopefully they can get rewritten for acorn eventually.
For outputting I experimented with astring, which is small and nice, and escodegen, which looks very robust, but neither could output compact-enough JS to not regress our JS code sizes.
Perhaps things have changed since then, worth checking.
escodegen has options for formatting - you can either output pretty-printed code, or a compact one.
Of course, it will never match Terser's minified output, but IIUC from your comment the original issue, that's a non-goal anyway, since we use Closure Compiler for proper minification.
Semi-relevant: I've noticed we also have uglify-js in third_party
tools/js-optimizer.js
has a bunch of optimization passes that still use uglify. Hopefully they can get rewritten for acorn eventually.
Can it be switched to Terser at least? They should be API-compatible (Terser is strict successor), and, for now, it would at least get rid of one old-ish dependency. (UPD: looking at versions, it seems to be UglifyJS@1.x, which actually is even older, and maybe not as compatible as I thought...)
I'm a big fan of any kind of cleanups / simplifications that can be done in this area!
Specifically if we can delete another checked in module from third_party I would be very happy.
escodegen has options for formatting - you can either output pretty-printed code, or a compact one.
I actually just noticed that we never had them documented on the wiki page apparently. That's odd, but just added for visibility: https://github.com/estools/escodegen/wiki/API#optionformat
tools/js-optimizer.js
has a bunch of optimization passes that still use uglify. Hopefully they can get rewritten for acorn eventually.
Okay, but where is it called from? I can't find a single place that would use it aside from one test...
Of course, it will never match Terser's minified output, but IIUC from your comment the original issue, that's a non-goal anyway,
Well, there is more here... See the full context in the link there, I just quoted one sentence, which was maybe not clear enough, sorry. To summarize the context too, we do expect closure to be used when code size is absolutely critical. But we don't run closure by default, due to speed and also that it may break custom code a user adds that isn't closure-safe. And the breakage issue means that some users can't use closure at all. So while we don't work to squeeze every last byte out of non-closure builds, we do want them to be tiny. So a significant regression would not be good, I think. That's why I ended up adding terser - it was easy, and it made a big difference at the time.
(UPD: looking at versions, it seems to be UglifyJS@1.x, which actually is even older, and maybe not as compatible as I thought...)
Correct, that code uses Uglify1, which is a different API. Those passes would be needed to be rewritten to use a modern AST. But after removing fastcomp we need a lot fewer of them...
And the breakage issue means that some users can't use closure at all. So while we don't work to squeeze every last byte out of non-closure builds, we do want them to be tiny. So a significant regression would not be good, I think. That's why I ended up adding terser - it was easy, and it made a big difference at the time.
Makes sense. I guess worth experimenting with FORMAT_MINIFY
quoted above and comparing final JS+gzip numbers on some real codebase to see how much we'd lose by leaving Terser.
As for the Closure breaking the code - it should be happening only under advanced optimisations, right? In a "simple" mode it does pretty much the same things as Terser does, so perhaps we could just switch between them.
Okay, but where is it called from?
Hmm... tools/js_optimizer.py
calls tools/js-optimizer.js
to run passes. And the python should be called from emcc.py
. But it looks like we removed it... which was 99% correct, but I think we removed too much @sbc100 ! We should be calling it to run safeHeap
, for example. That means we've lost SAFE_HEAP coverage on custom user JS code :sob:
I'll look into fixing that!
Heh, glad we found it I guess 😅
Okay, but where is it called from?
Hmm...
tools/js_optimizer.py
callstools/js-optimizer.js
to run passes. And the python should be called fromemcc.py
. But it looks like we removed it... which was 99% correct, but I think we removed too much @sbc100 ! We should be calling it to runsafeHeap
, for example. That means we've lost SAFE_HEAP coverage on custom user JS codeI'll look into fixing that!
But how was it possible we removed too much? Do we lack test coverage?
But how was it possible we removed too much? Do we lack test coverage?
I'm still looking, but I'm pretty sure the wasm backend path never applied safe heap on user code, and yes, we lacked a test so this was missed...
Oh, and the other place we use js_optimizer.py
, which exists and is correct, is for wasm2js, in building.py
(def js_optimizer
etc.).
Terser itself switched away from ESCodegen to astring - perhaps relevant to cinsider astring here as well?
Does anyone feel like working on this?
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.
Terser itself switched away from ESCodegen to astring - perhaps relevant to cinsider astring here as well?
Note that they switched it only in test utils, not as an implementation.
astring could be an interesting alternative but can't be used until https://github.com/davidbonnet/astring/issues/203 is resolved as Emscripten relies on at least whitespace minification.
Raising an issue for tracking: such switch should simplify and speed up code generation, and allow us to avoid custom patches we have in Terser.
Background (https://github.com/emscripten-core/emscripten/issues/11303#issuecomment-702853752):