emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.75k stars 3.3k forks source link

Switch from Terser to escodegen #12410

Open RReverser opened 4 years ago

RReverser commented 4 years ago

Raising an issue for tracking: such switch should simplify and speed up code generation, and allow us to avoid custom patches we have in Terser.

Background (https://github.com/emscripten-core/emscripten/issues/11303#issuecomment-702853752):

@kripken: We don't use terser for minification. In tools/acorn_optimizer.js it's used to convert an acorn AST to JS (so it's in support of the passes in that file, which are almost all not minification-related). Maybe there's a better way or simpler tool for that? I remember I couldn't find one at the time.

@rreverser: Ah, that's what I thought after looking at the code. Yeah, there is estools/escodegen which, among other ESTree tools in that org, was created specifically as a simple AST-to-JS generator. It should be both simpler and much faster than going via all Terser's conversions.

RReverser commented 4 years ago

Semi-relevant: I've noticed we also have uglify-js in third_party, which is an older version of terser. I don't suppose we need both of these either?

uglify-js seems to be used only in tools/js-optimizer.js, which, in turn, doesn't seem to be used anywhere aside from one test? Am I missing external usages or can it and uglify-js be both removed? cc @sbc100 @kripken

kripken commented 4 years ago

Would be good to experiment with!

When I implemented the initial acorn integration I wrote this:

For outputting I experimented with astring, which is small and nice, and escodegen, which looks very robust, but neither could output compact-enough JS to not regress our JS code sizes.

Perhaps things have changed since then, worth checking.

kripken commented 4 years ago

Semi-relevant: I've noticed we also have uglify-js in third_party

tools/js-optimizer.js has a bunch of optimization passes that still use uglify. Hopefully they can get rewritten for acorn eventually.

RReverser commented 4 years ago

For outputting I experimented with astring, which is small and nice, and escodegen, which looks very robust, but neither could output compact-enough JS to not regress our JS code sizes.

Perhaps things have changed since then, worth checking.

escodegen has options for formatting - you can either output pretty-printed code, or a compact one.

Of course, it will never match Terser's minified output, but IIUC from your comment the original issue, that's a non-goal anyway, since we use Closure Compiler for proper minification.

RReverser commented 4 years ago

Semi-relevant: I've noticed we also have uglify-js in third_party

tools/js-optimizer.js has a bunch of optimization passes that still use uglify. Hopefully they can get rewritten for acorn eventually.

Can it be switched to Terser at least? They should be API-compatible (Terser is strict successor), and, for now, it would at least get rid of one old-ish dependency. (UPD: looking at versions, it seems to be UglifyJS@1.x, which actually is even older, and maybe not as compatible as I thought...)

sbc100 commented 4 years ago

I'm a big fan of any kind of cleanups / simplifications that can be done in this area!

sbc100 commented 4 years ago

Specifically if we can delete another checked in module from third_party I would be very happy.

RReverser commented 4 years ago

escodegen has options for formatting - you can either output pretty-printed code, or a compact one.

I actually just noticed that we never had them documented on the wiki page apparently. That's odd, but just added for visibility: https://github.com/estools/escodegen/wiki/API#optionformat

RReverser commented 4 years ago

tools/js-optimizer.js has a bunch of optimization passes that still use uglify. Hopefully they can get rewritten for acorn eventually.

Okay, but where is it called from? I can't find a single place that would use it aside from one test...

kripken commented 4 years ago

Of course, it will never match Terser's minified output, but IIUC from your comment the original issue, that's a non-goal anyway,

Well, there is more here... See the full context in the link there, I just quoted one sentence, which was maybe not clear enough, sorry. To summarize the context too, we do expect closure to be used when code size is absolutely critical. But we don't run closure by default, due to speed and also that it may break custom code a user adds that isn't closure-safe. And the breakage issue means that some users can't use closure at all. So while we don't work to squeeze every last byte out of non-closure builds, we do want them to be tiny. So a significant regression would not be good, I think. That's why I ended up adding terser - it was easy, and it made a big difference at the time.

(UPD: looking at versions, it seems to be UglifyJS@1.x, which actually is even older, and maybe not as compatible as I thought...)

Correct, that code uses Uglify1, which is a different API. Those passes would be needed to be rewritten to use a modern AST. But after removing fastcomp we need a lot fewer of them...

RReverser commented 4 years ago

And the breakage issue means that some users can't use closure at all. So while we don't work to squeeze every last byte out of non-closure builds, we do want them to be tiny. So a significant regression would not be good, I think. That's why I ended up adding terser - it was easy, and it made a big difference at the time.

Makes sense. I guess worth experimenting with FORMAT_MINIFY quoted above and comparing final JS+gzip numbers on some real codebase to see how much we'd lose by leaving Terser.

As for the Closure breaking the code - it should be happening only under advanced optimisations, right? In a "simple" mode it does pretty much the same things as Terser does, so perhaps we could just switch between them.

kripken commented 4 years ago

Okay, but where is it called from?

Hmm... tools/js_optimizer.py calls tools/js-optimizer.js to run passes. And the python should be called from emcc.py. But it looks like we removed it... which was 99% correct, but I think we removed too much @sbc100 ! We should be calling it to run safeHeap, for example. That means we've lost SAFE_HEAP coverage on custom user JS code :sob:

I'll look into fixing that!

RReverser commented 4 years ago

Heh, glad we found it I guess 😅

sbc100 commented 4 years ago

Okay, but where is it called from?

Hmm... tools/js_optimizer.py calls tools/js-optimizer.js to run passes. And the python should be called from emcc.py. But it looks like we removed it... which was 99% correct, but I think we removed too much @sbc100 ! We should be calling it to run safeHeap, for example. That means we've lost SAFE_HEAP coverage on custom user JS code

I'll look into fixing that!

But how was it possible we removed too much? Do we lack test coverage?

kripken commented 4 years ago

But how was it possible we removed too much? Do we lack test coverage?

I'm still looking, but I'm pretty sure the wasm backend path never applied safe heap on user code, and yes, we lacked a test so this was missed...

kripken commented 4 years ago

Oh, and the other place we use js_optimizer.py, which exists and is correct, is for wasm2js, in building.py (def js_optimizer etc.).

jonassmedegaard commented 3 years ago

Terser itself switched away from ESCodegen to astring - perhaps relevant to cinsider astring here as well?

sbc100 commented 3 years ago

Does anyone feel like working on this?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

RReverser commented 1 year ago

Terser itself switched away from ESCodegen to astring - perhaps relevant to cinsider astring here as well?

Note that they switched it only in test utils, not as an implementation.

astring could be an interesting alternative but can't be used until https://github.com/davidbonnet/astring/issues/203 is resolved as Emscripten relies on at least whitespace minification.