esbuild minify fails some recent uglify-js tests

evanw / esbuild

An extremely fast bundler for the web

https://esbuild.github.io/

MIT License

38.05k stars 1.14k forks source link

esbuild minify fails some recent uglify-js tests #1305

Closed kzc closed 3 years ago

kzc commented 3 years ago

Using node v16.1.0 with the patch below, esbuild e76b49fa3c9d8ef4dc13b4d7a3eb04b524ff4626 produces:

$ make clean-all uglify
...
29 failed out of 3163
make: *** [uglify] Error 1

Note that all the original unminified expect_stdout test cases will run successfully on node v16.1.0.

Some failures are due to esbuild being more strict than recent versions of NodeJS, but around half are genuine esbuild minify errors.

diff --git a/Makefile b/Makefile
index 19dfe06..cf97c76 100644
--- a/Makefile
+++ b/Makefile
@@ -372,3 +372,3 @@ github/uglify:
        cd github/uglify && git init && git remote add origin https://github.com/mishoo/uglifyjs.git
-       cd github/uglify && git fetch --depth 1 origin 83a3cbf1514e81292b749655f2f712e82a5a2ba8 && git checkout FETCH_HEAD
+       cd github/uglify && git fetch --depth 1 origin d2a45ba441ed6975021b3524215c01a011dfb46a && git checkout FETCH_HEAD

diff --git a/scripts/uglify-tests.js b/scripts/uglify-tests.js
index 4214959..e4cf715 100644
--- a/scripts/uglify-tests.js
+++ b/scripts/uglify-tests.js
@@ -102,2 +102,14 @@ async function test_case(esbuild, test) {

+  // Ignore tests without expect_stdout, as these tests cannot be verified when run
+  if (!("expect_stdout" in test)) return;
+
+  // Ignore tests that will likely result in a syntax error in recent NodeJS versions.
+  // uglify-js generally accepts whatever inputs that NodeJS allowed/allows
+  // as long as it is unambiguous with grammar lookahead -
+  // even if it deviates from the latest ECMAScript specification.
+  // `true` in this context means "the minified test program output matches the
+  // unminified test program output for this version of NodeJS - whether valid
+  // or an error."
+  if (test.expect_stdout === true) return;
+
   // Ignore tests that no longer pass in modern versions of node. These tests
@@ -117,17 +129,4 @@ async function test_case(esbuild, test) {
       minify: true,
-      target: 'es5',
     });
   } catch (e) {
-    // These two tests fail because they contain setters without arguments,
-    // which is a syntax error. These test failures do not indicate anything
-    // wrong with esbuild so the failures are ignored. Here is one of the
-    // tests:
-    //
-    //   function f(){var a={get b(){},set b(){}};return{a:a}}
-    //
-    if (test.name === 'unsafe_object_accessor' || test.name === 'keep_name_of_setter') {
-      console.error("*** skipping test with known syntax error:", test.name);
-      return;
-    }
-
     const formatError = ({ text, location }) => {

evanw commented 3 years ago

Thanks for reporting this. I appreciate the heads up. Looking into these now...

kzc commented 3 years ago

29 failed out of 3163

Hmm... I think the skipped test cases have been incorrectly folded into passed.

The figures with the following patch look more reasonable:

$ make clean-all uglify
...
29 failed out of 2180, with 983 skipped
make: *** [uglify] Error 1

diff --git a/Makefile b/Makefile
index 19dfe06..cf97c76 100644
--- a/Makefile
+++ b/Makefile
@@ -372,3 +372,3 @@ github/uglify:
        cd github/uglify && git init && git remote add origin https://github.com/mishoo/uglifyjs.git
-       cd github/uglify && git fetch --depth 1 origin 83a3cbf1514e81292b749655f2f712e82a5a2ba8 && git checkout FETCH_HEAD
+       cd github/uglify && git fetch --depth 1 origin d2a45ba441ed6975021b3524215c01a011dfb46a && git checkout FETCH_HEAD

diff --git a/scripts/uglify-tests.js b/scripts/uglify-tests.js
index 4214959..61745a6 100644
--- a/scripts/uglify-tests.js
+++ b/scripts/uglify-tests.js
@@ -9,2 +9,3 @@ const testDir = path.join(repoDir, 'scripts', '.uglify-tests');
 const uglifyDir = path.join(repoDir, 'demo', 'uglify');
+const SKIP = new Error("SKIP");
 let U;
@@ -36,6 +37,8 @@ async function main() {
   let failedTotal = 0;
+  let skippedTotal = 0;
   const runTest = file => test_file(esbuild, path.join(compressDir, file))
-    .then(({ passed, failed }) => {
+    .then(({ passed, failed, skipped }) => {
       passedTotal += passed;
       failedTotal += failed;
+      skippedTotal += skipped;
     });
@@ -46,3 +49,3 @@ async function main() {

-  console.log(`${failedTotal} failed out of ${passedTotal + failedTotal}`);
+  console.log(`${failedTotal} failed out of ${passedTotal + failedTotal}, with ${skippedTotal} skipped`);
   if (failedTotal) {
@@ -55,2 +58,3 @@ async function test_file(esbuild, file) {
   let failed = 0;
+  let skipped = 0;
   const tests = parse_test(file);
@@ -59,8 +63,11 @@ async function test_file(esbuild, file) {
     .catch(e => {
-      failed++;
-      console.error(`<U+274C> ${file}: ${name}: ${(e && e.message || e).trim()}\n`);
-      pass = false;
+      if (e === SKIP) {
+        skipped++;
+      } else {
+        failed++;
+        console.error(`<U+274C> ${file}: ${name}: ${(e && e.message || e).trim()}\n`);
+      }
     });
   await Promise.all(Object.keys(tests).map(runTest));
-  return { passed, failed };
+  return { passed, failed, skipped };
 }
@@ -102,2 +109,14 @@ async function test_case(esbuild, test) {

+  // Ignore tests without expect_stdout, as these tests cannot be verified when run
+  if (!("expect_stdout" in test)) throw SKIP;
+
+  // Ignore tests that will likely result in a syntax error in recent NodeJS versions.
+  // uglify-js generally accepts whatever inputs that NodeJS allowed/allows
+  // as long as it is unambiguous with grammar lookahead -
+  // even if it deviates from the latest ECMAScript specification.
+  // `true` in this context means "the minified test output matches the
+  // unminified test output for this version of NodeJS - whether valid
+  // or an error."
+  if (test.expect_stdout === true) throw SKIP;
+
   // Ignore tests that no longer pass in modern versions of node. These tests
@@ -110,3 +129,3 @@ async function test_case(esbuild, test) {
     console.error("*** skipping test %j with node_version %j", test.name, test.node_version);
-    return;
+    throw SKIP;
   }
@@ -117,17 +136,4 @@ async function test_case(esbuild, test) {
       minify: true,
-      target: 'es5',
     });
   } catch (e) {
-    // These two tests fail because they contain setters without arguments,
-    // which is a syntax error. These test failures do not indicate anything
-    // wrong with esbuild so the failures are ignored. Here is one of the
-    // tests:
-    //
-    //   function f(){var a={get b(){},set b(){}};return{a:a}}
-    //
-    if (test.name === 'unsafe_object_accessor' || test.name === 'keep_name_of_setter') {
-      console.error("*** skipping test with known syntax error:", test.name);
-      return;
-    }
-
     const formatError = ({ text, location }) => {

kzc commented 3 years ago

If this line in the latest patch is commented out:

  if (test.expect_stdout === true) throw SKIP;

there will be false failures in some tests, but more tests will be run:

51 failed out of 2488, with 675 skipped

Additional code may be added to selectively skip certain tests.

evanw commented 3 years ago

Some of the tests look to me like they are actually encoding a bug in V8 (specifically https://github.com/mishoo/uglifyjs/issues/4269). I have filed the bug here: https://bugs.chromium.org/p/v8/issues/detail?id=11810. I could be wrong of course but I believe properties should be defined in source order.

If the behavior of the input code is buggy in a certain implementation, then I don't think esbuild necessarily has to avoid optimizations that might accidentally generate the correct output. Either that or esbuild should automatically apply a workaround for these bugs if the target environment includes a V8-based VM. Basically I think esbuild should either have a garbage-in-garbage-out philosophy or should always generate the correct output, but it shouldn't bake in implementation bugs into its optimization passes.

kzc commented 3 years ago

uglify-js employs a garbage-in/garbage-out philosophy for the most part.

When possible it will try to agree with NodeJS to prevent unwanted noise in its automated test case generating fuzzer. It has encountered and/or exposed a few V8 bugs.

https://github.com/mishoo/UglifyJS/actions

evanw commented 3 years ago

Some other cases test TDZ handling. I'm not currently preserving TDZ checks because doing so inhibits a lot of optimizations, both at compile-time and run-time. I suspect TDZ checks are rarely relied upon in real-world code but that disabling optimizations which assume local variable references have no side effects would be a very unpopular change (it would likely disable most tree-shaking done by the bundler). Preserving top-level TDZ checks also has extremely negative performance consequences. So I'm inclined to skip over failures due to TDZ checks, although I could potentially be convinced otherwise.

kzc commented 3 years ago

It's up to you. TDZ is a part of the ES spec. One could make the argument that avoiding triggering V8/Chrome bugs is not much different than avoiding the Safari JS engine TDZ slowness with correct ES code. uglify-js does manage to preserve TDZ semantics while providing good minification.

I've also noticed failures related to https://tc39.es/proposal-template-literal-revision/.

evanw commented 3 years ago

One could make the argument that avoiding triggering V8/Chrome bugs is not much different than avoiding the Safari JS engine TDZ slowness with correct ES code.

Yes, one could argue that. However, it seems different to me because of the extreme scale of the problem (the VM locks up for many seconds and it affects virtually every modern code base due to the use of class and let/const). Although it only affects top-level variables so you could potentially argue that the minifier shouldn't consider references side-effect free if they are either non-top-level or maybe inside a try/catch or something. I could see potentially making a change like that.

I've also noticed failures related to https://tc39.es/proposal-template-literal-revision/.

Yeah. I'm aware of that one but haven't fixed it yet because no one has asked for it. I should probably fix it though. It comes up in test262 as well.

By the way the failures due to await are because of esbuild's top-level await support. The parser does not make a distinction between the script and module goals so it always parses top-level await. This hasn't ever been an issue with real-world code and is basically only a problem for conformance tests. But I think it's an acceptable trade-off.

I have fixed the following issues so far, although there are still more issues to be fixed. Here is the text from the release notes:

The in operator is now surrounded parentheses inside arrow function expression bodies inside for loop initializers:
```
// Original code
for ((x => y in z); 0; ) ;

// Old output
for ((x) => y in z; 0; ) ;

// New output
for ((x) => (y in z); 0; ) ;
```
Without this, the in operator would cause the for loop to be considered a for-in loop instead.
The statement return undefined; is no longer minified to return; inside async generator functions:
```
// Original code
return undefined;

// Old output
return;

// New output
return void 0;
```
Using return undefined; inside an async generator function has the same effect as return await undefined; which schedules a task in the event loop and runs code in a different order than just return;, which doesn't hide an implicit await expression.
Property access expressions are no longer inlined in template tag position:
```
// Original code
(null, a.b)``, (null, a[b])``;

// Old output
a.b``, a[b]``;

// New output
(0, a.b)``, (0, a[b])``;
```
The expression a.b`c` is different than the expression (0, a.b)`c`. The first calls the function a.b with a as the value for this but the second calls the function a.b with the default value for this (the global object in non-strict mode or undefined in strict mode).
Verbatim __proto__ properties inside object spread are no longer inlined when minifying:
```
// Original code
x = { ...{ __proto__: { y: true } } }.y;

// Old output
x = { __proto__: { y: !0 } }.y;

// New output
x = { ...{ __proto__: { y: !0 } } }.y;
```
A verbatim (i.e. non-computed non-method) property called __proto__ inside an object literal actually sets the prototype of the surrounding object literal. It does not add an "own property" called __proto__ to that object literal, so inlining it into the parent object literal would be incorrect. The presence of a __proto__ property now stops esbuild from applying the object spread inlining optimization when minifying.

The value of this has now been fixed for lowered private class members that are used as template tags:

// Original code
x = (new (class {
  a = this.#c``;
  b = 1;
  #c() { return this }
})).a.b;

// Old output
var _c, c_fn, _a;
x = new (_a = class {
  constructor() {
    __privateAdd(this, _c);
    __publicField(this, "a", __privateMethod(this, _c, c_fn)``);
    __publicField(this, "b", 1);
  }
}, _c = new WeakSet(), c_fn = function() {
  return this;
}, _a)().a.b;

// New output
var _c, c_fn, _a;
x = new (_a = class {
  constructor() {
    __privateAdd(this, _c);
    __publicField(this, "a", __privateMethod(this, _c, c_fn).bind(this)``);
    __publicField(this, "b", 1);
  }
}, _c = new WeakSet(), c_fn = function() {
  return this;
}, _a)().a.b;

The value of this here should be an instance of the class because the template tag is a property access expression. However, it was previously the default value (the global object in non-strict mode or undefined in strict mode) instead due to the private member transformation, which is incorrect.

kzc commented 3 years ago

Some of the features in ECMAScript are just bonkers or seldom used. And TDZ is the bane of any fuzzer implementor trying to generate valid random programs that don't throw an error. But if a decision is made to have esbuild deviate from the ES specification or not support some of its features by design then I think it ought to be documented somewhere. At the very least it might start a discussion to deprecate some less than useful ES features.

kzc commented 3 years ago

However, it seems different to me because of the extreme scale of the problem (the VM locks up for many seconds and it affects virtually every modern code base due to the use of class and let/const).

let/const/class are not fringe features of ES6 - they are central to the language and ought to be implemented efficiently as Chrome/V8 and Mozilla have. In the case of Safari/JavaScriptCore the result was not incorrect - just slow for large bundles. The way I see it, when performance issues in a particular JS engine are papered over by minifiers and bundlers it provides less of an incentive for the vendor or project to push out fixes in a timely manner. Look at IE8 as an example of an entire developer ecosystem extending the life of an obsolete browser a decade longer than it ought to have lasted.

evanw commented 3 years ago

Another issue is that I'm relying on lack of TDZ to implement lazily-initialized modules for correct ordering of dynamically-loaded modules (both require() and import()). For example:

// entry.mjs
console.log('before')
await import('./a.mjs')
console.log('after')

// a.mjs
import { b } from './b.mjs'
export let a = b

// b.mjs
import { a } from './a.mjs'
export let b = a

Evaluating that code should throw an error, and does when you run node entry.mjs:

before
b.mjs:2
export let b = a
               ^

ReferenceError: Cannot access 'a' before initialization
    at b.mjs:2:16
    at ModuleJob.run (node:internal/modules/esm/module_job:175:25)
    at async Loader.import (node:internal/modules/esm/loader:178:24)
    at async entry.mjs:2:1

However, it doesn't when bundled with esbuild (via esbuild --bundle entry.mjs --format=esm --outfile=out.mjs). The reason is that esbuild generates the following code:

var __defProp = Object.defineProperty;
var __esm = (fn, res) => function __init() {
  return fn && (res = (0, fn[Object.keys(fn)[0]])(fn = 0)), res;
};
var __export = (target, all) => {
  for (var name in all)
    __defProp(target, name, { get: all[name], enumerable: true });
};

// b.mjs
var b;
var init_b = __esm({
  "b.mjs"() {
    init_a();
    b = a;
  }
});

// a.mjs
var a_exports = {};
__export(a_exports, {
  a: () => a
});
var a;
var init_a = __esm({
  "a.mjs"() {
    init_b();
    a = b;
  }
});

// entry.mjs
console.log("before");
await Promise.resolve().then(() => (init_a(), a_exports));
console.log("after");

Even exchanging var with let doesn't fix this. If TDZ checks were important then I think I would have to actually generate code to manually track all of the initialized states of all of the variables and also code to manually check whether the variables are initialized before evaluating each reference. That would come at both a code size and run-time speed cost.

I assume enforcing TDZ checks like this would be an unpopular choice (I certainly wouldn't want this behavior for my code in production) but I can't think of another way to do it. Or, well, you could always refuse to bundle the code and generate individual files instead, but that sort of defeats the point of a bundler.

kzc commented 3 years ago

You don't have to convince me - I'm no fan of TDZ and appreciate why it's a hindrance to bundling. I'm just saying it should be documented, that's all. Perhaps you already have.

kzc commented 3 years ago

fwiw, if the multi-module TDZ example above is run through rollup it does preserve NodeJS's TDZ semantics:

$ rollup -v
rollup v2.49.0

$ rollup ./entry.mjs --inlineDynamicImports --silent
console.log('before');
await Promise.resolve().then(function () { return a$1; });
console.log('after');

let b = a;

let a = b;

var a$1 = /*#__PURE__*/Object.freeze({
    __proto__: null,
    a: a
});

$ rollup ./entry.mjs --inlineDynamicImports --silent | node --input-type=module
before
after
file:///[eval1]:5
let b = a;
        ^

ReferenceError: Cannot access 'a' before initialization

rollup can also generate multiple output chunks using:

$ rollup ./entry.mjs -d out --silent

$ ls -1 out/*
out/a-303bc69c.js
out/entry.js

But I have no idea how to get it to produce .mjs suffixed outputs to work with node with top-level await.

However, this non-top-level-await entry point will work:

// entry2.mjs
console.log('before');
(async function() {
    await import('./a.mjs');
})();
console.log('after');

$ rollup ./entry2.mjs -f cjs -d out --silent
$ node out/entry2.js 
before
after
out/a-3134a405.js:3
let b = a;
        ^

ReferenceError: Cannot access 'a' before initialization

kzc commented 3 years ago

But I have no idea how to get it to produce .mjs suffixed outputs to work with node with top-level await.

Finally cracked the code to get rollup to work with NodeJS TLA...

$ rollup ./entry.mjs --entryFileNames "[name].mjs" --chunkFileNames "[name]-[hash].mjs" -d out --silent

$ ls -1 out/*
out/a-a643a2b9.mjs
out/entry.mjs

$ cat out/entry.mjs 
console.log('before');
await import('./a-a643a2b9.mjs');
console.log('after');

$ cat out/a-a643a2b9.mjs 
let b = a;

let a = b;

export { a };

kzc commented 3 years ago

@evanw await as an identifier, TDZ and V8 bugs aside, most of the issues of consequence have been addressed. But I think the failing destructuring catch scope test case issue_4420 was missed in the noise. Here's a slightly more comprehensive catch parameter scope test that also fails in Rollup.

By the way, is Rollup's approach to TDZ semantics preservation as seen above something you would consider to support in esbuild? It doesn't appear to impose any significant code size nor run-time speed cost.

evanw commented 3 years ago

But I have no idea how to get it to produce .mjs suffixed outputs to work with node with top-level await.

The --out-extension feature is intended to be used for this.

But I think the failing destructuring catch scope test case issue_4420 was missed in the noise.

It wasn't missed. That's part of why this issue is still open. I have limited time this week so I was only able to fix some of the things so far.

By the way, is Rollup's approach to TDZ semantics preservation as seen above something you would consider to support in esbuild? It doesn't appear to impose any significant code size nor run-time speed cost.

This should be what happens in esbuild too with --splitting enabled. But code splitting can also be disabled, which may not be a feature that Rollup has.

Also it doesn't look like Rollup treats TDZ checks as a side effect. For example bundling let a = a with Rollup generates an empty file even though that code will throw an exception, which is a side effect.

kzc commented 3 years ago

I have limited time this week so I was only able to fix some of the things so far.

My bad. I mistakenly assumed that the remaining test failures were there by design. No one truly cares about using await as an identifier, for example.

Also it doesn't look like Rollup treats TDZ checks as a side effect.

I'd consider any lack of TDZ checking in Rollup to be a bug. But just using its REPL now I see it's pretty common. It's progressing though - rollup has recently added support for side effects for setters and getters, among other checks now.

A workaround to support TDZ with Rollup would be to use --no-treeshake and employ a minifier like uglify-js ~~or terser~~ that respects TDZ side effects. There's a lot of overlap between Rollup optimizations and minifier optimizations. The only significant drawback with this approach is that webpack-style sideEffects: false code pruning would also be disabled, if the bundle happens to significantly rely upon it. The minifier wouldn't be able to drop the side-effect code for bundled sideEffects: false modules - unless pure annotations were used.

Edit: terser does not reliably respect TDZ side effects:

$ node_modules/.bin/terser -V
terser 5.7.0

$ echo 'w; let w, x = 1+x, y = console.log(123), z;' | node_modules/.bin/terser --toplevel -mc
console.log(123);

uglify-js does however:

$ uglify-js -V
uglify-js 3.13.9

$ echo 'w; let w, x = 1+x, y = console.log(123), z;' | uglify-js --toplevel -mc
l;let l,o=1+o;console.log(123);

kzc commented 3 years ago

I'm going to close this issue because it's a moving target with uglify-js adding a dozen or so new test cases per week. Feel free to advance the uglify-js commit hash and retest esbuild from time to time at your convenience.

On a related note, Rollup has just started to retain some of the more common TDZ violations in code, although not as comprehensively as uglify-js.