facebookarchive / prepack

A JavaScript bundle optimizer.
http://prepack.io
Other
14.21k stars 424 forks source link

Wrong serialization order when residual function is called during the optimized code path #1973

Closed trueadm closed 6 years ago

trueadm commented 6 years ago

This serializes the Object.assign after the a.callFunc call which is the wrong order:

function foo(a, b, c) {
  if (!a) {
    return null;
  }
  var b = Object.assign({}, c);

  return a.callFunc(function() {
    return b.foo;
  });
}

inspect = function() {
  return JSON.stringify(
    foo({
      a: {
        callFunc(x) {
          return x() 
        },
      },
      b: {
        foo() {
          return 'works!';
        },
      },
      c: {},
    })
  );
}

this.__optimize && __optimize(foo)

Interestingly, if I remove

  if (!a) {
    return null;
  }

The output it serializes is correct.

trueadm commented 6 years ago

@NTillmann I've spent most the day looking into how we might fix this problem. In the above case, a.callFunc bails-out, resulting in a temporal CallExpression abstract value where its contents (including b are havoced). When visiting the entries, we visit the function and address its bindings (b in this case) in this._enqueueWithUnrelatedScope. Which means, we see a different binding from what we would have done with the effects of the condition applied (which is the entry for the Object.assign(...)). When we get to serializing the conditional generators, the binding serializes out to be an abstract conditional but this gets emitted after the condition generators are serialized.

I thought how we might get around this. This seems related to CallExpression bailing out for now, so I looked into adding in more temporals for each of the bindings in the closure before we emit the bailed out call expression. So the idea is that we then do this:

  var _1 = function (a, b, c) {
    var _C;  // <-- we emit all variable declarations for inner closures at the top of the parent
    var _A = function () {
      return _C.foo;
    };

    var _2 = !a;

    var _B = _$5;

    if (!_2) {
      var _4 = {};

      var _$0 = _B.assign(_4, c);

      var _6 = _$6(a);

      _C = $0;  // <-- we know assign the value at this point

      var _$1 = _6.callFunc;

      var _$2 = a.callFunc(_A);
    }

    _C = _2 ? b : _$0;  // <-- we leave this as before, except we don't add a var

    var _7 = _2 ? null : _$2;

    return _7;
  };

it relies on making a single variable declaration for all bindings inside inner closures. Thinking about it though, we should probably be doing this anyway.

I have a hacky solution that makes this issue work: https://github.com/facebook/prepack/pull/1975. I don't hoist the declarations to the top of the function though, but it follows the same premise I mentioned about in regards to emitting a temporal before the abstract function call occurs.

NTillmann commented 6 years ago

Let me try to explain what is happening...

First of all, here's the Prepack output I am seeing, with some random comments:

  var _1 = function (a, b, c) {
    var _A = function () {
      return _C.foo;
    };

    var _2 = !a;
    var _B = _$4; // _$4 is Object
    if (!_2) {
      var _4 = {};
      var _$0 = _B.assign(_4, c);

      // I don't understand why the next two statements exist; a bug or inconsistency somewhere?
      var _6 = _$5(a);
      var _$1 = _6.callFunc;

      var _$2 = a.callFunc(_A);
    }
    var _C = _2 ? b : _$0;

    var _7 = _2 ? null : _$2;
    return _7;
  };

What is happening is...

So, while unfortunate, there's no simple bug here, but we are hitting a rather significant design limitation.

NTillmann commented 6 years ago

So, what do we do?

NTillmann commented 6 years ago

High-level thought: The underlying design flaw here really is that Prepack doesn't have a proper notion of state snapshots. Prepack only serializes the final state. Here, the residual function gets called exactly once at the end. But it could be called multiple times in some intermediate states which certainly won't be properly captured the way things are.

NTillmann commented 6 years ago

Here's a quick look at whether the mutated-captured-variables mechanism can help.

So I modified the code slightly to mutate the captured variable (just so that the right machinery gets triggered, eventually we could do this automatically; this is also related to #1245):

function foo(a, b, c) {
    if (!a) {
        return null;
    }
    var b = Object.assign({}, c);

    return a.callFunc(function() {
            let foo = b.foo;
            b = null; // <-- mutating just to trigger different code generation
            return foo;
        });
}

And then we get this code, whose beauty is in the eye of the beholder:

var _1 = function (a, b, c) {
    var __scope_0 = new Array(1);

    var __scope_1 = function (__selector) {
      var __captured;

      switch (__selector) {
        case 0:
          __captured = [_C];
          break;

        default:
          throw new Error("Unknown scope selector");
      }

      __scope_0[__selector] = __captured;
      return __captured;
    };

    var _A = function () {
      var __captured__scope_2 = __scope_0[0] || __scope_1(0);

      let foo = __captured__scope_2[0].foo;
      __captured__scope_2[0] = null;
      return foo;
    };

    var _2 = !a;

    var _B = _$4;

    if (!_2) {
      var _4 = {};

      var _$0 = _B.assign(_4, c);

      var _6 = _$5(a);

      var _$1 = _6.callFunc;

      var _$2 = a.callFunc(_A);
    }

    var _C = _2 ? b : _$0;

    var _7 = _2 ? null : _$2;

    return _7;
  };

Unfortunately, this is not good enough. It just shuffled things around, but _C is still getting initialized and accessed just as before.

NTillmann commented 6 years ago

Okay, there's another actually much more relevant delaying mechanism: --delayInitializations. This puts value initialization code that only referenced by a residual function into the residual function.

However, for dubious historical reasons and/or implementation limitations, this is currently guarded to not kick in for residual functions nested within optimized functions. When I removed that check in this line... https://github.com/facebook/prepack/blob/f807a17b7419c198fd4ecac3fc556d123aa206b4/src/serializer/ResidualHeapSerializer.js#L783 ... and prepack with --delayInitializations, then I get this, which looks like it would almost work:

  var _1 = function (a, b, c) {
    if (_initialized0 === void 0) { // This seems quite pointless, just as pointless as `_B` ever was (another inefficiency or maybe even dormant bug), but not wrong in itself.
      _initialized0 = null;
      _B = _$4;
    }

    var _A = function () {
      if (_initialized1 === void 0) {
        _initialized1 = null;
        _C = _2 ? b : _$0; // Initialization of _C got moved here!
      }

      return _C.foo;
    };

    var _2 = !a;

    if (!_2) {
      var _4 = {};

      var _$0 = _B.assign(_4, c);

      var _6 = _$5(a);

      var _$1 = _6.callFunc;

      var _$2 = a.callFunc(_A);
    }

    var _7 = _2 ? null : _$2;

    return _7;
  };

  var _initialized0;

  var _initialized1; // <-- bug: this variable should live in `_1`.

Okay, so there's a reason this feature is currently disabled within optimized functions since things are slightly broken, but it looks promising.

Question:

  1. Given the current design limitations due to lack of snapshots (calling residual functions in intermediate states will not do the right thing), does it make sense to move forward at all?
  2. If so, is the overhead of the "delayed"-initialization scheme acceptable?

If yes to both, I can look into making --delayInitializations work for optimized functions. I feel that's time well-spent. But while it will make this example work, it still won't be bullet-proof, as it doesn't delay all dependent computations, and we are really abusing the "delay" idea here to move things earlier. But things would be strictly better.

If not, it's time to go back to the drawing board.

trueadm commented 6 years ago

I think we should go ahead and try it out. @sebmarkbage and @gaearon may have different thoughts, but our aim for this half was always to be correct first. We can always further optimize and redesign things better next half.

NTillmann commented 6 years ago

Thinking more about it, it feels like some notion of snapshots, or at least putting a timestamp on when something got captured/havoced and respecting that in the serializer, would be the right way of making this "correct".

gaearon commented 6 years ago

To be clear I'm not entirely sure we have exactly this problem in our bundles. We didn't know about this design limitation when we were reducing the actual case so it's plausible we don't hit this bigger issue in reality, and it's not high priority to us. This needs additional investigation.