Closed trueadm closed 6 years ago
@NTillmann I've spent most the day looking into how we might fix this problem. In the above case, a.callFunc
bails-out, resulting in a temporal CallExpression
abstract value where its contents (including b
are havoced). When visiting the entries, we visit the function and address its bindings (b
in this case) in this._enqueueWithUnrelatedScope
. Which means, we see a different binding from what we would have done with the effects of the condition applied (which is the entry for the Object.assign(...)
). When we get to serializing the conditional generators, the binding serializes out to be an abstract conditional but this gets emitted after the condition generators are serialized.
I thought how we might get around this. This seems related to CallExpression
bailing out for now, so I looked into adding in more temporals for each of the bindings in the closure before we emit the bailed out call expression. So the idea is that we then do this:
var _1 = function (a, b, c) {
var _C; // <-- we emit all variable declarations for inner closures at the top of the parent
var _A = function () {
return _C.foo;
};
var _2 = !a;
var _B = _$5;
if (!_2) {
var _4 = {};
var _$0 = _B.assign(_4, c);
var _6 = _$6(a);
_C = $0; // <-- we know assign the value at this point
var _$1 = _6.callFunc;
var _$2 = a.callFunc(_A);
}
_C = _2 ? b : _$0; // <-- we leave this as before, except we don't add a var
var _7 = _2 ? null : _$2;
return _7;
};
it relies on making a single variable declaration for all bindings inside inner closures. Thinking about it though, we should probably be doing this anyway.
I have a hacky solution that makes this issue work: https://github.com/facebook/prepack/pull/1975. I don't hoist the declarations to the top of the function though, but it follows the same premise I mentioned about in regards to emitting a temporal before the abstract function call occurs.
Let me try to explain what is happening...
First of all, here's the Prepack output I am seeing, with some random comments:
var _1 = function (a, b, c) {
var _A = function () {
return _C.foo;
};
var _2 = !a;
var _B = _$4; // _$4 is Object
if (!_2) {
var _4 = {};
var _$0 = _B.assign(_4, c);
// I don't understand why the next two statements exist; a bug or inconsistency somewhere?
var _6 = _$5(a);
var _$1 = _6.callFunc;
var _$2 = a.callFunc(_A);
}
var _C = _2 ? b : _$0;
var _7 = _2 ? null : _$2;
return _7;
};
What is happening is...
_A
, are hoisted to the top level._C
) at any point in time. Prepack's serializer still tries to initialize such values as close to the first reference of the function value as possible, but there's no guarantee here; it's just on a best-effort basis. What's happening here, leaking a function to the outside world and expecting all state reachable from it to be fully initialized at that time, goes beyond the serializer's current design._C
) has a value that depends on _$0
, a derived value that only gets initialized conditionally in a nested generator. When this happens, Prepack's serializer delays the initialization of that variable (_C
) until all generators that define values that we depend on here have finished. And this is now "too late".So, while unfortunate, there's no simple bug here, but we are hitting a rather significant design limitation.
So, what do we do?
_C
is initialized eager in the prepacked code at some particular point in time. While ugly, the mechanism prepack has for captured variables that get mutated somewhere, that mechanism also effectly "delays" the initialization of the capture variable until the function that uses the captured variable gets invoked for the first time. What generally amounts to "delaying", would actually make it more eager here, and might just make it work, without adding too much hackery, although still possibly fragile. Let me look into this.High-level thought: The underlying design flaw here really is that Prepack doesn't have a proper notion of state snapshots. Prepack only serializes the final state. Here, the residual function gets called exactly once at the end. But it could be called multiple times in some intermediate states which certainly won't be properly captured the way things are.
Here's a quick look at whether the mutated-captured-variables mechanism can help.
So I modified the code slightly to mutate the captured variable (just so that the right machinery gets triggered, eventually we could do this automatically; this is also related to #1245):
function foo(a, b, c) {
if (!a) {
return null;
}
var b = Object.assign({}, c);
return a.callFunc(function() {
let foo = b.foo;
b = null; // <-- mutating just to trigger different code generation
return foo;
});
}
And then we get this code, whose beauty is in the eye of the beholder:
var _1 = function (a, b, c) {
var __scope_0 = new Array(1);
var __scope_1 = function (__selector) {
var __captured;
switch (__selector) {
case 0:
__captured = [_C];
break;
default:
throw new Error("Unknown scope selector");
}
__scope_0[__selector] = __captured;
return __captured;
};
var _A = function () {
var __captured__scope_2 = __scope_0[0] || __scope_1(0);
let foo = __captured__scope_2[0].foo;
__captured__scope_2[0] = null;
return foo;
};
var _2 = !a;
var _B = _$4;
if (!_2) {
var _4 = {};
var _$0 = _B.assign(_4, c);
var _6 = _$5(a);
var _$1 = _6.callFunc;
var _$2 = a.callFunc(_A);
}
var _C = _2 ? b : _$0;
var _7 = _2 ? null : _$2;
return _7;
};
Unfortunately, this is not good enough. It just shuffled things around, but _C
is still getting initialized and accessed just as before.
Okay, there's another actually much more relevant delaying mechanism: --delayInitializations. This puts value initialization code that only referenced by a residual function into the residual function.
However, for dubious historical reasons and/or implementation limitations, this is currently guarded to not kick in for residual functions nested within optimized functions. When I removed that check in this line... https://github.com/facebook/prepack/blob/f807a17b7419c198fd4ecac3fc556d123aa206b4/src/serializer/ResidualHeapSerializer.js#L783 ... and prepack with --delayInitializations, then I get this, which looks like it would almost work:
var _1 = function (a, b, c) {
if (_initialized0 === void 0) { // This seems quite pointless, just as pointless as `_B` ever was (another inefficiency or maybe even dormant bug), but not wrong in itself.
_initialized0 = null;
_B = _$4;
}
var _A = function () {
if (_initialized1 === void 0) {
_initialized1 = null;
_C = _2 ? b : _$0; // Initialization of _C got moved here!
}
return _C.foo;
};
var _2 = !a;
if (!_2) {
var _4 = {};
var _$0 = _B.assign(_4, c);
var _6 = _$5(a);
var _$1 = _6.callFunc;
var _$2 = a.callFunc(_A);
}
var _7 = _2 ? null : _$2;
return _7;
};
var _initialized0;
var _initialized1; // <-- bug: this variable should live in `_1`.
Okay, so there's a reason this feature is currently disabled within optimized functions since things are slightly broken, but it looks promising.
Question:
If yes to both, I can look into making --delayInitializations work for optimized functions. I feel that's time well-spent. But while it will make this example work, it still won't be bullet-proof, as it doesn't delay all dependent computations, and we are really abusing the "delay" idea here to move things earlier. But things would be strictly better.
If not, it's time to go back to the drawing board.
I think we should go ahead and try it out. @sebmarkbage and @gaearon may have different thoughts, but our aim for this half was always to be correct first. We can always further optimize and redesign things better next half.
Thinking more about it, it feels like some notion of snapshots, or at least putting a timestamp on when something got captured/havoced and respecting that in the serializer, would be the right way of making this "correct".
To be clear I'm not entirely sure we have exactly this problem in our bundles. We didn't know about this design limitation when we were reducing the actual case so it's plausible we don't hit this bigger issue in reality, and it's not high priority to us. This needs additional investigation.
This serializes the
Object.assign
after thea.callFunc
call which is the wrong order:Interestingly, if I remove
The output it serializes is correct.