Open mkustermann opened 4 years ago
This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.
I don't know how it was handled before, but it could be handled in the flow graph building code: Once hot-reload is done we clear out all code. Then we continue running, which will invoke the closure. That will trigger unoptimized compile of closure. The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.
/cc @alexmarkov @mraleph WDYT?
This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.
Out of curiosity I did a bit of archeology: IsolateReload_PendingStaticCall_DefinedToNSM
had Fail, Crash
expectation when hot reload was initially landed 48c8ffa7f3d778fb6ee39c558f7ffae9a134e7ad, then it morphed into Fail, Crash, Timeout
, about which I complained. Then it became Skip
and SkipSlow
(unclear why?). Then @bkonyi triaged these tests as aspirational and reinstated Fail, Crash
expectations for them.
So it seems we have never really supported the case when the code is not recompiled from scratch.
The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.
Checking number of parameters is easy (though it would still require changing flow graph builder to stop trusting Kernel, it's a bit of a slippery slope) - but with types things would get complicated, we would essentially need to duplicate some of the front end code (especially because static types of expressions are not serialised into kernel).
How about a different approach: in unoptimized code our static calls are already going through a stub. There is even a code which is supposed to rebind static target on reload, and this code even does some argument descriptor validation, which does not actually do anything if argument descriptors does not match (there is TODO to redirect to NSM).
Can it be that fixing this TODO would actually fix the crash?
@mraleph What about types? If we keep number of positional/named parameters but change the types of parameters, it becomes unsound, because we have old callers not providing the right type the new callee expects.
@mkustermann rebinder has access to the old target - so it can compare the signature and check if new one is a valid override for the old one.
It is not sufficient to check static calls. Old code was compiled with different class hierarchy and different surrounding code, and it may not be valid with hot-reloaded code. Executing old code of a closure is totally unsound and may cause crashes.
For example, consider the following code:
class A {}
class B extends A {}
main() {
A foo() => new B();
// hot-reload here
A a = foo();
}
In the closure code the instance of class B
is returned as a value of type A
without any casts.
It could be hot-reloaded to
class A {}
class B {}
...
Executing old code of a closure breaches soundness, although no signatures changed.
Some time ago we discussed this problem in the context of hot-reloading bytecode, and I remeber that @rmacnak-google had a prototype to match closure instances to the new hot-reloaded code if possible, otherwise throw an error if unmatched closure is executed. This would guarantee that we will not attempt to execute invalid old code. @rmacnak-google could you remind what was the result of that experiment?
@mraleph Right, IsolateReload_PendingStaticCall_DefinedToNSM never passed. It is a more difficult version of the test Martin provided, which historically did pass in the VM's frontend.
Executing old code, either through a stored closure or activations still on the stack, or executing additional code, such as loaded through eval or loadUri, can breach soundness because they assume but do not verify the world of types is the same.
The experiment @alexmarkov mentioned is in https://dart-review.googlesource.com/c/sdk/+/111316/ It seemed okay for the few cases I tried in Flutter Gallery, but my intuition is still that this will reject more valid changes that than it will prevent invalid changes, which is par for Dart 2. Soundness will also require disallowing reload except at the event loop boundary, which means fix-and-continue debugging will be impossible.
Following. We should try to have a fix before Friday if we want to cherrypick this to Beta.
Is this issue fixed now after https://github.com/dart-lang/sdk/commit/847a1275133685f615a0c354292ba3f8e4304999 ? can it be closed ?
Is this issue fixed now after 847a127 ? can it be closed ?
No - there are still potential issues with old closures. They were compiled to kernel under the old static knowledge (e.g. class hierarchy). Even after hot-reload, the re-compilation of old closures in the VM will use the old kernel and therefore assume the old static knowledge is still valid, which can be no longer the case after hot-reload. (Alex provided an example in https://github.com/dart-lang/sdk/issues/41482#issuecomment-614177695 )
We should find someone to look into this, possibly in collaboration with CFE team.
Ideally the CFE (which knows about the static knowledge) could give the vm a hot-reload diff, containing additional information about closures. The VM could then refuse the reload if there is an old closure in the heap which the CFE told us is no longer valid.
@mkustermann Please re-evaluate this and do one of the following:
@johnniwinther is going to take a look at the suggestion made in #41482 and see if it is feasible for CFE to provide that information. Based on that investigation we will decide on how to proceed on this issue.
@jensjoha Can you look into whether we can provide information as suggested in https://github.com/dart-lang/sdk/issues/41482#issuecomment-666995294 ?
Either I don't understand, or it's also a problem for non-closures. E.g. if I have this flutter library:
import 'package:flutter/material.dart';
void main() {
runApp(MyApp());
}
class MyApp extends StatelessWidget {
@override
Widget build(BuildContext context) {
return MaterialApp(
title: 'Flutter Demo',
theme: ThemeData(
primarySwatch: Colors.blue,
visualDensity: VisualDensity.adaptivePlatformDensity,
),
home: MyHomePage(title: 'Flutter Demo Home Page'),
);
}
}
class MyHomePage extends StatefulWidget {
MyHomePage({Key key, this.title}) : super(key: key);
final String title;
@override
_MyHomePageState createState() => _MyHomePageState();
}
class A {
String foo() {
print("Hello from A!");
return "foo!";
}
}
class B extends A {}
A Function() lookAtMe;
A lookAtMe2;
foo() {
print("Setting data to B's");
lookAtMe = () => new B();
lookAtMe2 = new B();
}
bar() {
A a = lookAtMe();
print(a);
if (a is A) {
print("Closure returns an A as promised :)");
} else {
print("Closure doesn't return an A as promised :(");
}
print("");
print(lookAtMe2);
if (lookAtMe2 is A) {
print("Field is an A as promised :)");
} else {
print("Field is not an A as promised :(");
}
print("");
try {
print(a.foo());
} catch (e) {
print(" => trying to print a.foo crashed :(");
}
try {
print(lookAtMe2.foo());
} catch (e) {
print(" => trying to print lookAtMe2.foo crashed :(");
}
}
class _MyHomePageState extends State<MyHomePage> {
int _counter = 0;
void _incrementCounter() {
if (_counter.isEven) {
foo();
} else {
bar();
}
setState(() {
_counter++;
});
}
@override
Widget build(BuildContext context) {
return Scaffold(
appBar: AppBar(
title: Text(widget.title),
),
body: Center(
child: Column(
mainAxisAlignment: MainAxisAlignment.center,
children: <Widget>[
Text(
'You have pushed the button this many times:',
),
Text(
'$_counter',
style: Theme.of(context).textTheme.headline4,
),
],
),
),
floatingActionButton: FloatingActionButton(
onPressed: _incrementCounter,
tooltip: 'Increment',
child: Icon(Icons.add),
),
);
}
}
and run it, pressing the + 3 times I get this terminal output:
flutter: Setting data to B's
flutter: Instance of 'B'
flutter: Closure returns an A as promised :)
flutter:
flutter: Instance of 'B'
flutter: Field is an A as promised :)
flutter:
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!
flutter: Setting data to B's
(looking at the code, notice that lookAtMe contains a closure claiming to return an A returning a B (which at this point is an A) and a field of claimed type A containing a B (which at this point is an A).
Now apply these changes:
diff --git a/lib/main.dart b/lib/main.dart
index e73638d..8dfb383 100644
--- a/lib/main.dart
+++ b/lib/main.dart
@@ -34,15 +34,15 @@ class A {
}
}
-class B extends A {}
+class B /*extends A*/ {}
A Function() lookAtMe;
A lookAtMe2;
foo() {
- print("Setting data to B's");
- lookAtMe = () => new B();
- lookAtMe2 = new B();
+ print("Setting data to A's");
+ lookAtMe = () => new A();
+ lookAtMe2 = new A();
}
bar() {
hot reload, press the + 1 time and the terminal now says this:
Performing hot reload...
Reloaded 1 of 526 libraries in 225ms.
flutter: Instance of 'B'
flutter: Closure doesn't return an A as promised :(
flutter:
flutter: Instance of 'B'
flutter: Field is not an A as promised :(
flutter:
flutter: => trying to print a.foo crashed :(
flutter: => trying to print lookAtMe2.foo crashed :(
Isn't this the situation - both for closures and fields - that is expressed in https://github.com/dart-lang/sdk/issues/41482#issuecomment-614177695 and thus a problem not centered around closures? (neither of these seem to crash the VM though --- is that the only concern?)
(pressing it twice more gives
flutter: Setting data to A's
flutter: Instance of 'A'
flutter: Closure returns an A as promised :)
flutter:
flutter: Instance of 'A'
flutter: Field is an A as promised :)
flutter:
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!
i.e. the old stuff is cleared and the new stuff is set and everything is fine again)
What's the status on this?
After further discussions with @rmacnak-google and @alexmarkov it is clear that fixing this would require some prototyping to evaluate the options available and the fix is going to take longer so moving this issue out of the October milestone and the plan is to complete this in Q4.
Any updates here? Is this issue still coming up in practice? (I.e., should this be a P1?)
I don't think this needs to be a P1.
//cc @rmacnak-google @alexmarkov do we still think this is something that we will plan to fix in the near term, should it still be a 'p2' ?
The following flutter issues reporting SEGVs
might be caused by this problem.
This example vm/cc test can cause a segfault:
Tentatively assigning to @rmacnak-google due to hot-reload.