dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.06k stars 1.55k forks source link

Hot-Reload can cause crashes with old closures in the heap #41482

Open mkustermann opened 4 years ago

mkustermann commented 4 years ago

The following flutter issues reporting SEGVs

might be caused by this problem.

This example vm/cc test can cause a segfault:

TEST_CASE(IsolateReload_ShapeChangeClosure) {
  const char* kScript = R"(
    class A {
      final x;
      final y;
      const A(this.x, this.y);
    }

    dynamic closure;

    main() {
      closure = () => A(1, 2);
      return "okay";
    }
  )";

 Dart_Handle lib = TestCase::LoadTestScript(kScript, NULL);
  EXPECT_VALID(lib);
  EXPECT_STREQ("okay", SimpleInvokeStr(lib, "main"));

  const char* kReloadScript = R"(
    class A {
      final x;
      const A(this.x);
    }

    dynamic closure;

    main() {
      // Call the old closure, which will try to call A(1, 2).
      closure();

      return "okay";
    }
  )";

  lib = TestCase::ReloadTestScript(kReloadScript);
  EXPECT_VALID(lib);
  EXPECT_STREQ("okay", SimpleInvokeStr(lib, "main"));
}
% ninja -C out/ReleaseX64 runtime_kernel
ninja: Entering directory `out/ReleaseX64'
ninja: no work to do.
%  out/ReleaseX64/run_vm_tests --dfe=$PWD/out/ReleaseX64/gen/kernel-service.dart.snapshot IsolateReload_ShapeChangeClosure
Running test: IsolateReload_ShapeChangeClosure
RELOAD REPORT:
{"type":"ReloadReport","success":true,"details":{"finalLibraryCount":20,"receivedLibraryCount":0,"receivedLibrariesBytes":0,"receivedClassesCount":0,"receivedProceduresCount":0,"savedLibraryCount":19,"loadedLibraryCount":1,"shapeChangeMappings":[]}}

===== CRASH =====
si_signo=Segmentation fault(11), si_code=1, si_addr=0x9
version=2.8.0-edge.22ad590c6829cfdf5ef246ecb7344482ca976dcb (be) (Tue Apr 14 14:09:56 2020 +0000) on "linux_x64"
pid=223413, thread=223413, isolate=isolate(0x560d9ee54900)
isolate_instructions=560d9bd7b1e0, vm_instructions=560d9bd7b1e0
  pc 0x00007f6451100672 fp 0x00007ffc807c0b20 Unknown symbol
  pc 0x00007f6451100596 fp 0x00007ffc807c0b80 Unknown symbol
  pc 0x00007f6451100447 fp 0x00007ffc807c0bb8 Unknown symbol
  pc 0x00007f6451100366 fp 0x00007ffc807c0be8 Unknown symbol
  pc 0x00007f645198174c fp 0x00007ffc807c0c48 Unknown symbol
  pc 0x0000560d9bf74652 fp 0x00007ffc807c0ce0 dart::DartEntry::InvokeCode(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)+0x112
  pc 0x0000560d9bf743b9 fp 0x00007ffc807c0d70 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)+0x2e9
  pc 0x0000560d9c021f49 fp 0x00007ffc807c0de0 dart::Library::Invoke(dart::String const&, dart::Array const&, dart::Array const&, bool, bool) const+0x279
  pc 0x0000560d9c50f107 fp 0x00007ffc807c0ec0 Dart_Invoke+0x5d7
  pc 0x0000560d9bc2f143 fp 0x00007ffc807c0ef0 dart::SimpleInvokeStr(_Dart_Handle*, char const*)+0x23
  pc 0x0000560d9bc40f4f fp 0x00007ffc807c0fb0 dart::Dart_TestIsolateReload_ShapeChangeClosure()+0x27f
  pc 0x0000560d9bbabf3e fp 0x00007ffc807c0fd0 dart::TestCase::Run()+0x1e
  pc 0x0000560d9bbac083 fp 0x00007ffc807c1000 dart::TestCaseBase::RunTest()+0x53
  pc 0x0000560d9bcd2ae7 fp 0x00007ffc807c1020 dart::TestCaseBase::RunAll()+0x27
  pc 0x0000560d9bbac6d1 fp 0x00007ffc807c10b0 out/ReleaseX64/run_vm_tests+0x37f96d1
  pc 0x0000560d9bbac239 fp 0x00007ffc807c10c0 main+0x9
-- End of DumpStackTrace
zsh: abort      out/ReleaseX64/run_vm_tests  IsolateReload_ShapeChangeClosure

Tentatively assigning to @rmacnak-google due to hot-reload.

rmacnak-google commented 4 years ago

This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.

mkustermann commented 4 years ago

I don't know how it was handled before, but it could be handled in the flow graph building code: Once hot-reload is done we clear out all code. Then we continue running, which will invoke the closure. That will trigger unoptimized compile of closure. The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.

/cc @alexmarkov @mraleph WDYT?

mraleph commented 4 years ago

This looks like an easier version of TEST_CASE(IsolateReload_PendingStaticCall_DefinedToNSM). This was definitely handled before, but likely broken by kernel.

Out of curiosity I did a bit of archeology: IsolateReload_PendingStaticCall_DefinedToNSM had Fail, Crash expectation when hot reload was initially landed 48c8ffa7f3d778fb6ee39c558f7ffae9a134e7ad, then it morphed into Fail, Crash, Timeout, about which I complained. Then it became Skip and SkipSlow(unclear why?). Then @bkonyi triaged these tests as aspirational and reinstated Fail, Crash expectations for them.

So it seems we have never really supported the case when the code is not recompiled from scratch.

The flow graph builder can then see that the call to the constructor (or any static method) is invalid due to mismatch in signatures (or types) and can call NSM directly.

Checking number of parameters is easy (though it would still require changing flow graph builder to stop trusting Kernel, it's a bit of a slippery slope) - but with types things would get complicated, we would essentially need to duplicate some of the front end code (especially because static types of expressions are not serialised into kernel).

How about a different approach: in unoptimized code our static calls are already going through a stub. There is even a code which is supposed to rebind static target on reload, and this code even does some argument descriptor validation, which does not actually do anything if argument descriptors does not match (there is TODO to redirect to NSM).

Can it be that fixing this TODO would actually fix the crash?

mkustermann commented 4 years ago

@mraleph What about types? If we keep number of positional/named parameters but change the types of parameters, it becomes unsound, because we have old callers not providing the right type the new callee expects.

mraleph commented 4 years ago

@mkustermann rebinder has access to the old target - so it can compare the signature and check if new one is a valid override for the old one.

alexmarkov commented 4 years ago

It is not sufficient to check static calls. Old code was compiled with different class hierarchy and different surrounding code, and it may not be valid with hot-reloaded code. Executing old code of a closure is totally unsound and may cause crashes.

For example, consider the following code:

class A {}
class B extends A {}

main() {
  A foo() => new B();
  // hot-reload here
  A a = foo();
}

In the closure code the instance of class B is returned as a value of type A without any casts.

It could be hot-reloaded to

class A {}
class B {}
...

Executing old code of a closure breaches soundness, although no signatures changed.

Some time ago we discussed this problem in the context of hot-reloading bytecode, and I remeber that @rmacnak-google had a prototype to match closure instances to the new hot-reloaded code if possible, otherwise throw an error if unmatched closure is executed. This would guarantee that we will not attempt to execute invalid old code. @rmacnak-google could you remind what was the result of that experiment?

rmacnak-google commented 4 years ago

@mraleph Right, IsolateReload_PendingStaticCall_DefinedToNSM never passed. It is a more difficult version of the test Martin provided, which historically did pass in the VM's frontend.

Executing old code, either through a stored closure or activations still on the stack, or executing additional code, such as loaded through eval or loadUri, can breach soundness because they assume but do not verify the world of types is the same.

The experiment @alexmarkov mentioned is in https://dart-review.googlesource.com/c/sdk/+/111316/ It seemed okay for the few cases I tried in Flutter Gallery, but my intuition is still that this will reject more valid changes that than it will prevent invalid changes, which is par for Dart 2. Soundness will also require disallowing reload except at the event loop boundary, which means fix-and-continue debugging will be impossible.

franklinyow commented 4 years ago

Following. We should try to have a fix before Friday if we want to cherrypick this to Beta.

a-siva commented 4 years ago

Is this issue fixed now after https://github.com/dart-lang/sdk/commit/847a1275133685f615a0c354292ba3f8e4304999 ? can it be closed ?

mkustermann commented 4 years ago

Is this issue fixed now after 847a127 ? can it be closed ?

No - there are still potential issues with old closures. They were compiled to kernel under the old static knowledge (e.g. class hierarchy). Even after hot-reload, the re-compilation of old closures in the VM will use the old kernel and therefore assume the old static knowledge is still valid, which can be no longer the case after hot-reload. (Alex provided an example in https://github.com/dart-lang/sdk/issues/41482#issuecomment-614177695 )

We should find someone to look into this, possibly in collaboration with CFE team.

Ideally the CFE (which knows about the static knowledge) could give the vm a hot-reload diff, containing additional information about closures. The VM could then refuse the reload if there is an old closure in the heap which the CFE told us is no longer valid.

franklinyow commented 4 years ago

@mkustermann Please re-evaluate this and do one of the following:

  1. Add it to a milestone if it is still P1
  2. Change priority label and add milestone if no longer P1
  3. Close, if this issue is no longer needed
a-siva commented 4 years ago

@johnniwinther is going to take a look at the suggestion made in #41482 and see if it is feasible for CFE to provide that information. Based on that investigation we will decide on how to proceed on this issue.

johnniwinther commented 4 years ago

@jensjoha Can you look into whether we can provide information as suggested in https://github.com/dart-lang/sdk/issues/41482#issuecomment-666995294 ?

jensjoha commented 4 years ago

Either I don't understand, or it's also a problem for non-closures. E.g. if I have this flutter library:

import 'package:flutter/material.dart';

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo',
      theme: ThemeData(
        primarySwatch: Colors.blue,
        visualDensity: VisualDensity.adaptivePlatformDensity,
      ),
      home: MyHomePage(title: 'Flutter Demo Home Page'),
    );
  }
}

class MyHomePage extends StatefulWidget {
  MyHomePage({Key key, this.title}) : super(key: key);

  final String title;

  @override
  _MyHomePageState createState() => _MyHomePageState();
}

class A {
  String foo() {
    print("Hello from A!");
    return "foo!";
  }
}

class B extends A {}

A Function() lookAtMe;
A lookAtMe2;

foo() {
  print("Setting data to B's");
  lookAtMe = () => new B();
  lookAtMe2 = new B();
}

bar() {
  A a = lookAtMe();
  print(a);
  if (a is A) {
    print("Closure returns an A as promised :)");
  } else {
    print("Closure doesn't return an A as promised :(");
  }
  print("");

  print(lookAtMe2);
  if (lookAtMe2 is A) {
    print("Field is an A as promised :)");
  } else {
    print("Field is not an A as promised :(");
  }
  print("");

  try {
    print(a.foo());
  } catch (e) {
    print(" => trying to print a.foo crashed :(");
  }
  try {
    print(lookAtMe2.foo());
  } catch (e) {
    print(" => trying to print lookAtMe2.foo crashed :(");
  }
}

class _MyHomePageState extends State<MyHomePage> {
  int _counter = 0;

  void _incrementCounter() {
    if (_counter.isEven) {
      foo();
    } else {
      bar();
    }

    setState(() {
      _counter++;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text(widget.title),
      ),
      body: Center(
        child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: <Widget>[
            Text(
              'You have pushed the button this many times:',
            ),
            Text(
              '$_counter',
              style: Theme.of(context).textTheme.headline4,
            ),
          ],
        ),
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _incrementCounter,
        tooltip: 'Increment',
        child: Icon(Icons.add),
      ),
    );
  }
}

and run it, pressing the + 3 times I get this terminal output:

flutter: Setting data to B's
flutter: Instance of 'B'
flutter: Closure returns an A as promised :)
flutter: 
flutter: Instance of 'B'
flutter: Field is an A as promised :)
flutter: 
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!
flutter: Setting data to B's

(looking at the code, notice that lookAtMe contains a closure claiming to return an A returning a B (which at this point is an A) and a field of claimed type A containing a B (which at this point is an A).

Now apply these changes:

diff --git a/lib/main.dart b/lib/main.dart
index e73638d..8dfb383 100644
--- a/lib/main.dart
+++ b/lib/main.dart
@@ -34,15 +34,15 @@ class A {
   }
 }

-class B extends A {}
+class B /*extends A*/ {}

 A Function() lookAtMe;
 A lookAtMe2;

 foo() {
-  print("Setting data to B's");
-  lookAtMe = () => new B();
-  lookAtMe2 = new B();
+  print("Setting data to A's");
+  lookAtMe = () => new A();
+  lookAtMe2 = new A();
 }

 bar() {

hot reload, press the + 1 time and the terminal now says this:

Performing hot reload...                                                
Reloaded 1 of 526 libraries in 225ms.
flutter: Instance of 'B'
flutter: Closure doesn't return an A as promised :(
flutter: 
flutter: Instance of 'B'
flutter: Field is not an A as promised :(
flutter: 
flutter:  => trying to print a.foo crashed :(
flutter:  => trying to print lookAtMe2.foo crashed :(

Isn't this the situation - both for closures and fields - that is expressed in https://github.com/dart-lang/sdk/issues/41482#issuecomment-614177695 and thus a problem not centered around closures? (neither of these seem to crash the VM though --- is that the only concern?)

(pressing it twice more gives

flutter: Setting data to A's
flutter: Instance of 'A'
flutter: Closure returns an A as promised :)
flutter: 
flutter: Instance of 'A'
flutter: Field is an A as promised :)
flutter: 
flutter: Hello from A!
flutter: foo!
flutter: Hello from A!
flutter: foo!

i.e. the old stuff is cleared and the new stuff is set and everything is fine again)

franklinyow commented 3 years ago

What's the status on this?

a-siva commented 3 years ago

After further discussions with @rmacnak-google and @alexmarkov it is clear that fixing this would require some prototyping to evaluate the options available and the fix is going to take longer so moving this issue out of the October milestone and the plan is to complete this in Q4.

vsmenon commented 3 years ago

Any updates here? Is this issue still coming up in practice? (I.e., should this be a P1?)

a-siva commented 3 years ago

I don't think this needs to be a P1.

a-siva commented 1 year ago

//cc @rmacnak-google @alexmarkov do we still think this is something that we will plan to fix in the near term, should it still be a 'p2' ?