flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License
165.51k stars 27.32k forks source link

Add hot restart hooks to support dart:ffi #75528

Open jonahwilliams opened 3 years ago

jonahwilliams commented 3 years ago

Consider a Dart usage of a C api through ffi. A common pattern is the usage of context objects that must be explicitly setup and torn down for a given thread. ( This is simplified from an example program I wrote using https://sol.gfxile.net/soloud/ )

final lib = ffi.DynamicLibrary.open('some.dll');

class SoundSystem {
  ffi.Pointer _audioContext = lib.createContext();

  SoundSystem() {
    lib.setupContext(_audioContext);
   }

   void playSound() {
     lib.playSound(_audioContext);
   }

   void dispose() {
     lib.destroyContext(_audioContext);
   }
}

While this works well enough when performing a hot reload, but a hot restart may cause a native crash as recreating the SoundSystem with cause another audio context to be created without destroying the previous one.

During normal program execution this isn't a problem, since the dispose can be connected to the correct part of the application teardown logic ( or the handled by the OS as the program exits).

Trying to do something like teardown the entire widget tree may solve this problem, but would be expensive as well as a large behavior change with unintended consequences. Instead, I would propose adding a binding hook that allows developers/library authors to run callbacks before the hot restart is performed. The risk should be lower since the feature is opt-in, and failures could be ignored since the isolate would be destroyed anyway (though detecting these failures without using something like a timeout will be tricky).

For example, the above code could be modified like:

final lib = ffi.DynamicLibrary.open('some.dll');

class SoundSystem {
  ffi.Pointer _audioContext = lib.createContext();

  SoundSystem() {
    if (kDebugMode) {
      SomeBinding.instance!.addHotRestartTeardown(dispose);
    }
    lib.setupContext(_audioContext);
   }

   void playSound() {
     lib.playSound(_audioContext);
   }

   void dispose() {
     lib.destroyContext(_audioContext);
   }
}

This won't work if the library is trying to be platform agnostic though - which means we might need to consider either dart level support or a convention around using a dart:developer API. For example, something like

import 'dart:developer';

final lib = ffi.DynamicLibrary.open('some.dll');

class SoundSystem {
  ffi.Pointer _audioContext = lib.createContext();

  SoundSystem() {
    if (kDebugMode) {
      registerExtension('dart.ext.hotRestartTeardown', dispose);
    }
    lib.setupContext(_audioContext);
   }

   void playSound() {
     lib.playSound(_audioContext);
   }

   void dispose() {
     lib.destroyContext(_audioContext);
   }
}

Unfortunately I don't think that will work out of the box, since the various different libraries would stomp on eachother with different callbacks

jonahwilliams commented 3 years ago

FYI @Hixie @zanderso @dcharkes for thoughts.

Hixie commented 3 years ago

Having a way to register a callback that is invoked just before a hot restart SGTM. We should make sure to wrap each such callback in a try/finally block so that failures don't block the restart. Such callbacks should presumably be asynchronous. I think it'd be fine to call all of them at once and then await all their returned Futures. We should apply some timeout (like 5 seconds) to that wait, and if that expires print a message saying that hot restart is taking a long time (maybe each callback can have a label and we'd print the labels of any stalled callbacks in the message), providing some mechanism to continue with the hot restart regardless.

zanderso commented 3 years ago

In the dart:io implementation (which is similar to clients of dart:ffi in that it wraps a lot of native APIs), whenever a native resource is handed to Dart code, it has a native finalizer attached to it, e.g. to call close() on a file descriptor as a backstop to prevent leaks. When working on dart:io, as a library author, I found that that defensive approach was preferable to hunting down missing dispose() calls in client code. Given some way to attach native finalizers to dart:ffi Pointer types, those finalizers would run when the main isolate is torn down during a hot restart.

A hot restart is also not the only time when an isolate might be be destroyed unexpectedly. See https://api.dart.dev/dev/2.13.0-4.0.dev/dart-isolate/Isolate/kill.html. If a parent isolate kills a child isolate, the child isolate will have no opportunity even to run service extension callbacks. So the only cleanup that can happen is running the finalizers on the native side.

However, it seems like it might be difficult to require dart:ffi users to provide a shim library for attaching a native finalizer for every library they'd want to use, and there might be other problems with exposing the finalizer API from dart_api.h. But cleaning up native resources whenever an isolate is torn down, not just due to a hot restart, is probably what we want. So more like:

registerExtension('dart.ext.isolateTeardown', dispose);
zanderso commented 3 years ago

Also /cc @mraleph

dcharkes commented 3 years ago

Related issue for hot-reload https://github.com/flutter/flutter/issues/59061.

jonahwilliams commented 3 years ago

@zanderso I'm somewhat ignorant about the finalizer APIs in dart, but would this be something that is feasible to use with a library that is only available as a dll, or would it require authors to do some "wrapping" of the existing native APIs?

jonahwilliams commented 3 years ago

@dcharkes we actually already have this for hot reload in flutter, you can use reassemble method provided your native API is connected to the widget tree in someway.

zanderso commented 3 years ago

@zanderso I'm somewhat ignorant about the finalizer APIs in dart, but would this be something that is feasible to use with a library that is only available as a dll, or would it require authors to do some "wrapping" of the existing native APIs?

Yeah, I mentioned this above, but my understanding is that a wrapper/shim library would be required to attach the native finalizer before returning an object handle to Dart code. Putting my thoughts a different way:

  1. Ideally dart:ffi has a native-side API (not necessarily the API currently in dart_api.h) for registering callbacks for cleaning up native resources when the Dart side drops all references (e.g. when an Isolate is getting torn down).
  2. I suspect (1) is too hard, but an improvement would be the ability to add Dart-language cleanup hooks not only for hot restart, but whenever an Isolate is getting torn down. This won't cover as many cases as (1), but still better than nothing.
jonahwilliams commented 3 years ago

I could see that working in cases where a native library is written for the purpose of use in flutter. For my example usage I would probably just comment out the sound library when I'm not testing it, rather than write another wrapper.

maks commented 2 years ago

I ran into the very same issue that @jonahwilliams did, in my case trying to use via FFI a small library that uses MiniAudio as its audio backend that I had made. And likewise, everytime I hot-restart a Flutter app (well at least testing on Linux desktop & Android) that uses it (via a Flutter plugin I built to expose the library) the whole app crashes from native-side code because it doesn't release the audio context.

So I thought that the new (in Dart 2.17) NativeFinalizer would now be the solution to this. BUT it doesn't seem to work for the hot-restart case as the finalizer doesn't seem to be called when a hot-restart occurs.

This is the code I'm using (irrelevant bits removed) based on the code example in the NativeFinalizer dartdoc:

class LibMSFAPlugin implements Finalizable {
  static final _finalizer = NativeFinalizer(_bindings.addresses.shutdownEngine.cast());

  /// Used to prevent double close and usage after close.
  bool _shutdown = false;

  Future<bool> init() async {
    final SendPort helperIsolateSendPort = await _helperIsolateSendPort;
    final request = _InitRequest();
    final completer = Completer<bool>();
    _initRequest = completer;
    helperIsolateSendPort.send(request);
    return completer.future;
  }

  void sendMidi(List<int> list) {
    //...
  }

  void shutDown() {
    if (_shutdown) {
      return;
    }
    _shutdown = true;
    _finalizer.detach(this);
    _bindings.shutdownEngine();
  }
}

and I'm just using ffigen to generate the binding, again as per docs, so I'm not sure if I'm doing something wrong or if there is a bug with NativeFinalizers and Flutter? @dcharkes would you have some insight on this?

I should add that my app was also crashing on Hot-reload but thanks to @jonahwilliams tip above on reassemble adding this code has at least stopped the crashes on hotreload:

 @override
  void reassemble() {
    super.reassemble();
    print("reassembling state...");
    plugin.shutDown();
    plugin = LibMSFAPlugin();
  }
dcharkes commented 2 years ago

So I thought that the new (in Dart 2.17) NativeFinalizer would now be the solution to this. BUT it doesn't seem to work for the hot-restart case as the finalizer doesn't seem to be called when a hot-restart occurs.

As far as I know, hot restart does not force a garbage collection. Neither is a single garbage collection guaranteed to collect all unreachable objects. Only on VM shutdown are the native finalizers guaranteed to run. So relying on garbage collection here is not a valid approach for supporting hot restart. The question to ask here is: is hot restart conceptually a VM-shutdown and VM-start? Or is it only a shutdown of all existing isolates? I've filed https://github.com/dart-lang/sdk/issues/50016 to investigate this.

jonahwilliams commented 2 years ago

If it doesn't work with the finalizer approach, I had a draft written up where I added a new binding callback that you could stick a dispose call on. We could revist that approach

SchedulerBinding.instance.debugAddPreRestartCallback();
maks commented 2 years ago

Thanks for looking into further @dcharkes ! And that would be great @jonahwilliams ! Since this is only a development time issue, it would be great to have a simple binding callback to hook into for these kinds of clean up tasks that need to happen on a hot restart.

Interestingly I'm also working on a Dart (not Flutter) app project that uses a different library (Sunvox) that also has an audio engine but there using VScodes (hot?) restart command works fine as that seems to cause a SIGINT to my Dart cli app (on Linux) and so I have a place to hook into to cleanly shutdown the audio.

dcharkes commented 2 years ago

@jonahwilliams Are (1) all isolate groups shutdown in Flutter before new isolates are started on hot restart? Or are (2) the isolates shut down after the new isolates are already created (this would be likely faster, and also would enable hot restart to fail and to continue running the old isolates).

If (1), then we should use the NativeFinalizer approach and ensure running them before the starting new isolates in hot restart. If (2) then the heaps of the old isolates outlive the hot restart, and relying on NativeFinalizers would not work.

jonahwilliams commented 2 years ago

The flutter tool needs to manually shut down additional isolates while performing a hot restart. I don't think we intentionally keep other isolates alive, its just an artifact of the implementation.

If we could update hot restart to shutdown the entire isolate group, would this "just work"?

dcharkes commented 2 years ago

Yes, it should "just work", if all isolates are shut down:

void Isolate::Shutdown() {
  // ...

  // Ensure native finalizers are run before isolate has shutdown message is
  // sent. This way users can rely on the exit message that an isolate will not
  // run any Dart code anymore _and_ will not run any native finalizers anymore.
  RunAndCleanupFinalizersOnShutdown();

https://github.com/dart-lang/sdk/blob/c1e43135ad1905c90d17aeaf27e421b331eb508e/runtime/vm/isolate.cc#L2652-L2655

jonahwilliams commented 2 years ago

I looked into this a bit, but it does not seem like dart_api.h exposes an API to shutdown an isolate group, just an individual isolate. Is that correct?

For reference, this is where the engine performs an isolate shutdown triggered by hot restart:

https://github.com/flutter/engine/blob/main/runtime/dart_isolate.cc#L730-L751

dcharkes commented 2 years ago

I looked into this a bit, but it does not seem like dart_api.h exposes an API to shutdown an isolate group, just an individual isolate. Is that correct?

Correct, we only have Dart_ShutdownIsolate.

https://github.com/dart-lang/sdk/blob/a1b669fda1e238845784edebfc859d1cc57cba6f/runtime/include/dart_api.h#L1167-L1174

We have a callback that is called after an isolate group is shut down

https://github.com/dart-lang/sdk/blob/a1b669fda1e238845784edebfc859d1cc57cba6f/runtime/include/dart_api.h#L744-L757

https://github.com/dart-lang/sdk/blob/a1b669fda1e238845784edebfc859d1cc57cba6f/runtime/include/dart_api.h#L962

@aam @mkustermann Any reasoning why Dart_ShutdownIsolateGroup wasn't added to dart_api.h when isolate groups were added to the VM? Should we add it?

Neither is there an API that lets you ask which isolates are in an isolate group (with which one could iterate over the isolates with Dart_ShutdownIsolate). @jonahwilliams you could in the Flutter engine keep track of all the isolates that you start per isolate group, but that sounds like a lot of book keeping.

mkustermann commented 1 year ago

@aam @mkustermann Any reasoning why Dart_ShutdownIsolateGroup wasn't added to dart_api.h when isolate groups were added to the VM? Should we add it?

We do have something like it internally. In the end it's based on iterating over all isolates of a group and killing them (similar to Isolate.kill(...)).

The embedder is called on initialization of an isolate as well as on shutdown/cleanup, so one could remember the set of isolates and invoke Dart_KillIsolate() for all of them. Similarly one could use vm-service API to loop over isolates in a group and killing them.

We could add a Dart_KillIsolateGroup() API. The semantics would probably be: a) it disables spawning of new isolates b) it sends kill signal to all isolates of the isolate group. The embedder would still need to wait until all isolates (including the current one) have shutdown: It can do that by waiting for the isolate group's shutdown callback to be invoked.

@jonahwilliams Would this be helpful?

jonahwilliams commented 1 year ago

The flutter tool currently tracks and shuts down these isolates, so this might be helpful. But does this help with running finalizers?

dcharkes commented 1 year ago

The flutter tool currently tracks and shuts down these isolates, so this might be helpful. But does this help with running finalizers?

After an isolate has been shut down, the native finalizers are guaranteed to have been run.

jonahwilliams commented 1 year ago

Sorry, I'm a bit lost then. It sounds like we're doing everything as expected, but then I would imagine that finalizers should work - but they don't according to https://github.com/flutter/flutter/issues/75528#issuecomment-1251710029 ?

dcharkes commented 1 year ago

Also for any helper isolates that were started?

jonahwilliams commented 1 year ago

These are shutdown by the flutter tool. Possibly this is happening too late?

maks commented 1 year ago

sigh this lack of a hot-restart hook keeps getting worse and worse 😒

I know this issue is specifically about FFI and its not exactly the same thing since afaik Isolates are not used in Dart Web, but I've run into a variation of this same issue now with Flutter Web and Firestore streams not able to be correctly shutdown again due to a lack of a hook for the Firebase plugins to use to detect a hot-restart to clean close the connection

Others report same problem not being able to close websockets.

Surely @jonahwilliams's registerHotRestartCallback() API could address all of these issues? Reading PR #86134 I have no idea why it was closed, I'm guessing there was some internal discussion about why it wouldn't be merged?

Hixie commented 1 year ago

I think that was just that @jonahwilliams was taking a break from Flutter stuff -- please don't hesitate to take over the PR and resubmit it (it'll probably need rebasing and stuff to make it work though).

jonahwilliams commented 1 year ago

Yeah, Like @Hixie mentioned I took a break from Flutter for a while and was not able to continue it. Though there is still some question on whether it would be necessary, or if we could use the new Finalizer API.

On the web things might be a bit different, and to my knowledge I thought that the web hot restart should handle these sorts of things, but that sounds like its not the case. I don't think the finalizer API will work there, so maybe we should just go forward with this API?

maks commented 1 year ago

No worries at all @jonahwilliams thanks for the huge amount of work you've already done on Flutter! πŸ‘πŸ»

@Hixie I'll have a crack at getting Jonah's branch rebased and submit it a new PR for it as yes I think that Finalizer's don't seem to be a solution for this, at least for the moment that don't seem to work for the hot-restart use case.

davidmartos96 commented 1 year ago

I opened a new PR based on the previous work from Jonah #130662

brianquinlan commented 1 year ago

I just had a user-reported bug related to this for package:cupertino_http. The TL;DR version is that package:cupertino_http will continue to send HTTP data to ports that no longer exist after the application has been hot reloaded.

I don't understand why we can't make the registration function an API in dart:developer and then the embedder can decide when to invoke the callbacks (Dart CLI would never do so).

dcharkes commented 1 year ago

I don't understand why we can't make the registration function an API in dart:developer and then the embedder can decide when to invoke the callbacks (Dart CLI would never do so).

If I'm not mistaken, "restart" is not a Dart concept it's a flutter concept, only "reload" is a Dart concept. dart:developer can reloadSources.

Restart should be implemented on top of ShutdownIsolate as per discussion in this issue above. When Dart_ShutdownIsolate we do not expect to run any more Dart code on that isolate. So, if any Dart code needs to be executed before shutdown, it should be done via the embedder that calls ShutdownIsolate.

Since "hot restart" is a Flutter concept, hot restart hooks should probably live in Flutter.

cc @jonahwilliams Is my assessment correct that "hot restart" is a Flutter concept? (Digging through flutter_tools with fullRestart only elicits that the resident frontend server is fully reset.)

(Hot reload hooks should probably live in dart:developer if we would need those. But we don't need these for apparently for the use cases mentioned in this bug and the latest PR.)

Whether "hot restart" would be a useful feature for the Dart VM so it can be reused in multiple embedders, that's another question.

aam commented 1 year ago

Is my assessment correct that "hot restart" is a Flutter concept? (Digging through flutter_tools with fullRestart only elicits that the resident frontend server is fully reset.)

https://github.com/flutter/engine/blob/main/shell/common/engine.cc#L196 restart logic is in the engine and from vm perspective root isolate gets launched after all previously running isolates are shutdown. https://github.com/flutter/flutter/blob/master/packages/flutter_tools/lib/src/run_hot.dart#L564 is where vmservice request that is handled by the engine(by running the code above) is sent by the flutter tools.

. The TL;DR version is that package:cupertino_http will continue to send HTTP data to ports that no longer exist after the application has been hot reloaded.

Is cupertino_http problem hot-restart specific? Can't same thing happen if the isolate with port handler gets killed for example(Isolate::kill)? Because that's effectively what happens with flutter hot-restart. Will listening for vmservice isolate shutdown events help with getting things cleaned up?

brianquinlan commented 1 year ago

Since "hot restart" is a Flutter concept, hot restart hooks should probably live in Flutter.

The practical consequence of this is that FFI packages that could otherwise be Dart packages will have to become Flutter packages if they want to survive hot restart. We don't want that.

Is cupertino_http problem hot-restart specific? Can't same thing happen if the isolate with port handler gets killed for example(Isolate::kill)? Because that's effectively what happens with flutter hot-restart.

IMO, if a user explicitly kills the Isolate listening for events then that is on them. But we do want our packages to work with hot-restart.

Will listening for vmservice isolate shutdown events help with getting things cleaned up?

Maybe. Do you get those events before the isolate is shut down?

jonahwilliams commented 1 year ago

Lets not let the perfect be the enemy of the good here. ffi Packages targeting Flutter have a serious deficiency that cannot be solved with existing dart tooling. From my prior investigation, neither vmservice events nor finalizers were sufficient to solve the issue, though I'm far from an expert in the Dart Isolate lifecycle. If the Dart team wants to investigate these issues and propose a functional alternative design there would certainly be no complaint from me, but I don't think this effort should block attempts to help Flutter developers today.

dcharkes commented 1 year ago

https://github.com/flutter/engine/blob/main/shell/common/engine.cc#L196 restart logic is in the engine and from vm perspective root isolate gets launched after all previously running isolates are shutdown.

This means relying on native finalizers should work. But it does not:

void* global_resource = NULL;
int allocated_counter = 0;

FFI_PLUGIN_EXPORT void* AllocateResource() {
  void* global_resource = malloc(20);
  allocated_counter++;
  return global_resource;
}

FFI_PLUGIN_EXPORT void ReleaseResource(void* resource) {
  free(resource);
  global_resource = NULL;
  allocated_counter--;
}

FFI_PLUGIN_EXPORT int AllocatedCounter() {
  return allocated_counter;
}
import 'dart:ffi';

import 'test_hot_restart_bindings_generated.dart' as bindings;

class MyResource implements Finalizable {
  final Pointer<Void> _pointer;

  bool _released = false;

  MyResource._(this._pointer);

  factory MyResource.allocate() {
    final pointer = bindings.AllocateResource();
    final result = MyResource._(pointer);
    _finalizer.attach(result, pointer);
    return result;
  }

  void release() {
    if (_released) {
      return;
    }
    _released = true;
    _finalizer.detach(this);
    bindings.ReleaseResource(_pointer);
  }

  // static final _dylib = DynamicLibrary.open('libtest_hot_restart.so'); // Android
  static final _dylib = DynamicLibrary.process(); // MacOS

  static final _finalizer = NativeFinalizer(
      _dylib.lookup<NativeFinalizerFunction>('ReleaseResource'));

  static int allocatedCounter() => bindings.AllocatedCounter();
}

A quick test makes allocatedCounter increase by one on every hot restart.

I'll try to see why isolate shutdown doesn't happen. (Or why the native finalizers are not run.)

mkustermann commented 1 year ago

Crosslinking https://github.com/flutter/flutter/issues/124546#issuecomment-1541758901 that describes a few issues with flutter's hot-restart implementation, amongst them those two:

dcharkes commented 1 year ago

A quick test makes allocatedCounter increase by one on every hot restart.

This assessment is not entirely correct. It reproduces on MacOS and Android. The Linux implementation actually works perfectly fine with NativeFinalizers! (Repro: https://github.com/dcharkes/test_hot_restart ) So it seems that not being able to rely on NativeFinalizers is a per OS implementation problem.

dcharkes commented 12 months ago
  • A hot restart should stop/kill all existing isolates and clear state (such as isolate name server registry)

I can repro this to be an issue.

Tested on:

$ flutter --version
Flutter 3.16.0-12.0.pre.63 β€’ channel master β€’ git@github.com:flutter/flutter
Framework β€’ revision 88f302913a (3 days ago) β€’ 2023-10-16 04:49:56 -0400
Engine β€’ revision 15dee70086
Tools β€’ Dart 3.3.0 (build 3.3.0-27.0.dev) β€’ DevTools 2.28.1

Linux, MacOS, and iOS.

https://github.com/dcharkes/test_hot_restart

Main isolate shutdown by

  pc 0x000000010e8bc458 fp 0x00000003092004b0 dart::Profiler::DumpStackTrace(bool)+0x28
  pc 0x000000010ebc1d87 fp 0x00000003092009f0 Dart_ShutdownIsolate+0x37
  pc 0x000000010e71b16d fp 0x0000000309200a20 flutter::DartIsolate::Shutdown()+0x5d
  pc 0x000000010e7283dd fp 0x0000000309200a90 flutter::RuntimeController::~RuntimeController()+0x9d
  pc 0x000000010e72859c fp 0x0000000309200ab0 flutter::RuntimeController::~RuntimeController()+0x1c
  pc 0x000000010e650d25 fp 0x0000000309200c70 flutter::Engine::Restart(flutter::RunConfiguration)+0x95

https://github.com/dcharkes/test_hot_restart/tree/extra-isolate

Helper isolate is shut down by

  pc 0x000000010e8bc458 fp 0x000000030a0c1900 dart::Profiler::DumpStackTrace(bool)+0x28
  pc 0x000000010ebc1d87 fp 0x000000030a0c1e40 Dart_ShutdownIsolate+0x37
  pc 0x000000010e80be42 fp 0x000000030a0c1ea0 dart::MessageHandler::TaskCallback()+0x342

Instead, we should be killing all isolates in RuntimeController::~RuntimeController(). Relying on the message system shuts down the isolate too late. We need to synchronously wait until all isolates are shutdown to ensure the NativeFinalizers are run.

I assume we don't support Isolate.spawnUri in Flutter (we ship no sources inside a flutter app)? So we should only have a single isolate group.

If we only have a single isolate group we have multiple options:

  1. Save the isolate group in the RuntimeController and introduce Dart_KillIsolateGroup as Martin suggested earlier, or
  2. don't save the isolate group but use DartIsolateScope + Dart_CurrentIsolateGroup to find the isolate group and introduce Dart_KillIsolateGroup, or
  3. introduce a Dart_GetIsolateGroup(isolate) and Dart_KillIsolateGroup, or
  4. if we already keep track of a list of all isolates somewhere in the Flutter engine, let RuntimeController' refer to that list and loop over every isolate in that list in the destructor.

@jonahwilliams Any preferences? I can make a PR for the Dart SDK and Flutter engine. (Also, anyone else who needs to be looped in for feedback here?)

jonahwilliams commented 12 months ago

I assume we don't support Isolate.spawnUri in Flutter (we ship no sources inside a flutter app)? So we should only have a single isolate group.

I think that might happen to work in debug mode? Or maybe that was disabled. But it definitely does not work in profile/release mode.

RE: the options above, I don't think the engine keeps track of any isolate lists, and isolates can come and go via Isolate.run/compute. So any options that involve the engine tracking that state is going to be complicated. If Dart already knows the isolates to shutdown I think I'd prefer to delegate that to Dart.

mkustermann commented 11 months ago

When a flutter user decides to restart (instead of reload) it should ensure all isolates of an isolate group are killed - otherwise it will lead two two different isolate groups (one with old code and one with new code) that cannot really communicate with each other anymore - see https://github.com/flutter/flutter/issues/124546 (which this issue is proposing to fix).

Though there's a complication: Flutter allows creating multiple UI isolates (see docs.flutter.dev/add-to-app/multiple-flutters). There's a question whether a hot-restart should affect one or all UI isolates.

(I wanted to bring this multiple-UI-isolate scenario here, because it's very uncommon and probably not very well tested - at least with hot-restart)

jonahwilliams commented 11 months ago

I think that we have briefly discussed the issues of hot reload and hot restart when there are multiple engines, and concluded that 1) The tooling doesn't handle this correctly today and 2) We're unsure what the right approach would even be (do users expect the hot reload/restart to only effect the entrypoint they are "running"?

I think that using multiple engines in a development environment is unlikely, and I wouldn' worry about it beyond making sure we don't explode.

dcharkes commented 11 months ago

I think that using multiple engines in a development environment is unlikely, and I wouldn' worry about it beyond making sure we don't explode.

Given that the current behavior is weird with regard to messages, and that adding a synchronous Dart_KillCurrentIsolateGroup() could lead to deadlocks, should we consider refusing to hot restart all together if there are multiple engines / multiple UI isolates? That would at least be clear behavior for users. (I'd need to check how we would detect multiple engines, I don't know the code by heart.)

jonahwilliams commented 11 months ago

I don't feel like I'm familiar enough with the multiple engines case to really make a call, but if the alternative is deadlocks then refusing to hot restart sounds reasonable.

dcharkes commented 11 months ago

I don't feel like I'm familiar enough with the multiple engines case to really make a call, but if the alternative is deadlocks then refusing to hot restart sounds reasonable.

Who is more knowledgable w.r.t. multiple engines?

mkustermann commented 11 months ago

I think that we have briefly discussed the issues of hot reload and hot restart when there are multiple engines, and concluded that 1) The tooling doesn't handle this correctly today and 2) We're unsure what the right approach would even be (do users expect the hot reload/restart to only effect the entrypoint they are "running"?

Flutter's big selling point is fast development time via hot-reload/hot-restart. So not supporting hot-reload/hot-restart at all for add2app / multi engine seems very harsh - as that would make it really hard for anyone to use it (development cycle may be terrible).

The hot-reload support in the VM will reload all isolates in an isolate group (that's the only thing that makes sense) - so supporting hot-reload in multi-engine scenario would affect all UI isolates. But hot-reload is mostly a VM-internal operation and doesn't require much flutter-engine or native (read: java/...) changes.

Given these hot-reload semantics, one could argue that a hot-restart semantics should also affect all multi-engine UI isolates. The question would then be how to do that safely in the engine and whether the native side would need changes as well (read: Java engine API) or it only requires flutter-engine internal changes.

Who is more knowledgable w.r.t. multiple engines?

AFAIK it was driven by @gaaclarke wdyt?

gaaclarke commented 11 months ago

AFAIK it was driven by @gaaclarke wdyt?

Wow, a lot of background here.

I agree with Jonah's assessment that we don't keep track of a list of isolate in the engine, so any solution shouldn't hinge on that.

I would expect hot-reload/hot-restart with multiple engines to have the same semantics as having multiple isolates. So, as Martin was saying, it effects all isolates (including the ones associated with engines).

So, when I implemented the feature I assumed that hot-reload/hot-restart were all handled under the Dart API so the implementation never explicitly made any decisions about it, nor was it necessary for implementing it.

We later found out that the tool does not handle the multi-engine case. I'm not very familiar with the tool but I assumed it is a matter of doing something like:

- GetGlobalEngine()->HotReload();
+ for (auto& engine : GetGlobalEngines()) {
+   engine->HotReload();
+ }

My understanding is we've never implemented something like that. It's worth noting that its omission is not accidental.

From reading through this long thread it sounds like hot-reload is good. We need to accommodate hot-restart which has hooks in the engine that tear down isolates, causing finalizers to execute which is actually the top line problem for this thread, right?

We want to call Dart_KillCurrentIsolateGroup but the problem is that Dart_KillCurrentIsolateGroup doesn't handle the case where multiple isolates are running on the same thread. It seems to me that what you'd want to do is to loop over all the engines and have them kill their isolate. Once all the engine isolates are dead you could call Dart_KillCurrentIsolateGroup at which point the bookkeeping should be easier since all of the isolates in the group are killed and you won't need signals or thread synchronization. Calling Dart_KillCurrentIsolateGroup while another engine is relying on it doesn't make sense anyways.

Taking a step back, I would solve the top line issue first for single engines, then solve it for multi-engine usage because there seems to be a gap with tooling anyways wrt to multiple engines and the top line issue effects a lot of single engine users.

Anything else I can help with? There is a lot in this thread =)

dcharkes commented 11 months ago

It seems to me that what you'd want to do is to loop over all the engines and have them kill their isolate. Once all the engine isolates are dead you could call Dart_KillCurrentIsolateGroup at which point the bookkeeping should be easier since all of the isolates in the group are killed and you won't need signals or thread synchronization. Calling Dart_KillCurrentIsolateGroup while another engine is relying on it doesn't make sense anyways.

That indeed sounds like a simpler solution. Thanks @gaaclarke!

Right, so that means we can't have a Engine::Restart that kills and immediately starts an isolate again.

bool Engine::Restart(RunConfiguration configuration) {
  TRACE_EVENT0("flutter", "Engine::Restart");
  if (!configuration.IsValid()) {
    FML_LOG(ERROR) << "Engine run configuration was invalid.";
    return false;
  }
  delegate_.OnPreEngineRestart();
  runtime_controller_ = runtime_controller_->Clone(); // Kills isolate
  UpdateAssetManager(nullptr);
  return Run(std::move(configuration)) == Engine::RunStatus::Success; // Starts isolate
}

https://github.com/flutter/engine/blob/4da3e5b12540458b82ff1e7ba733d08f67977042/shell/common/engine.cc#L196-L206

The caller would need to of this would first need to Engine::Kill(/*whole_group=*/false) for all engines but one, then kill the isolate group Engine::Kill(/*whole_group=*/true) (we need one isolate from the isolate group to find the isolate group so we can kill it) and then Engine::Run again for all engines.

I'd have to dive into how the hot-restart request is wired so that it triggers a hot restart.

I'm unfamiliar with multi engine. Does it mean that someone else is writing C++ code that imports the Flutter C++ engine code and wires hot restart up manually? Where do I find examples of this?

Does our own C++ code always only have a single engine?

gaaclarke commented 11 months ago

I'm unfamiliar with multi engine. Does it mean that someone else is writing C++ code that imports the Flutter C++ engine code and wires hot restart up manually? Where do I find examples of this?

Does our own C++ code always only have a single engine?

Here is an example that is doing it: https://docs.flutter.dev/add-to-app/multiple-flutters. The api is surfaced all the way up to users via objc, java, etc.

Here's how to think of "multi engine", the feature was called "lightweight engines". Customers wanted to have multiple engines in their app as they migrated existing apps to flutter piece-by-piece. The problem with that is that each engine has a lot of overhead with respect to memory and startup latency. Lightweight engines solves this problem by having multiple engines share the heaviest components, greatly reducing memory cost and startup latency. One of the heavy parts they share is the thread host. So, the engines will have 2 different dart isolates but they execute on the same thread. These shared resources are managed as std::shared_ptrs across engines. Is this helping?