[vm/ffi] `NativeCallable.onTemporaryIsolate`

dcharkes commented 8 months ago

We might never build this, but let's have a tracking bug to point to.

https://api.dart.dev/stable/3.2.4/dart-ffi/NativeCallable/NativeCallable.listener.html enables async callbacks that return immediately upon calling and schedule the callback to be run on the Dart event loop of the right isolate. No return values can be returned.

https://api.dart.dev/stable/3.2.4/dart-ffi/NativeCallable/NativeCallable.isolateLocal.html enables sync callbacks with return values. The current thread must be the one entered in the target isolate.

If we would like to be able to run Dart code synchronously and get a return value, but we don't have access to a Dart isolate in the current thread, we could conceivably spawn a new temporary isolate.

This would incur some serious limitations:

No access to other isolates / global state.
No async / event loop. (Synchronous return.)

We're currently not exploring this. @liamappelbe

A way more general approach would be to have shared memory multithreading:

https://github.com/dart-lang/language/pull/3531

liamappelbe commented 1 month ago

This has become more relevant recently, as a way of eliminating ffigen's generated ObjC code. Small ObjC trampolines are generated to fix https://github.com/dart-lang/native/issues/835, but if we had temp isolate callbacks we could write these trampolines in Dart.

Are we happy with the NativeCallable.onTemporaryIsolate name? What about shortening it to NativeCallable.tempIsolate?

dcharkes commented 1 month ago

I'm happy with the longer name, it's more clear. cc @lrhn for naming!

Cross linking the other open issue for NativeCallable.blocking:

https://github.com/dart-lang/sdk/issues/54554

lrhn commented 1 month ago

"Avoid abbreviations" is a style rule, so "temp" isn't stylistic.

I'd drop "temporary" completely. Maybe newIsolate, freshIsolate or spawnIsolate.

Is the isolates spawned first, out spawned synchronously when trying to fix the callback? Can you do more than one callback during the same native call? If so, does it use the same isolate, or does it spawn a new isolate per callback?

How will you prevent the call from starting async computations? It's presumably an isolate in the same isolate group, so you can't control access to dart:async. Not that that would work since Future is exported by dart:core, and something like Finalizer also triggers async callbacks.

Does the isolate do Isolate.exit when the callback returns? (Then each callback will definitely need a separate isolate.)

liamappelbe commented 1 month ago

I'd drop "temporary" completely. Maybe newIsolate, freshIsolate or spawnIsolate.

I wonder if those imply that it's an ordinary isolate with no limitations?

Is the isolates spawned first, out spawned synchronously when trying to fix the callback? Can you do more than one callback during the same native call? If so, does it use the same isolate, or does it spawn a new isolate per callback?

There's existing temporary isolate infrastructure in place for NativeCallable.listener, and we were planning to reuse that. That means we'll be spawning a new isolate each time one of these callbacks is invoked. But we were also considering adding an optimisation to cache the temporary isolates (this optimisation would affect both NativeCallable.listener and NativeCallable.onTemporaryIsolate).

So I'd like to avoid making any promises about whether it's a fresh isolate or not, because we may want to change the implementation in future.

dcharkes commented 1 month ago

So I'd like to avoid making any promises about whether it's a fresh isolate or not, because we may want to change the implementation in future.

Even if it's not fresh, we might want to not leak any state. (e.g. reset all global fields?)

I can't find a pattern in our names yet:

NativeCallable.listener: behaves like a listener - (One gotcha though is that it returns early before it has been run, leading to errors if arguments were on the stack. Which "listener" doesn't communicate.) (Maybe messageHandler would have signalled that more than listener, listeners are called synchronously usually)
NativeCallable.isolateLocal: must be on the same thread as the thread that is the mutator of the isolate. The native-callable is isolate local. The native callable should be used isolate local?

So one name is behaves like or should be used as, and the other name is more about what it is. I guess sticking closer to what it is makes more sense. A native callable is run, so the kind of native callable is how it is run.

NativeCallable.temporaryIsolate: the native callable is a temporary isolate or is run on temporary isolate? Communicates definitely no access to shared state. But does not communicate no async. Temporary isolate conceptually can have async.

Maybe the name doesn't really matter and we just pick something and add good documentation. 🤷

liamappelbe commented 1 month ago

NativeCallable.temporaryIsolate sgtm

lrhn commented 1 month ago

NativeCallable.listener: could be async or asEvent.
NativeCallable.isolateLocal: sync.

That exhausts the sync/async behaviors, so NativeCallable.temporaryIsolate is something else. Which it is, it does a synchronous callback to dart code by running it in parallel. It does that by using a temporary (possibly fresh) isolate.

About resetting globals, also needs to close all ports and clear the event queues, to make sure nothing survives. Then send onExit events. Basically what Isolate.exit does, just without merging the heap back in, so that heap can be reused. Then just resetting the globals may be cheaper than allocating a new isolate, but probably not by much, uf constants are already shared.

mraleph commented 1 month ago

I would strongly prefer we align semantics and naming with extensions we are exploring in context of shared memory multithreading proposal, so that we avoid introducing duplication and incurring additional migration / cognitive costs later.

Specifically, this means going for a variation of NativeCallable.shared which is described in the proposal.

Though I must admit that I don't like the name shared for callables because it is not descriptive enough - though I don't really see any good alternative. We need some word that describes callback which:

is not bound to a specific isolate, but is bound to a specific isolate group.
can't access isolate local state, but can access shared native memory.

There is no single word which clearly describes these capabilities and limitations.

dcharkes commented 1 month ago

Ah, this means it should be an error to access global fields completely in NativeCallable.temporaryIsolate. And not that all fields are reset before the next invocation.

@mraleph Could we theoretically support having unawaited futures in such code? E.g. would it be possible to have a message queue for the isolate group? (The native callable itself must be synchronous though due to the return value. So only unawaited futures.)

NativeCallable.notIsolateLocal? 🤣 It's synchronous like NativeCallable.isolateLocal, but is not local to an isolate. 😀

I like NativeCallable.shared, but not while it doesn't yet have access to any shared state. 🙈 But if we really only want to upgrade the semantics later, having users migrate to a new constructor name later is a churn.

liamappelbe commented 1 month ago

So ideally we'd have a name that allows us to implement this now using the existing temporary isolate infra, but later upgrade to the shared isolate proposal when shared memory lands.

How about NativeCallable.sharedIsolate? It's close enough to temporaryIsolate that it shouldn't be too confusing for users if we write good docs. It still implies that some sort of atypical isolate will be created/used, and that it'll be a different isolate to the one that creates/owns the NativeCallable.

mraleph commented 1 month ago

cc @aam @a-siva

lrhn commented 1 month ago

Ah, this means it should be an error to access global fields completely in NativeCallable.temporaryIsolate.

That would be non-constant fields at least. Constants are probably fine.

Good luck with that! :wink:
That would prevent you from using Uri.parse, RegExp, List.toString, Base64.decode and Object.hash. Might be a bit restritive and surprising. (And I definitely don't want it to become a breaking change to start using a static variable in some algorithm, so not only can we make no promises, we also won't make any future promises.)

Could we theoretically support having unawaited futures in such code?

You can't have futures at all. The first thing a Future constructor or method does is to read the global variable Zone._current. Same for almost any other async operation.

I have no real hope that we can find a way to introduce such a restricted Dart isolate, and still be able to run actual Dart code in it. There are too many places where code assumes that it can cache things in static variable, either storing them or just lazily creating a value the first time its needed. Uri can probably be fixed. Future/Zone definitely cannot. It's trivial for future changes to add code that would stop working in such a limited isolate, because it's perfectly valid Dart code that works everywhere else. Basically, whatever that isolate is, it's not Dart.

aam commented 1 month ago

Specifically, this means going for a variation of NativeCallable.shared which is described in the proposal.

How about NativeCallable.groupShared ?

but later upgrade to the shared isolate proposal when shared memory lands.

You can experiment with using shared memory for callback temp isolates now - it's available on dev/main under --experimental-shared-data flag. You can decorate fields with @pragma('vm:shared') and use only those from your callback. There is no enforcement regarding using non-shared data at the moment though. I think one boilerplate challenge that need to be solved is to how to get this shared data into regular isolates. Callbacks are called concurrently while regular isolates are doing their own work, so whatever data is flowing through the callbacks need to be stash or queued by callback, dequeued somehow by the regular isolate.

You can't have futures at all. The first thing a Future constructor or method does is to read the global variable Zone._current. Same for almost any other async operation.

The proposal talked about introducing shared-friendly versions of futures, streams, zones. But those won't mesh with normal futures if I understand correctly.

liamappelbe commented 1 month ago

Uri can probably be fixed. Future/Zone definitely cannot. It's trivial for future changes to add code that would stop working in such a limited isolate, because it's perfectly valid Dart code that works everywhere else. Basically, whatever that isolate is, it's not Dart.

These are great points. It sounds like if we want something like temporary isolate callbacks (before shared memory lands), we'll have to allow them to read and write static variables. There's no technical issue with that afaik, it would just be confusing for users if they expect those static variables to have the same value as they do on the original isolate, or to have persistent values between invocations. But that odd behavior would be preferable to not having access to all these core libraries, and adding static variables to an internal algorithm being a breaking change. Obviously we'd have to document it very clearly.

We'll still have this issue with shared memory, because the static variables would have to be declared as shared. So unless we go through and annotate every internal static variable in the core libraries as shared (which would have performance implications), we'll still run into all these issues @lrhn points out.

mraleph commented 1 month ago

@lrhn

Good luck with that! 😉 That would prevent you from using Uri.parse, RegExp, List.toString, Base64.decode and Object.hash. Might be a bit restritive and surprising. (And I definitely don't want it to become a breaking change to start using a static variable in some algorithm, so not only can we make no promises, we also won't make any future promises.) ... You can't have futures at all. The first thing a Future constructor or method does is to read the global variable Zone._current. Same for almost any other async operation.

I have no real hope that we can find a way to introduce such a restricted Dart isolate, and still be able to run actual Dart code in it.

Yep, and that's intentional. If you want full Dart semantics you need to have a proper Dart isolate (with an event loop and such). Doing async in a temporary isolate is a foot gun anyway - we need to define how it works and there are multiple different ways it could work (e.g. you can decide to block until all events are drained and ports are closed or you could decide that execution goes to the default thread pool). If we implicitly default to either of these we will have surprising behavior. So I don't want us to be implicit, I want APIs to be explicit and restrictive.

I have the same reservations about static state in general - I don't believe that behavior where you impliciltly get a fresh copy of the static state every time callback is called is any good.

Outside of async we can audit and tweak core libraries in a way that unlocks using things which do not require event loop. If needed by using shared memory capabilities not exposed to the Dart developer. It is not that much different from how we canonicalize constants across multiple isolates - this memory is shared but a regular Dart developer can't achieve the same thing.

But that odd behavior would be preferable to not having access to all these core libraries, and adding static variables to an internal algorithm being a breaking change.

I strongly disagree with this. I do not believe this odd behavior would be preferable.

I suggest to look at this from a different angle - we need to stop thinking about this as simply a Dart problem. It is an interop problem. When you write C code (which uses Dart VM C API) you can write callbacks which get executed on any thread but these callbacks can't touch the Dart world unless they explicitly enter a Dart isolate using Dart_EnterIsolate or they can use native ports to send data to Dart asynchronously. What I am suggesting here is effectively transporting this capability in Dart: you can write callbacks which can execute on any thread but can't touch full fledged Dart world directly unless you explicitly enter some isolate. You can also choose to simply send data to Dart using ports.

It is a fundamental capability from which many other capabilities (e.g. NativeCallable.listener) can be built. We need to be exposing simple explicit fundamental building blocks - not spend time supporting custom implicit special cases.

lrhn commented 1 month ago

If you want full Dart semantics

"Full Dart semantics" is a very wide concept. I think we can reasonably separate that into different grades of support.

Full platform library functionality and all language features available. ("The whole monty.")
Some groups of (library) functonality are disabled (maybe async, having no event loop, and probably isolates). That may affect language features too - if you can't create a Future, you can't call an async function.
Some language features disabled. (Deferred libraries? most likely. Static state? I don't believe it'll fly, but it's a well-delimited concept at least. Maybe some limits on static state and its initialization.)
Arbitrary platform code may fail to work. No guarantees made, test a lot and pray we don't change the deal. (Probably not usable, so this is the level of semantic support that I want to avoid reaching.)

you can write callbacks which can execute on any thread but can't touch full fledged Dart world directly unless you explicitly enter some isolate.

What worries me isn't the goal, but the details, where we draw the line between "non-isolate-compatible" code and "isolate-required" code. What can you do before entering an isolate? (Do you have a Dart library you can use to enter an isolate?)

We can obviously disable particular features (anything async, like running in a zone that throws on any attempt to schedule, maybe even on trying to access Zone.current. No zones available — except Zone.root which is a constant, whoops! We can disable any isolated feature of the platform libraries if we really want to, preferably in a way that doesn't have a branch overhead for non-restricted code. As long as no other code depends on the disabled feature for its implementation. That's something we can massage the libraries to work with. That's the "easy" part.

Not accessing static state is harder, basically impossible. That includes all non-constant top-level/static variables. I don't believe a total ban on access to static variables is viable, and I have no real idea how much such a ban would affect. Something like Endian.host is a lazy final static variable, meaning that ByteData wouldn't be usable. They're everywhere.

Let's assume we can somehow allow some of those, the ones that are really just caching a compute-on-demand result, lazy rather than mutable, so it won't matter if they're recomputed or not, and so it doesn't matter whether if they're reset on each call or not. It's indistinguishable, allowing us to reuse or reinitialize an isolate without changing any behavior.

Then there is internal static state (like the list used for cycle-detection in Iterable.toString). We can mark that as "safe" too, it doesn't matter if it's lost between executions. It's a temporary value. The Uri parser tables should be constant anyway. Encoding.getByName could use a const map (this would be an argument for not allowing user-added encoding names).

Maybe we could have a standard practice of the core platform libraries not containing persistent (visibly) mutable state, so that it's never possible to see if you're running against a fresh version or an existing one. (Except Zone.current, obviously, which is the canonical "global state" that you can attach other state to.) Seems very reasonable, since it avoids any interference between different libraries that are all modifying platform state. (Also no adding more URI scheme default ports then.)

Don't know if we can extend that to other platform libraries too. Interop-libraries have external state they access, they usually don't need their own. (The IOOverrides._global is an exception, and one of the reasons I don't like it. It should just be zone based - but that does link everything IO to async, so if we disable async and Zone.current, you can't do File("path").readAsStringSync() any more, which is a reasonable synchronous computation.)

We may be able to draw a line somewhere and ensure that most of the (non-async) platform libraries can work consistently in a restricted isolate, whether fresh or reused. I doubt we can make them work without access to statics at all.

Then there are user libraries, which won't ever consider that they'll be used in a restricted setting. They'd be prone to breaking arbitrarily. Maybe the only real guidance would then be "only run code you wrote yourself", and core platform libraries, except dart:async.

TL;DR: I'm worried the boundary between what works in a restricted context, and what doesn't, will be semi-arbitrary and unstable over time. It'll require real work to ensure that just the core platform libraries can work consistently, if at all.

dart-lang / sdk

[vm/ffi] `NativeCallable.onTemporaryIsolate` #54530