Open dcharkes opened 8 months ago
This has become more relevant recently, as a way of eliminating ffigen's generated ObjC code. Small ObjC trampolines are generated to fix https://github.com/dart-lang/native/issues/835, but if we had temp isolate callbacks we could write these trampolines in Dart.
Are we happy with the NativeCallable.onTemporaryIsolate
name? What about shortening it to NativeCallable.tempIsolate
?
I'm happy with the longer name, it's more clear. cc @lrhn for naming!
Cross linking the other open issue for NativeCallable.blocking
:
"Avoid abbreviations" is a style rule, so "temp" isn't stylistic.
I'd drop "temporary" completely.
Maybe newIsolate
, freshIsolate
or spawnIsolate
.
Is the isolates spawned first, out spawned synchronously when trying to fix the callback? Can you do more than one callback during the same native call? If so, does it use the same isolate, or does it spawn a new isolate per callback?
How will you prevent the call from starting async computations? It's presumably an isolate in the same isolate group, so you can't control access to dart:async
. Not that that would work since Future
is exported by dart:core
, and something like Finalizer
also triggers async callbacks.
Does the isolate do Isolate.exit
when the callback returns? (Then each callback will definitely need a separate isolate.)
I'd drop "temporary" completely. Maybe
newIsolate
,freshIsolate
orspawnIsolate
.
I wonder if those imply that it's an ordinary isolate with no limitations?
Is the isolates spawned first, out spawned synchronously when trying to fix the callback? Can you do more than one callback during the same native call? If so, does it use the same isolate, or does it spawn a new isolate per callback?
There's existing temporary isolate infrastructure in place for NativeCallable.listener
, and we were planning to reuse that. That means we'll be spawning a new isolate each time one of these callbacks is invoked. But we were also considering adding an optimisation to cache the temporary isolates (this optimisation would affect both NativeCallable.listener
and NativeCallable.onTemporaryIsolate
).
So I'd like to avoid making any promises about whether it's a fresh isolate or not, because we may want to change the implementation in future.
So I'd like to avoid making any promises about whether it's a fresh isolate or not, because we may want to change the implementation in future.
Even if it's not fresh, we might want to not leak any state. (e.g. reset all global fields?)
I can't find a pattern in our names yet:
NativeCallable.listener
: behaves like a listener - (One gotcha though is that it returns early before it has been run, leading to errors if arguments were on the stack. Which "listener" doesn't communicate.) (Maybe messageHandler would have signalled that more than listener, listeners are called synchronously usually)NativeCallable.isolateLocal
: must be on the same thread as the thread that is the mutator of the isolate. The native-callable is isolate local. The native callable should be used isolate local?So one name is behaves like or should be used as, and the other name is more about what it is. I guess sticking closer to what it is makes more sense. A native callable is run, so the kind of native callable is how it is run.
NativeCallable.temporaryIsolate
: the native callable is a temporary isolate or is run on temporary isolate? Communicates definitely no access to shared state. But does not communicate no async. Temporary isolate conceptually can have async.Maybe the name doesn't really matter and we just pick something and add good documentation. 🤷
NativeCallable.temporaryIsolate
sgtm
NativeCallable.listener
: could be async
or asEvent
.NativeCallable.isolateLocal
: sync
.That exhausts the sync/async behaviors, so NativeCallable.temporaryIsolate
is something else. Which it is, it does a synchronous callback to dart code by running it in parallel. It does that by using a temporary (possibly fresh) isolate.
About resetting globals, also needs to close all ports and clear the event queues, to make sure nothing survives. Then send onExit
events. Basically what Isolate.exit
does, just without merging the heap back in, so that heap can be reused. Then just resetting the globals may be cheaper than allocating a new isolate, but probably not by much, uf constants are already shared.
I would strongly prefer we align semantics and naming with extensions we are exploring in context of shared memory multithreading proposal, so that we avoid introducing duplication and incurring additional migration / cognitive costs later.
Specifically, this means going for a variation of NativeCallable.shared
which is described in the proposal.
Though I must admit that I don't like the name shared for callables because it is not descriptive enough - though I don't really see any good alternative. We need some word that describes callback which:
There is no single word which clearly describes these capabilities and limitations.
Ah, this means it should be an error to access global fields completely in NativeCallable.temporaryIsolate
. And not that all fields are reset before the next invocation.
@mraleph Could we theoretically support having unawaited futures in such code? E.g. would it be possible to have a message queue for the isolate group? (The native callable itself must be synchronous though due to the return value. So only unawaited futures.)
NativeCallable.notIsolateLocal
? 🤣 It's synchronous like NativeCallable.isolateLocal
, but is not local to an isolate. 😀
I like NativeCallable.shared
, but not while it doesn't yet have access to any shared state. 🙈 But if we really only want to upgrade the semantics later, having users migrate to a new constructor name later is a churn.
So ideally we'd have a name that allows us to implement this now using the existing temporary isolate infra, but later upgrade to the shared isolate proposal when shared memory lands.
How about NativeCallable.sharedIsolate
? It's close enough to temporaryIsolate
that it shouldn't be too confusing for users if we write good docs. It still implies that some sort of atypical isolate will be created/used, and that it'll be a different isolate to the one that creates/owns the NativeCallable
.
cc @aam @a-siva
Ah, this means it should be an error to access global fields completely in
NativeCallable.temporaryIsolate
.
That would be non-constant fields at least. Constants are probably fine.
Good luck with that! :wink:
That would prevent you from using Uri.parse
, RegExp
, List.toString
, Base64.decode
and Object.hash
. Might be a bit restritive and surprising. (And I definitely don't want it to become a breaking change to start using a static variable in some algorithm, so not only can we make no promises, we also won't make any future promises.)
Could we theoretically support having unawaited futures in such code?
You can't have futures at all. The first thing a Future
constructor or method does is to read the global variable Zone._current
. Same for almost any other async operation.
I have no real hope that we can find a way to introduce such a restricted Dart isolate, and still be able to run actual Dart code in it. There are too many places where code assumes that it can cache things in static variable, either storing them or just lazily creating a value the first time its needed.
Uri
can probably be fixed. Future
/Zone
definitely cannot. It's trivial for future changes to add code that would stop working in such a limited isolate, because it's perfectly valid Dart code that works everywhere else. Basically, whatever that isolate is, it's not Dart.
Specifically, this means going for a variation of NativeCallable.shared which is described in the proposal.
How about NativeCallable.groupShared
?
but later upgrade to the shared isolate proposal when shared memory lands.
You can experiment with using shared memory for callback temp isolates now - it's available on dev/main under --experimental-shared-data
flag. You can decorate fields with @pragma('vm:shared')
and use only those from your callback. There is no enforcement regarding using non-shared data at the moment though.
I think one boilerplate challenge that need to be solved is to how to get this shared data into regular isolates. Callbacks are called concurrently while regular isolates are doing their own work, so whatever data is flowing through the callbacks need to be stash or queued by callback, dequeued somehow by the regular isolate.
You can't have futures at all. The first thing a Future constructor or method does is to read the global variable Zone._current. Same for almost any other async operation.
The proposal talked about introducing shared-friendly versions of futures, streams, zones. But those won't mesh with normal futures if I understand correctly.
Uri
can probably be fixed.Future
/Zone
definitely cannot. It's trivial for future changes to add code that would stop working in such a limited isolate, because it's perfectly valid Dart code that works everywhere else. Basically, whatever that isolate is, it's not Dart.
These are great points. It sounds like if we want something like temporary isolate callbacks (before shared memory lands), we'll have to allow them to read and write static variables. There's no technical issue with that afaik, it would just be confusing for users if they expect those static variables to have the same value as they do on the original isolate, or to have persistent values between invocations. But that odd behavior would be preferable to not having access to all these core libraries, and adding static variables to an internal algorithm being a breaking change. Obviously we'd have to document it very clearly.
We'll still have this issue with shared memory, because the static variables would have to be declared as shared. So unless we go through and annotate every internal static variable in the core libraries as shared (which would have performance implications), we'll still run into all these issues @lrhn points out.
@lrhn
Good luck with that! 😉 That would prevent you from using
Uri.parse
,RegExp
,List.toString
,Base64.decode
andObject.hash
. Might be a bit restritive and surprising. (And I definitely don't want it to become a breaking change to start using a static variable in some algorithm, so not only can we make no promises, we also won't make any future promises.) ... You can't have futures at all. The first thing aFuture
constructor or method does is to read the global variableZone._current
. Same for almost any other async operation.I have no real hope that we can find a way to introduce such a restricted Dart isolate, and still be able to run actual Dart code in it.
Yep, and that's intentional. If you want full Dart semantics you need to have a proper Dart isolate (with an event loop and such). Doing async in a temporary isolate is a foot gun anyway - we need to define how it works and there are multiple different ways it could work (e.g. you can decide to block until all events are drained and ports are closed or you could decide that execution goes to the default thread pool). If we implicitly default to either of these we will have surprising behavior. So I don't want us to be implicit, I want APIs to be explicit and restrictive.
I have the same reservations about static state in general - I don't believe that behavior where you impliciltly get a fresh copy of the static state every time callback is called is any good.
Outside of async we can audit and tweak core libraries in a way that unlocks using things which do not require event loop. If needed by using shared memory capabilities not exposed to the Dart developer. It is not that much different from how we canonicalize constants across multiple isolates - this memory is shared but a regular Dart developer can't achieve the same thing.
But that odd behavior would be preferable to not having access to all these core libraries, and adding static variables to an internal algorithm being a breaking change.
I strongly disagree with this. I do not believe this odd behavior would be preferable.
I suggest to look at this from a different angle - we need to stop thinking about this as simply a Dart problem. It is an interop problem. When you write C code (which uses Dart VM C API) you can write callbacks which get executed on any thread but these callbacks can't touch the Dart world unless they explicitly enter a Dart isolate using Dart_EnterIsolate
or they can use native ports to send data to Dart asynchronously. What I am suggesting here is effectively transporting this capability in Dart: you can write callbacks which can execute on any thread but can't touch full fledged Dart world directly unless you explicitly enter some isolate. You can also choose to simply send data to Dart using ports.
It is a fundamental capability from which many other capabilities (e.g. NativeCallable.listener
) can be built. We need to be exposing simple explicit fundamental building blocks - not spend time supporting custom implicit special cases.
If you want full Dart semantics
"Full Dart semantics" is a very wide concept. I think we can reasonably separate that into different grades of support.
Future
, you can't call an async
function.you can write callbacks which can execute on any thread but can't touch full fledged Dart world directly unless you explicitly enter some isolate.
What worries me isn't the goal, but the details, where we draw the line between "non-isolate-compatible" code and "isolate-required" code. What can you do before entering an isolate? (Do you have a Dart library you can use to enter an isolate?)
We can obviously disable particular features (anything async, like running in a zone that throws on any attempt to schedule, maybe even on trying to access Zone.current
. No zones available — except Zone.root
which is a constant, whoops!
We can disable any isolated feature of the platform libraries if we really want to, preferably in a way that doesn't have a branch overhead for non-restricted code. As long as no other code depends on the disabled feature for its implementation. That's something we can massage the libraries to work with.
That's the "easy" part.
Not accessing static state is harder, basically impossible. That includes all non-constant top-level/static variables.
I don't believe a total ban on access to static variables is viable, and I have no real idea how much such a ban would affect. Something like Endian.host
is a lazy final static variable, meaning that ByteData
wouldn't be usable.
They're everywhere.
Let's assume we can somehow allow some of those, the ones that are really just caching a compute-on-demand result, lazy rather than mutable, so it won't matter if they're recomputed or not, and so it doesn't matter whether if they're reset on each call or not. It's indistinguishable, allowing us to reuse or reinitialize an isolate without changing any behavior.
Then there is internal static state (like the list used for cycle-detection in Iterable.toString
). We can mark that as "safe" too, it doesn't matter if it's lost between executions. It's a temporary value.
The Uri
parser tables should be constant anyway.
Encoding.getByName
could use a const map (this would be an argument for not allowing user-added encoding names).
Maybe we could have a standard practice of the core platform libraries not containing persistent (visibly) mutable state, so that it's never possible to see if you're running against a fresh version or an existing one.
(Except Zone.current
, obviously, which is the canonical "global state" that you can attach other state to.)
Seems very reasonable, since it avoids any interference between different libraries that are all modifying platform state. (Also no adding more URI scheme default ports then.)
Don't know if we can extend that to other platform libraries too. Interop-libraries have external state they access, they usually don't need their own. (The IOOverrides._global
is an exception, and one of the reasons I don't like it. It should just be zone based - but that does link everything IO to async, so if we disable async
and Zone.current
, you can't do File("path").readAsStringSync()
any more, which is a reasonable synchronous computation.)
We may be able to draw a line somewhere and ensure that most of the (non-async) platform libraries can work consistently in a restricted isolate, whether fresh or reused. I doubt we can make them work without access to statics at all.
Then there are user libraries, which won't ever consider that they'll be used in a restricted setting. They'd be prone to breaking arbitrarily.
Maybe the only real guidance would then be "only run code you wrote yourself", and core platform libraries, except dart:async
.
TL;DR: I'm worried the boundary between what works in a restricted context, and what doesn't, will be semi-arbitrary and unstable over time. It'll require real work to ensure that just the core platform libraries can work consistently, if at all.
We might never build this, but let's have a tracking bug to point to.
https://api.dart.dev/stable/3.2.4/dart-ffi/NativeCallable/NativeCallable.listener.html enables async callbacks that return immediately upon calling and schedule the callback to be run on the Dart event loop of the right isolate. No return values can be returned.
https://api.dart.dev/stable/3.2.4/dart-ffi/NativeCallable/NativeCallable.isolateLocal.html enables sync callbacks with return values. The current thread must be the one entered in the target isolate.
If we would like to be able to run Dart code synchronously and get a return value, but we don't have access to a Dart isolate in the current thread, we could conceivably spawn a new temporary isolate.
This would incur some serious limitations:
We're currently not exploring this. @liamappelbe
A way more general approach would be to have shared memory multithreading: