Open conrad-watt opened 4 months ago
The big design question here is whether the difference between shared-suspendable and shared-fixed is reflected only in the validation contexts for the functions or also in the function types.
If the two are not differentiated at the type level, then there would be no way to statically disallow indirect calls from shared-fixed to shared-suspendable functions. That would be fine as long as shared-fixed functions had the dynamic semantics of implicitly setting a shared-barrier for the duration of their execution. We would be using a runtime check instead of the type system to ensure that shared-fixed function frames are never captured in shared continuations.
Alternatively, if we do differentiate at the type level, we can statically disallow shared-fixed functions from calling shared-suspendable functions outside an explicit shared-barrier. This would be kind of annoying because the different kinds of shared functions would not be interchangeable, but at least a shared-suspendable function could be adapted to be a shared-fixed function by a wrapper that explicitly set up a shared-barrier and called the underlying function.
Personally, I think the former option, where we do not distinguish at the type level, is more attractive. Having to have wrapper functions and explicit shared-barriers to achieve the same runtime semantics and function interoperability is a bunch of complexity and code size for no benefit AFAICT.
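To make the "implicit barrier" option concrete, here is a minimal Python sketch of an eager implementation, where entering a shared-fixed function bumps a per-stack counter and a shared suspension checks it. All names (`Stack`, `shared_fixed`, `shared_suspend`, `Trap`) are hypothetical and only model the semantics discussed above; the sketch also assumes the stack is already running under a shared handler.

```python
class Trap(Exception):
    """Models a Wasm trap."""


class Stack:
    """Hypothetical per-stack state: counts live shared-fixed frames."""

    def __init__(self):
        self.barrier_depth = 0


def shared_fixed(fn):
    """Model a shared-fixed function: its whole body is an implicit barrier."""

    def wrapper(stack, *args):
        stack.barrier_depth += 1  # eager cost, paid on every call
        try:
            return fn(stack, *args)
        finally:
            stack.barrier_depth -= 1

    return wrapper


def shared_suspend(stack):
    """Runtime check: refuse to capture a stack holding a shared-fixed frame."""
    if stack.barrier_depth > 0:
        raise Trap("shared suspension would capture a shared-fixed frame")
    return "suspended"
```

Under the type-level option, the same `shared_fixed` wrapper would instead be written out explicitly by the producer around each shared-suspendable function it wants to use as shared-fixed.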
> If the two are not differentiated at the type level, then there would be no way to statically disallow indirect calls from shared-fixed to shared-suspendable functions. That would be fine as long as shared-fixed functions had the dynamic semantics of implicitly setting a shared-barrier for the duration of their execution. We would be using a runtime check instead of the type system to ensure that shared-fixed function frames are never captured in shared continuations.

For this reason, my intuition is that we will need differentiation at the type level. Based on @rossberg's sketch here I believe that such a barrier would carry an eager runtime cost (at least setting a bit in the stack). This would mean that even code just compiling to shared-fixed would be paying a price "just in case" some shared-suspendable code gets into an indirect call. An even worse scenario would be if the barrier isn't initially implemented when shared-fixed is introduced, so that the later introduction of shared-suspendable at the language level would require existing shared-fixed code to regress in performance.
> Having to have wrapper functions and explicit shared-barriers to achieve the same runtime semantics and function interoperability is a bunch of complexity and code size for no benefit AFAICT.

I think the benefit would be that code just using shared-fixed wouldn't need to incur a runtime overhead to defend against the possibility of a shared-suspendable indirect call. If one wants to go from shared-fixed -> shared-suspendable, one needs to explicitly add the barrier/handler instructions that capture the overhead of setting the necessary bits in the stack/setting up the suspend handler.
Actually, now I'm wondering if the same argument about overheads applies to nonshared -> shared-suspendable calls. This may mean that a type-level distinction between shared-fixed and shared-suspendable is actually valuable even if we do find a way to have thread-local functions, so as to allow code just doing nonshared -> shared-fixed calls to avoid unnecessary overheads.
EDIT: IMO the variant I discuss in the OP with a resume_shared-barrier instruction would be a cleaner solution than a block-level shared-barrier, which would require different validation rules in its body.
Actually, I realise there's an alternative design that may make more sense. Instead of allowing only shared-suspendable -> shared-fixed calls, allow only shared-fixed -> shared-suspendable calls. Now when a shared-suspend happens in a shared-suspendable function, it can just search for the first handler. If the handler is for a shared continuation, it's known that all the frames in between are shared-suspendable (because shared-resume can only happen on a shared-suspendable function). If the handler is for a nonshared continuation, trap. Calling from shared-suspendable to shared-fixed is allowed only through a nonshared-resume handler (which would cause shared-suspend in subsequent shared-suspendable frames to trap).
I think this might fit the existing model of stack switching better, where functions that may suspend can still be called even without a handler, but attempting to actually suspend just traps. It would also allow nonshared -> shared-suspendable calls just fine. In my OP design, a shared-suspendable function can only be entered from other Wasm if at least one handler is created, which doesn't seem consistent with the unshared case.
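The handler search described in this alternative can be sketched as a walk over a modelled stack. This is only an illustration: `Frame`, `Handler`, `shared_suspend`, and `Trap` are hypothetical names, and the sketch assumes validation has already guaranteed that every frame above a shared handler is shared-suspendable, so it does not check frame kinds.

```python
from dataclasses import dataclass


class Trap(Exception):
    """Models a Wasm trap."""


@dataclass
class Frame:
    name: str


@dataclass
class Handler:
    shared: bool  # True: set up by a shared resume; False: by a nonshared resume


def shared_suspend(stack):
    """Search outwards from the innermost entry for the first handler.

    Returns the names of the captured frames, innermost first.
    """
    captured = []
    for entry in reversed(stack):
        if isinstance(entry, Handler):
            if entry.shared:
                return captured
            raise Trap("first handler belongs to a nonshared continuation")
        captured.append(entry.name)
    raise Trap("no handler on the stack")
```

For example, a stack of `[Frame("main"), Handler(shared=True), Frame("a"), Frame("b")]` captures `b` and `a`, whereas placing a `Handler(shared=False)` above the shared one (modelling a call into shared-fixed code) makes the same suspension trap.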
> because shared-resume can only happen on a shared-suspendable function

I don't think there's any reason to disallow shared continuations from being resumed from shared-fixed or even unshared functions, right? It's just like calling a shared-suspendable function.
Sorry, I meant that shared-resume can only act on a shared continuation, which must have been created from a shared-suspendable function. I agree that the execution of shared-resume could occur within the body of a shared-fixed or nonshared function.
The fact that disallowing suspendable->fixed calls seems reasonable and that separately disallowing fixed->suspendable calls seems reasonable reinforces my belief that doing neither would be better :)
> Now when a shared-suspend happens in a shared-suspendable function, it can just search for the first handler

If we use a "zero-cost" shared-barrier implementation where it acts like a handler rather than proactively setting a bit, then this search can find shared-barrier just as well.
> The fact that disallowing suspendable->fixed calls seems reasonable and that separately disallowing fixed->suspendable calls seems reasonable reinforces my belief that doing neither would be better :)

It seems one or the other is needed, because the bad case is a call stack of the form
shared-suspendable (with handler) -> shared-fixed -> shared-suspendable (with suspend instruction)
We need to make sure one way or the other that the middle shared-fixed frame can't be captured in a shared continuation. It seems like the natural way to do this is to require that at least one of the transitions can only be done through a handler, instead of a regular call, so that attempts to do a shared-suspend can be caught. Currently I think restricting the shared-suspendable -> shared-fixed direction makes more sense.
> If we use a "zero-cost" shared-barrier implementation where it acts like a handler rather than proactively setting a bit, then this search can find shared-barrier just as well.

If we expect shared-barrier to be implemented by implicitly turning all the calls in its body that cross the suspendable-fixed boundary into handlers, I think it would be better to require explicit handler instructions instead.
I had a chance to think about this some more. If we start out by assuming that every shared-barrier must be made explicit and that shared-suspendable and shared-fixed are separate types, then this is what we get:
- non-shared to shared-suspendable calls must be within a shared-barrier to avoid the non-shared frame from being captured.
- [...] shared-barrier to avoid the shared-fixed frame from being captured. This requirement is mandatory; it is not enough to require shared-suspendable to shared-fixed calls to be in a shared-barrier instead because that does not ensure safety when a shared-fixed function calls a shared-suspendable function that initiates a shared suspension.
- [...] shared-barrier because shared-fixed functions cannot successfully initiate shared suspensions (why not? If due to validation, that would inhibit inlining shared-suspendable into shared-fixed, so preferably they would trap. But due to what shared-barrier would they trap?) and if they call back into a shared-suspendable function, the previous rule must apply so there is no need for a further shared-barrier.
- [...] shared-barrier and calls the shared-suspendable target.
- [...] func.bind or some similar mechanism is required to set up a shared-barrier and call the underlying bound function indirectly.
- Since shared-suspendable and shared-fixed functions cannot be mixed at indirect call sites (without func.bind or similar), each producer will have to exclusively use one or the other, meaning it would be impossible for a producer to support work-stealing and non-shared function parameters simultaneously. This seems bad, but if all non-shared JS objects are wrapped as shared thread-bound data, maybe it can be ok.
- shared-barrier can be implemented eagerly or lazily. Since it is always explicit, it can't affect performance implicitly.

On the other hand, if we make all the required shared-barriers implicit and do not distinguish between shared-fixed and shared-suspendable in the type system:
- Instead, we can trivially ensure all non-shared/shared-fixed to shared-suspendable calls are inside shared-barriers by making the entire bodies of non-shared and shared-fixed functions implicitly be shared-barriers.
- This would require a lazy implementation of shared-barrier similar to exception handling to avoid paying a performance cost on every call to a non-shared or shared-fixed function. This seems ok.
- [...] shared-barrier for transient non-shared access in shared-suspendable functions and to support inlining shared-fixed functions into shared-suspendable functions. Inlining in the other direction would work because we would allow shared suspensions to be initiated in non-shared and shared-fixed functions; they would just unconditionally trap because they would necessarily be inside the implicit shared-barrier bodies of those functions.

Sorry for the wall of text. We should probably move on to a live discussion soon.
Just a few points to add:
explicit case
Note that instead of having a block-level shared barrier instruction, it's possible to instead have a call-level barrier instruction (essentially the resume_barrier instruction I sketched in the OP). I think all of the observations above translate directly to this alternative. My intuition says the call variant would be less controversial.
Also, as I sketched here, one can instead restrict the shared-suspendable -> shared-fixed direction, which may be more natural. Especially if the concern is "the direction that's restricted becomes hard to inline", it's more ok for shared-suspendable -> shared-fixed to be slow as this direction is likely less performance-critical: because of the restrictions on shared-suspendable, one can't actually call any shared-fixed functions that really have nonshared parameters.
One other issue with restricting the shared-fixed -> shared-suspendable direction: it may make calls from JS directly into shared-suspendable Wasm slow (morally, JS is also "fixed" so needs similar guards). At least with the shared-suspendable -> shared-fixed direction, things only get slower if your code transitions from "fixed"->"suspendable"->"fixed", which we might consider less likely.
> Since shared-suspendable and shared-fixed functions cannot be mixed at indirect call sites (without func.bind or similar), each producer will have to exclusively use one or the other, meaning it would be impossible for a producer to support work-stealing and non-shared function parameters simultaneously. This seems bad, but if all non-shared JS objects are wrapped as shared thread-bound data, maybe it can be ok.

It's hard for me to see how a producer could actually support work-stealing and non-shared function parameters simultaneously even in the most optimistic case. I'd bet that "shared-suspendableness" would infect almost every non-trivial function, unless there's a strict static partition at the source/language runtime level, in which case static annotations in Wasm are still ok. I'd even bet that this problem would happen in the implicit case (i.e. most calls would just start trapping if any clever partition were attempted).
EDIT: and I should emphasise again that this is why I still think we should push for thread-local functions. If we believe work-stealing is going to be real in the future, we're just kicking the can down the road until then, and complicating the language in the meantime.
implicit case
> Instead, we can trivially ensure all non-shared/shared-fixed to shared-suspendable calls are inside shared-barriers by making the entire bodies of non-shared and shared-fixed function implicitly be shared-barriers.

I'd like to understand more explicitly how you'd plan to distinguish nonshared from shared-fixed from shared-suspendable without type system extensions. I can imagine a semantics where nonshared, shared-fixed, and shared-suspendable are bits that live on the dynamic function instance, purely to enable a dynamic trapping semantics for shared-barrier (implicit or explicit). Instinctively this seems a little unfortunate to me, since the bit is very close to a type system extension, just by swapping the dynamic trapping semantics on shared-barrier for a static check.
I also don't have a clear view of how the dynamic check semantics avoids regressing every existing "fixed" function call. Morally it seems like inserting an extra try-catch into every function at the language level, which I wouldn't expect to be costless.
> This would require a lazy implementation of shared-barrier similar to exception handling to avoid paying a performance cost on every call to a non-shared or shared-fixed function. This seems ok.

Can you expand on how this works currently for exception handling? This may be the piece I'm missing. I'd expect at least a penalty in compilation time and/or cache effects/branch prediction.
> Actually, I realise there's an alternative design that may make more sense. Instead of allowing only shared-suspendable -> shared-fixed calls, allow only shared-fixed -> shared-suspendable calls. Now when a shared-suspend happens in a shared-suspendable function, it can just search for the first handler. If the handler is for a shared continuation, it's known that all the frames in between are shared-suspendable (because shared-resume can only happen on a shared-suspendable function). If the handler is for a nonshared continuation, trap. Calling from shared-suspendable to shared-fixed is allowed only through a nonshared-resume handler (which would cause shared-suspend in subsequent shared-suspendable frames to trap).

Yes, this is what I had originally envisioned. I had imagined that producers who wanted to use shared-continuations would choose the 'shared-suspendable' type for all of the functions they generate for source language functions, as all of their source language types are likely shared and so the strictest semantics are not an issue. For calling out to JS for local host functions, they would need to perform the barrier at those points.
@tlively
> non-shared to shared-suspendable calls must be within a shared-barrier to avoid the non-shared frame from being captured.

That would require having a sequence of A: [shared-suspendable] -> [non-shared] -> B: [shared-suspendable] with a shared-continuation handler in A and a shared suspend in B. But because shared (of any kind) cannot call non-shared, this cannot happen.
> Since shared-suspendable and shared-fixed functions cannot be mixed at indirect call sites (without func.bind or similar), each producer will have to exclusively use one or the other, meaning it would be impossible for a producer to support work-stealing and non-shared function parameters simultaneously. This seems bad, but if all non-shared JS objects are wrapped as shared thread-bound data, maybe it can be ok.

Agreed, for producers using shared-continuations, non-shared function parameters can't be used. As I sketched in #42, I believe that we could support a scheme where non-shared context locals and the shared-barrier can be used to access non-shared state inside shared continuations.
I also wonder if we could mix these functions at indirect call sites by having shared-suspendable <: shared-fixed. Shared-suspendable has a proper subset of the runtime semantics of shared-fixed. When doing an indirect call to an unknown (either fixed/suspendable) function, a barrier might need to be done. But if the function type is known to be suspendable, the barrier could be avoided.
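As a rough illustration of the subtyping idea, the following Python sketch models the two function kinds as strings and shows when an indirect call site would need a barrier. The names `is_subtype` and `indirect_call_needs_barrier` are hypothetical, and the sketch assumes the caller is shared-suspendable and that suspendable -> fixed is the restricted direction.

```python
SUSPENDABLE = "shared-suspendable"
FIXED = "shared-fixed"


def is_subtype(sub, sup):
    """shared-suspendable <: shared-fixed, plus reflexivity."""
    return sub == sup or (sub == SUSPENDABLE and sup == FIXED)


def indirect_call_needs_barrier(static_callee_type):
    """Assumed caller: a shared-suspendable function.

    A callee typed shared-fixed may be either kind at runtime, so the call
    site has to set up a barrier; a callee typed shared-suspendable cannot
    be shared-fixed, so the barrier can be skipped.
    """
    return static_callee_type == FIXED
```

With this ordering a table typed as holding shared-fixed functions could store both kinds, at the price of a barrier on every call through it.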
@conrad-watt
> Can you expand on how this works currently for exception handling? This may be the piece I'm missing. I'd expect at least a penalty in compilation time and/or cache effects/branch prediction.

At least for SM, we implement catch lookup by walking the stack and performing metadata lookup based off of return addresses in stack frames to find which catch handler a call site was in when an exception happens. The advantage is that going into a try block is mostly free at runtime (catch blocks do add control flow edges to handle rejoining from exception paths which can inhibit some regalloc opts, but you can't avoid that). But it's pretty slow in the case that we do actually throw an exception.
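The general shape of such a lazy lookup can be sketched as a side table keyed by code address. This is not SpiderMonkey's actual data structure; the table layout, the addresses, and the names `TRY_RANGES`, `handler_for`, and `find_handler` are all made up for illustration. The point is that entering a protected region executes no code, and the cost is only paid when walking the stack.

```python
import bisect

# Hypothetical compiler-emitted metadata: sorted, non-overlapping, half-open
# code-address ranges covered by a try body, each with its handler label.
TRY_RANGES = [(0x100, 0x140, "catch_A"), (0x200, 0x260, "catch_B")]
STARTS = [start for start, _, _ in TRY_RANGES]


def handler_for(return_address):
    """Find the handler whose range contains this return address, if any."""
    i = bisect.bisect_right(STARTS, return_address) - 1
    if i >= 0:
        start, end, label = TRY_RANGES[i]
        if start <= return_address < end:
            return label
    return None


def find_handler(return_addresses):
    """Walk return addresses from the innermost frame outwards."""
    for address in return_addresses:
        label = handler_for(address)
        if label is not None:
            return label
    return None
```

A lazily implemented shared-barrier could presumably reuse the same scheme, with barrier regions recorded in the table alongside try bodies.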
> > non-shared to shared-suspendable calls must be within a shared-barrier to avoid the non-shared frame from being captured.
>
> That would require having a sequence of A: [shared-suspendable] -> [non-shared] -> B: [shared-suspendable] with a shared-continuation handler in A and a shared suspend in B. But because shared (of any kind) cannot call non-shared, this cannot happen.

The situation I have in mind is just [non-shared] -> [shared-suspendable], under the assumption that this kind of call is allowed by analogy to how non-shared functions are allowed to access other shared module items like tables and globals.
Hmm, I'm not sure I follow without seeing where the handler/suspend are in that situation. It also seems like this would be a problem even if we don't split up the function types (as it doesn't involve shared-fixed at all)?
The shared suspension is initiated in the shared-suspendable frame. I wasn't thinking that there would necessarily be a handler, but that we would still want to trap as soon as we find ourselves in a non-shared frame during the suspension. If you argue that that's unnecessary because there cannot possibly be a handler and we'll trap anyway, then the example as I was thinking of it doesn't work. I was assuming an invariant that the semantics should never have a stack walk for a shared suspension traverse a non-shared frame because that makes safety provable with more local reasoning.
I also wouldn't rule out [shared-suspendable] -> [non-shared] -> [shared-suspendable] via thread-bound or thread-local function machinery, although then you're back to the case where putting the barrier on either edge would work unless you're assuming the invariant I had in mind.
> I also wouldn't rule out [shared-suspendable] -> [non-shared] -> [shared-suspendable] via thread-bound or thread-local function machinery, although then you're back to the case where putting the barrier on either edge would work unless you're assuming the invariant I had in mind.

That's interesting, I guess with thread-local functions in the proposal as-is we already could have a call stack shared-suspendable -> non-shared -> shared-suspendable and would need the thread-local function to act as the shared-barrier. So engines will need some feature like this under-the-hood either way? Host JS functions are similar, they just block all suspending.
During our discussion on https://github.com/WebAssembly/shared-everything-threads/issues/42, we discussed that a "safety valve" decision for JS function access, if we can't reach consensus on (strong/weak) thread-local functions, would be to (re)introduce a version of shared function that cannot have its execution suspended as part of a (hypothetical) shared continuation.
Currently our design doesn't permit nonshared parameters to shared functions in order to be forward-compatible with shared continuations, which would allow such a nonshared object to be smuggled into another thread by suspending execution and resuming in another thread. By forbidding such suspensions, we could allow nonshared parameters, and thus pass in a JS context (e.g. a struct containing unshared references to JS functions) as a regular parameter that would be threaded through execution (or a context local as sketched here), giving a mechanism to call JS functions from shared Wasm functions.
This issue is to discuss the design implications of this approach. A few initial points:
- [...] shared-suspendable semantics with some mechanism for thread-local functions (weak if necessary) over the below. That being said, I don't think the below is awful, and it seems less controversial from a GC engineering perspective.
- Note that, if we still believe that shared continuations will eventually exist, the below approach doesn't permanently solve our current problems, but instead pushes them into the future. Any compilation scheme wanting to use shared continuations will need to mark most functions as shared-suspendable, and so for such a scheme we'd still need to solve the same problem of JS access that we have in the current design (e.g. by introducing thread-local functions).

Design Sketch
Terminology
For the purposes of this discussion, I'm going to refer to functions as being either nonshared, shared-suspendable, or shared-fixed (a different name for shared-nonsuspendable, which is a mouthful). The distinction between these functions would be enforced by a static annotation on the function type.
- nonshared functions are what we have today. Remember that in general, shared things can't capture nonshared things.
- shared-suspendable functions are the "fully shared" functions we've been discussing before this point:
  - [...] nonshared type (globals/tables/functions)
  - [...] nonshared parameters/locals
  - [...] nonshared references in the body (e.g. struct.new)
  - [...] shared continuations
- shared-fixed functions are somewhat more relaxed:
  - [...] nonshared type (globals/tables/functions)
  - nonshared parameters/locals are allowed
  - nonshared references in the body (e.g. struct.new) are allowed
  - [...] shared continuation, but can be part of a nonshared continuation

Intuitively, the call stack of a shared-suspendable function could be captured as part of a shared continuation, and resumed in another thread. This means it's not safe for the frame to ever capture a nonshared object, even transiently. In contrast, shared-fixed function calls are guaranteed to stay in the same thread for the entire duration of their execution. Therefore it's safe to pass in nonshared objects as parameters, and materialise them during the function call's execution. Prior to the standardisation of shared continuations, only shared-fixed functions would be definable.
Restrictions on calling
(EDIT: see this comment for an alternative approach with different restrictions)
shared-suspendable functions can always call shared-fixed functions with no restrictions. The extent to which shared-fixed functions can call shared-suspendable functions depends on some design decisions of stack switching. By default, all forms of shared-fixed -> shared-suspendable call would be disallowed by validation (implying annotations/type tracking of [non]suspendable on relevant call instructions).
If the stack switching proposal includes a lexical barrier instruction (e.g. see here), it seems feasible to also include a concept of a "shared-only" barrier which traps upon an attempt to capture a shared continuation, but not a non-shared one. All forms of call to shared-suspendable functions could be allowed inside the body of this barrier. On reflection, I don't think that a shared-fixed call should implicitly introduce such a barrier, since this would mean that every shared-fixed call would have the implicit overhead of "Check if I'm in a continuation and if so, set the barrier bit". Instead I think the shared-only barrier should always be explicit (and would switch validation from shared-fixed mode to shared-suspendable mode within its body). @rossberg please correct me if I'm wrong about the above.
A shared-fixed function could also call a shared-suspendable function by wrapping the latter as a shared continuation and using a hypothetical resume_barrier instruction (as sketched here https://github.com/WebAssembly/stack-switching/issues/44#issuecomment-1909545807). If a "shared-only" barrier proves infeasible, this would be the only way to make a shared-fixed -> shared-suspendable call.
Note that in either case, it's still ok for a shared-fixed function to hold a reference to a shared-suspendable function; only calling is complicated. This means that we don't need to distinguish between different kinds of shared for tables and globals - the [non]suspendable distinction is only needed for callable things.
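The default calling restrictions in this sketch can be summarised as a small validation predicate. The Python below is only a model: `plain_call_validates` and its `in_shared_only_barrier` flag are hypothetical names, it covers calls between the two shared kinds only, and it assumes that calls between functions of the same kind are always fine.

```python
SUSPENDABLE = "shared-suspendable"
FIXED = "shared-fixed"


def plain_call_validates(caller, callee, in_shared_only_barrier=False):
    """Would a plain call validate under the default rules sketched above?"""
    kinds = (SUSPENDABLE, FIXED)
    if caller not in kinds or callee not in kinds:
        raise ValueError("this sketch only models shared function kinds")
    if caller == FIXED and callee == SUSPENDABLE:
        # Disallowed by default; allowed inside an explicit shared-only
        # barrier, whose body validates in shared-suspendable mode.
        return in_shared_only_barrier
    # suspendable -> fixed is unrestricted; same-kind calls are assumed fine.
    return True
```

Holding a reference is not modelled because it is unrestricted in either direction; only the call itself is checked.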