Closed tlively closed 3 months ago
The typing for `suspend` is very similar.
C ⊢ suspend x : [t1* (ref null x)] -> [t2* rt]
-- C.TYPES[x] = prompt t1* (ref null? y)
-- C.TYPES[y] = cont t2* rt
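To make the rule above concrete, here is a minimal Python sketch of how a validator might check `suspend x`. The class names (`Ref`, `Prompt`, `Cont`) and the function `type_suspend` are hypothetical and only mirror the shape of the rule; they are not taken from any proposal text or engine.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Models `(ref null? idx)`: a reference to the type at index `idx`."""
    idx: int
    nullable: bool


@dataclass(frozen=True)
class Prompt:
    """Models `prompt t1* (ref null? y)`."""
    params: Tuple[str, ...]
    cont: Ref


@dataclass(frozen=True)
class Cont:
    """Models `cont t2* rt`, with `rt` assumed to be a reference type."""
    params: Tuple[str, ...]
    rt: Ref


TypeDef = Union[Prompt, Cont]


def type_suspend(types: Dict[int, TypeDef], x: int):
    """Return (inputs, outputs) for `suspend x`, or raise TypeError."""
    prompt = types.get(x)
    if not isinstance(prompt, Prompt):
        raise TypeError(f"type {x} is not a prompt type")
    cont = types.get(prompt.cont.idx)
    if not isinstance(cont, Cont):
        raise TypeError(f"type {prompt.cont.idx} is not a cont type")
    # [t1* (ref null x)] -> [t2* rt]
    inputs = (*prompt.params, Ref(x, nullable=True))
    outputs = (*cont.params, cont.rt)
    return inputs, outputs


# Example type section: type 0 is a prompt, type 1 is the matching cont.
example_types = {
    0: Prompt(("i32",), Ref(1, nullable=True)),
    1: Cont(("i64",), Ref(0, nullable=False)),
}
example_inputs, example_outputs = type_suspend(example_types, 0)
```

In this example, `suspend 0` consumes an `i32` plus a nullable reference to the prompt, and produces an `i64` plus whatever reference the continuation type expects on resumption.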
How do you ensure that the `prompt` given as input to `suspend` actually comes from a parent stack? Since it's a first-class value, it seems like it could be smuggled (e.g. through globals) out of another call stack. I remember we discussed a while ago (2022 in-person meeting?) that this verification process often ends up being a linear search.
I've also been thinking about how to leverage the commitment to a "reserved register/stack slot" for the counter/encoded JS handlers from bag-of-stacks. If the typed continuations proposal were restricted to a single kind of signal and just two handlers - "suspend" or "trap", then I think it would be possible to use the reserved register to propagate a pointer to the most recent handler. One never needs to worry about daisy-chaining of parent handlers since a signal is guaranteed never to automatically propagate out of the "suspend" or "trap" handler. Maybe this proposal is expressive enough. However if a user did need a signal to propagate further up the stack, they'd have to mock this in userspace, which might be inefficient (I think there are analogous frictions with implementing a source language's delimited continuations using bag-of-stacks).
If we wanted to think about facilitating the existing typed continuations proposal with rich handlers using a similar strategy, then as we discussed here the remaining issue is dealing with daisy-chaining of handlers. I think this could potentially be accomplished by treating the necessary references to parent handlers almost like additional local variables in the relevant call stack.
So if a function has a maximum depth of N handlers in its body, then on initial entry into the function, the call stack needs to reserve N extra slots for handler pointers. You always have a pointer to the most recent handler (potentially from the parent function) in your "reserved register", and when you enter the Nth handler within the function, you put the pointer to this previous handler in the Nth reserved stack slot, and the pointer to the new current handler in the "reserved register". To allow N to be known ahead-of-time in the streaming compilation tier, function headers might need to be extended with a predeclaration of "max handler depth", which could potentially live in the local variable encoding space (although you could get away without this predeclaration, with enough tricks - I imagine the existing exception handling instructions have similar considerations).
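The slot scheme described above can be sketched as a small Python model. Everything here (`Frame`, `Machine`, `enter_handler`, `exit_handler`, `handler_chain`) is a hypothetical illustration of the bookkeeping, assuming the maximum handler depth N is known when the frame is created; it is not how any engine implements it.

```python
class Frame:
    """Hypothetical call frame with slots reserved for saved handler pointers."""

    def __init__(self, max_handler_depth):
        # N is assumed to be declared up front (the "max handler depth").
        self.saved = [None] * max_handler_depth
        self.depth = 0  # number of handlers currently entered in this frame


class Machine:
    """Hypothetical engine state: one reserved register for the current handler."""

    def __init__(self):
        self.reserved_register = None

    def enter_handler(self, frame, handler):
        if frame.depth == len(frame.saved):
            raise RuntimeError("more nested handlers than reserved slots")
        # Save the previous handler in the next reserved slot of this frame,
        # then make the new handler the current one.
        frame.saved[frame.depth] = self.reserved_register
        frame.depth += 1
        self.reserved_register = handler

    def exit_handler(self, frame):
        # Restore the handler that was current before the matching enter.
        frame.depth -= 1
        self.reserved_register = frame.saved[frame.depth]
        frame.saved[frame.depth] = None

    def handler_chain(self, frames):
        """List handlers from most recent to oldest; `frames` is innermost first."""
        chain = [self.reserved_register]
        for frame in frames:
            chain.extend(reversed(frame.saved[:frame.depth]))
        return [h for h in chain if h is not None]
```

With an outer frame that enters one handler and an inner frame that enters two, `handler_chain([inner, outer])` walks the register and then the saved slots, which is the daisy chain of parent handlers without any separate handler stack.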
I think what you are describing is what's usually called named handlers. There is a sketch of how that could look with typed continuations in the explainer.
(What we previously meant with continuations without search was a bit different: suspension would still have an implicit handler, but it's always the innermost one, i.e., the one for the current stack. If that cannot handle the event then it doesn't continue trying the next one but simply traps. That's roughly how e.g. Wasmtime fibers work, though of course they aren't safe.)
But as @conrad-watt hints at, as soon as you have prompts (or any form of stack chaining, for that matter, like with the separated dynamic scoping feature you presented), you'll need a linear check to guarantee safety, either when switching up (suspending) or when switching down (resuming). We can shift the cost around in an implementation but we can't avoid it. The linear check is simply the price we have to pay for safety in Wasm.
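As a rough illustration of what such a linear check could look like, the following Python sketch walks parent links from the current stack to see whether a suspension target is really an ancestor. The `Stack` class and `is_ancestor` function are hypothetical names for this sketch only, and the step count is returned purely to show that the cost grows with the depth of the chain.

```python
class Stack:
    """Hypothetical stack record with a link to the stack that resumed it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def is_ancestor(target, current):
    """Walk parent links from `current`; return (found, steps taken)."""
    steps = 0
    stack = current.parent
    while stack is not None:
        steps += 1
        if stack is target:
            return True, steps
        stack = stack.parent
    return False, steps


root = Stack("root")
a = Stack("a", parent=root)
b = Stack("b", parent=a)
unrelated = Stack("unrelated")  # e.g. a reference smuggled through a global
```

Here `is_ancestor(root, b)` succeeds after two steps, while `is_ancestor(unrelated, b)` walks the whole chain before failing, which is the case that has to be rejected for safety.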
Plus, if you also have undirected (symmetric) switches, like with bag-o-stacks, then those have to perform the linear check every time, because they lack the structure to establish additional invariants. That is, you'll end up having to check twice as often as when using asymmetric primitives.
(And of course, once you are there, the only presumed advantage of bag-of-stacks is out the window. Furthermore, if a linear check has to be paid for anyway, then extending it to search is almost free. In particular, since not building that in has other complexity costs: to be able to implement it in user space, we'd need to support (a) a form of catch-all handler and (b) a "resuspend" instruction. Overall, that ends up being both more complicated and less efficient.)
> How do you ensure that the `prompt` given as input to `suspend` actually comes from a parent stack?
I don’t have anything better than linear search. It’s interesting (and unfortunate) that we can’t seem to achieve the nice composability properties of asymmetrical coroutines without it.
> If the typed continuations proposal were restricted to a single kind of signal and just two handlers…
This seems extremely limiting, though. What type of value would be passed to the single non-trap handler? I’m not too sure of the benefit either; IIUC, this would require a user space linear search comprising multiple suspends/switches in all the same places other designs would have the engine do a linear search ending in just one switch.
> > How do you ensure that the `prompt` given as input to `suspend` actually comes from a parent stack?
>
> I don’t have anything better than linear search. It’s interesting (and unfortunate) that we can’t seem to achieve the nice composability properties of asymmetrical coroutines without it.
I think we need to be a bit careful in order to disentangle different concerns here.
We can have asymmetric coroutines without the linear search. Indeed, plain asymmetric coroutines as in Lua or Wasmtime Fiber don't build in a search. What they offer instead is much like what @conrad-watt and @rossberg are alluding to. Plain asymmetric coroutines ensure that the innermost handler must handle everything. So if you want to simulate a linear search for a handler then you have to implement explicit forwarding (by analogy, imagine a version of exception handlers where exceptions are always handled by the innermost handler and how you would implement general exception handlers on top of that). This is exactly how the current implementation of WasmFX on top of Wasmtime Fiber works (though this is certainly not the most efficient implementation strategy).
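Python generators are plain asymmetric coroutines in this sense (a `yield` always goes to whoever resumed the generator), so explicit forwarding can be sketched with them. The names `worker`, `handle_ask`, and `handle_log`, and the `(tag, payload)` event encoding, are assumptions made up for this sketch; it does not reflect how WasmFX on Wasmtime Fiber is written.

```python
def worker():
    # Suspends with two kinds of events; the innermost handler sees both.
    x = yield ("ask", "n")
    yield ("log", f"got {x}")
    return x * 2


def handle_ask(gen, env):
    """Handles 'ask' events; forwards everything else outward by re-yielding."""
    reply = None
    while True:
        try:
            tag, payload = gen.send(reply)
        except StopIteration as done:
            return done.value
        if tag == "ask":
            reply = env[payload]
        else:
            # Explicit forwarding: suspend again to our own resumer and
            # pass its answer back down to the inner coroutine.
            reply = yield (tag, payload)


def handle_log(gen, log):
    """Outermost handler: handles 'log' events and rejects anything else."""
    reply = None
    while True:
        try:
            tag, payload = gen.send(reply)
        except StopIteration as done:
            return done.value
        if tag == "log":
            log.append(payload)
            reply = None
        else:
            raise RuntimeError(f"unhandled event {tag!r}")  # analogous to a trap


log = []
result = handle_log(handle_ask(worker(), {"n": 21}), log)
```

Each forwarded event costs one extra suspend and resume per intermediate handler, which is the user-space counterpart of a built-in handler search.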
One source of confusion in all of these discussions is that there are different notions of composability at play here. Let me briefly describe the two most relevant ones.
What asymmetric coroutines provide that symmetric coroutines don't is composability with built-in side-effects - the existing built-in side-effects essentially "just work" without the need to generate lots of extra plumbing everywhere. Examples where this is relevant to Wasm include exceptions, calling back-and-forth between JS and Wasm (https://github.com/WebAssembly/stack-switching/issues/49), and generating meaningful stack traces in the presence of stack-switching (@frank-emrich will say a little about this last point at the in-person meeting next week).
Effect handlers, as exemplified by WasmFX (aka typed continuations), are a refinement of asymmetric coroutines that introduces a richer notion of composability. As well as supporting composition with built-in effects (by virtue of being a form of asymmetric coroutines), they support composition of multiple user-defined features, for instance, lightweight threads and generators. This form of composition is where nested handlers (and a potentially linear cost for searching for a handler) come in. In practice, in existing systems like OCaml 5, which uses effect handlers for high-performance stack-switching, this kind of composition is used sparingly and handler stacks are typically shallow.
> I think what you are describing is what's usually called named handlers. There is a sketch of how that could look with typed continuations in the explainer.
I think this is essentially right, except the version @tlively described is restricted so that each handler can handle only one kind of event.
Closing this since this idea does not avoid linear runtime and we have settled on a design to move forward with.
I'm definitely not proposing any changes to either proposed design, but I wanted to share this idea for discussion. I haven't talked with anyone about this yet.
Our discussion in #49 about how asymmetric coroutines are fundamentally more expressive than symmetric coroutines with respect to their interaction with JS got me thinking about how we could combine those benefits with the search-free switching ideas from the bag-o-stacks proposal. I know @slindley and others have emphasized in the past that it would be possible to change the design of typed continuations to avoid the need for handler search, but I haven't seen concrete details of what that could look like until now.
The main idea is that instead of a single `stack` type, we differentiate types for switching "up" the stack of stacks (i.e. suspending) and types for switching "down" the stack of stacks (i.e. resuming). Let's call the type for suspending `prompt` because we're switching up the stack to the prompt (i.e. handler, etc.) and the type for resuming `cont` because it is for resuming a delimited continuation.
Just like the `stack` type in bag-o-stacks, `prompt` and `cont` have a sequence of parameter types they expect to receive, where the last parameter must be a reference to where control just came from after a suspend or resume. For `prompt`, control must have just come from a `cont` and vice versa. The `prompt` reference parameter in a `cont` never needs to be nullable because the prompt will always exist immediately after a resume. In contrast, a `cont` may no longer exist if a continuation returns or retires to a prompt, so the `cont` reference in a `prompt` type may be nullable.
`cont` references can be created by `cont.new`, but there is no `prompt.new`. `prompt` references are only created by `resume` instructions.
The interesting thing here is that the results of the continuation function are identical to the values received by prompts the continuation will be able to suspend to. With this simple type system, prompts can only handle one kind of "event," i.e. one sequence of parameter types, so `resume` instructions do not need to be branch tables that can handle arbitrary different result types. By ensuring that the function results are the same as the prompt parameters, we avoid the need for a branch to differentiate suspends from normal returns, so `resume` does not need to branch at all. Of course producers can still differentiate suspends from normal returns in user space if they need to.
The typing for `suspend` is very similar.
As an optimization, we can also allow continuations to retire themselves when they suspend, allowing their resources to be cleaned up eagerly. This is essentially non-local return up to a prompt.
Note that the prompt type used with `suspend_retire` must have a nullable continuation reference parameter because it will receive a null value.
The last interesting instruction is `suspend_switch`, which is kind of like a tail-call for continuations. (This is called `switch_to` in the typed-continuations proposal). `suspend_switch` takes both a `prompt` reference and a `cont` reference. Control passes directly to the given continuation, which receives the given `prompt` reference as if it had been resumed from that prompt rather than from a sibling continuation. Since control does not actually pass to the prompt, the target continuation also needs to receive the reference to the previous continuation, otherwise there would be no live references to the previous continuation left after the switch.
Beyond these instructions, `cont.new_ref`, `cont.bind`, `prompt.bind`, and `resume_throw` would all be straightforward to include as well. `suspend_switch_retire` would also be possible, although it seems more niche.
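As a loose model of the "`resume` does not branch" point, the following Python sketch uses a generator to stand in for a continuation. The helper `resume` and the example `counter` are hypothetical; the only thing they are meant to show is that a suspend and a normal return deliver the same shape of values to the resumer, with the continuation reference becoming null (`None` here) once the continuation has finished.

```python
def resume(cont, value):
    """Resume `cont` with `value`; return (payload, cont_or_None).

    The result has the same shape for a suspend and for a normal return.
    The second component is None once the continuation has finished,
    mirroring the nullable cont reference a prompt receives.
    """
    try:
        payload = cont.send(value)
        return payload, cont
    except StopIteration as done:
        return done.value, None


def counter(limit):
    total = 0
    for _ in range(limit):
        step = yield total  # suspend: hand `total` up to the resumer
        total += step
    return total            # normal return: same payload type as a suspend


k = counter(2)                 # stands in for cont.new
first, k1 = resume(k, None)    # a fresh generator must be started with None
second, k2 = resume(k1, 5)
third, k3 = resume(k2, 7)
```

A producer that needs to tell the two cases apart can do so in user space by testing the returned reference against null, as the `k3 is None` check would here.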