Asymmetric coroutines without search

I'm definitely not proposing any changes to either proposed design, but I wanted to share this idea for discussion. I haven't talked with anyone about this yet.

Our discussion in #49 about how asymmetric coroutines are fundamentally more expressive than symmetric coroutines with respect to their interaction with JS got me thinking about how we could combine those benefits with the search-free switching ideas from the bag-o-stacks proposal. I know @slindley and others have emphasized in the past that it would be possible to change the design of typed continuations to avoid the need for handler search, but I haven't seen concrete details of what that could look like until now.

The main idea is that instead of a single stack type, we differentiate types for switching "up" the stack of stacks (i.e. suspending) and types for switching "down" the stack of stacks (i.e. resuming). Let's call the type for suspending prompt because we're switching up the stack to the prompt (i.e. handler, etc.) and the type for resuming cont because it is for resuming a delimited continuation.

comptype = ... | prompt t* (ref null? x) | cont t* (ref x)

C ⊢ prompt t1* (ref null? x) : ok
-- C.TYPES[x] = cont t2* rt

C ⊢ cont t1* (ref x) : ok
-- C.TYPES[x] = prompt t2* rt

Just like the stack type in bag-o-stacks, prompt and cont have a sequence of parameter types they expect to receive, where the last parameter must be a reference to where control just came from after a suspend or resume. For prompt, control must have just come from a cont and vice versa. The prompt reference parameter in a cont never needs to be nullable because the prompt will always exist immediately after a resume. In contrast, a cont may no longer exist if a continuation returns or retires to a prompt, so the cont reference in a prompt type may be nullable.

cont references can be created by cont.new, but there is no prompt.new. prompt references are only created by resume instructions.

C ⊢ cont.new x y : [] -> [(ref x)]
-- C.TYPES[x] = cont t1* (ref z)
-- C.FUNCS[y] = func t2* -> t3* rt
-- C.TYPES[z] = prompt t3* rt

The interesting thing here is that the results of the continuation function are identical to the values received by prompts the continuation will be able to suspend to. With this simple type system, prompts can only handle one kind of "event," i.e. one sequence of parameter types, so resume instructions do not need to be branch tables that can handle arbitrary different result types. By ensuring that the function results are the same as the prompt parameters, we avoid the need for a branch to differentiate suspends from normal returns, so resume does not need to branch at all. Of course producers can still differentiate suspends from normal returns in user space if they need to.

C ⊢ resume x : [t1* (ref null x)] -> [t2* rt]
-- C.TYPES[x] = cont t1* (ref y)
-- C.TYPES[y] = prompt t2* rt

The typing for suspend is very similar.

C ⊢ suspend x : [t1* (ref null x)] -> [t2* rt]
-- C.TYPES[x] = prompt t1* (ref null? y)
-- C.TYPES[y] = cont t2* rt

As an optimization, we can also allow continuations to retire themselves when they suspend, allowing their resources to be cleaned up eagerly. This is essentially non-local return up to a prompt.

C  ⊢ suspend_retire x : [t1* (ref null x)] -> [t2*]
-- C.TYPES[x] = prompt t1* (ref null y)

Note that the prompt type used with suspend_retire must have a nullable continuation reference parameter because it will receive a null value.

The last interesting instruction is suspend_switch, which is kind of like a tail-call for continuations. (This is called switch_to in the typed-continuations proposal). suspend_switch takes both a prompt reference and a cont reference. Control passes directly to the given continuation, which receives the given prompt reference as if it had been resumed from that prompt rather than from a sibling continuation. Since control does not actually pass to the prompt, the target continuation also needs to receive the reference to the previous continuation, otherwise there would be no live references to the previous continuation left after the switch.

C  ⊢ suspend_switch x : [t1* (ref null z) (ref null x)] -> [t2* rt]
-- C.TYPES[x] = cont t1* (ref null? y) (ref z)
-- C.TYPES[y] = cont t2* rt

Beyond these instructions, cont.new_ref, cont.bind, prompt.bind, and resume_throw would all be straightforward to include as well. suspend_switch_retire would also be possible, although it seems more niche.

The typing for suspend is very similar.

C ⊢ suspend x : [t1* (ref null x)] -> [t2* rt]
-- C.TYPES[x] = prompt t1* (ref null? y)
-- C.TYPES[y] = cont t2* rt

How do you ensure that the prompt given as input to suspend actually comes from a parent stack? Since it's a first-class value, it seems like it could be smuggled (e.g. through globals) out of another call stack. I remember we discussed a while ago (2022 in-person meeting?) that this verification process often ends up being a linear search.

I've also been thinking about how to leverage the commitment to a "reserved register/stack slot" for the counter/encoded JS handlers from bag-of-stacks. If the typed continuations proposal were restricted to a single kind of signal and just two handlers - "suspend" or "trap", then I think it would be possible to use the reserved register to propagate a pointer to the most recent handler. One never needs to worry about daisy-chaining of parent handlers since a signal is guaranteed never to automatically propagate out of the "suspend" or "trap" handler. Maybe this proposal is expressive enough. However if a user did need a signal to propagate further up the stack, they'd have to mock this in userspace, which might be inefficient (I think there are analogous frictions with implementing a source language's delimited continuations using bag-of-stacks).

If we wanted to think about facilitating the existing typed continuations proposal with rich handlers using a similar strategy, then as we discussed here the remaining issue is dealing with daisy-chaining of handlers. I think this could potentially be accomplished by treating the necessary references to parent handlers almost like additional local variables in the relevant call stack.

So if a function has a maximum depth of N handlers in its body, then on initial entry into the function, the call stack needs to reserve N extra slots for handler pointers. You always have a pointer to the most recent handler (potentially from the parent function) in your "reserved register", and when you enter the Nth handler within the function, you put the pointer to this previous handler in the Nth reserved stack slot, and the pointer to the new current handler in the "reserved register". To allow N to be known ahead-of-time in the streaming compilation tier, function headers might need to be extended with a predeclaration of "max handler depth", which could potentially live in the local variable encoding space (although you could get away without this predeclaration, with enough tricks - I imagine the existing exception handling instructions have similar considerations).

I think what you are describing is what's usually called named handlers. There is a sketch of how that could look with typed continuations in the explainer.

(What we previously meant with continuations without search was a bit different: suspension would still have an implicit handler, but it's always the innermost one, i.e., the one for the current stack. If that cannot handle the event then it doesn't continue trying the next one but simply traps. That's roughly how e.g. Wasmtime fibers work, though of course they aren't safe.)

But as @conrad-watt hints at, as soon as you have prompts — or any form stack chaining, for that matter, like with the separated dynamic scoping feature you presented — you'll need a linear check to guarantee safety, either when switching up (suspending) or when switching down (resuming). We can shift the cost to around in an implementation but we can't avoid it. The linear check is simply the price we have to pay for safety in Wasm.

Plus, if you also have undirected (symmetric) switches, like with bag-o-stacks, then those have to perform the linear check every time, because they lack the structure to establish additional invariants. That is, you'll end having to check twice as often as when using asymmetric primitives.

(And of course, once you are there, the only presumed advantage of bag-of-stacks is out the window. Furthermore, if a linear check has to be paid for anyway, then extending it to search is almost free. In particular, since not building that in has other complexity costs: to be able to implement it in user space, we'd need to support (a) a form of catch-all handler and (b) a "resuspend" instruction. Overall, that ends up being both more complicated and less efficient.)

How do you ensure that the prompt given as input to suspend actually comes from a parent stack?

I don’t have anything better than linear search. It’s interesting (and unfortunate) that we can’t seem to achieve the nice composability properties of asymmetrical coroutines without it.

If the typed continuations proposal were restricted to a single kind of signal and just two handlers…

This seems extremely limiting, though. What type of value would be passed to the single non-trap handler? I’m not too sure of the benefit either; IIUC, this would require a user space linear search comprising multiple suspends/switches in all the same places other designs would have the engine do a linear search ending in just one switch.

How do you ensure that the prompt given as input to suspend actually comes from a parent stack?

I don’t have anything better than linear search. It’s interesting (and unfortunate) that we can’t seem to achieve the nice composability properties of asymmetrical coroutines without it.

I think we need to be a bit careful in order to disentangle different concerns here.

We can have asymmetric coroutines without the linear search. Indeed, plain asymmetric coroutines as in Lua or Wasmtime Fiber don't build in a search. What they offer instead is much like what @conrad-watt and @rossberg are alluding to. Plain asymmetric coroutines ensure that the innermost handler must handle everything. So if you want to simulate a linear search for a handler then you have to implement explicit forwarding (by analogy, imagine a version of exception handlers where exceptions are always handled by the innermost handler and how you would implement general exception handlers on top of that). This is exactly how the current implementation of WasmFX on top of Wasmtime Fiber works (though this is certainly not the most efficient implementation strategy).

One source of confusion in all of these discussions is that there are different notions of composability at play here. Let me briefly describe the two most relevant ones.

What asymmetric coroutines provide that symmetric coroutines don't is composability with built-in side-effects - the existing built-in side-effects essentially "just work" without the need to generate lots of extra plumbing everywhere. Examples where this is relevant to Wasm include exceptions, calling back-and-forth between JS and Wasm (https://github.com/WebAssembly/stack-switching/issues/49), and generating meaningful stack traces in the presence of stack-switching (@frank-emrich will say a little about this last point at the in-person meeting next week).
Effect handlers, as exemplified by WasmFX (aka typed continuations) are a refinement of asymmetric coroutines, that introduce a richer notion of composability. As well as supporting composition with built-in effects (by virtue of being a form of asymmetric coroutines), they support composition of multiple user-defined features, for instance, lightweight threads and generators. This form of composition is where nested handlers (and a potentially linear cost for searching for a handler) come in. In practice in existing systems like OCaml 5, which uses effect handlers for high-performance stack-switching, this kind of composition is used sparingly and handler stacks are typically shallow.

I think what you are describing is what's usually called named handlers. There is a sketch of how that could look with typed continuations in the explainer.

I think this is essentially right, except the version @tlively described is restricted so that each handler can handle only one kind of event.

Closing this since this idea does not avoid linear runtime and we have settled on a design to move forward with.

WebAssembly / stack-switching

Asymmetric coroutines without search #59