WebAssembly / stack-switching

A repository for the stack switching proposal.

Unhandled suspension in embedding API #95

Open titzer opened 1 week ago

titzer commented 1 week ago

Some offline discussion turned up the issue of how to represent unhandled suspensions in the embedder API. Let's use this issue to discuss.

AFAICT an unhandled suspension is a pair of (something like) an exnref and a contref. An exnref is itself a pair of a tag and a payload, i.e. a vector of values. Tags in stack-switching add results, but I think they are only relevant for the static typing of handlers in the bytecode.

@rossberg had brought up the concept of a "meta-continuation". My understanding is that this is basically a contref that could/should be resumable via the embedding API.

I think contref is pretty similar to a funcref from the embedder's perspective; invoking it is a meta-level operation which is dynamically-typed. The only differences I can see are:

  1. contref doesn't fit under the any heaptype hierarchy, so it should fail any dynamically-typed embedder operations that take an anyref,
  2. contref supports, or should support cont.bind, if we have that embedder operation,
  3. invoking a contref could suspend (but I think invoking a funcref can also suspend), and
  4. it may be represented differently, since it could be a continuation object or a pair of a version+stack resource.

rossberg commented 1 week ago

There actually are three interrelated issues. Let's separate them.

1. Contref values

At first glance, there is no particular problem with handling contref in the embedding API. A contref is not an anyref, but that's no different from funcref. It can be passed back and forth as a contaddr value, and there should probably be primitives like cont_invoke (which, unlike resume inside Wasm, does not provide a handler) and cont_bind, possibly cont_alloc(?), with the obvious types.

The one issue to be aware of here is that a contaddr has no self-describing type in an efficient implementation. Hence, they could not expose these embedding functions to untrusted users, at least not safely (e.g., in the JS API), because their pre-conditions cannot be checked dynamically. But that seems an acceptable constraint.

2. Unhandled suspensions

The trickier issue is if an embedder function results in a suspension that isn't handled in Wasm and therefore (conceptually) suspends the embedding function call itself. In the API, there currently are two functions that can execute Wasm code and could theoretically suspend: func_invoke and module_instantiate; cont_invoke would be a third one. Currently, these functions return a union RESULT val* | THROW exnaddr | ERROR, where the latter essentially signals a trap. From an exnaddr the API furthermore allows reading the tag and the arguments.
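As a rough illustration of the three-way result described above, here is a minimal Python sketch. The class names, the `describe` helper, and the `exns` dictionary standing in for the store are all assumptions made for this example; they are not part of the spec's embedding interface.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Exn:
    """Hypothetical store entry behind an exnaddr: a tag plus its arguments."""
    tag: str
    args: tuple


@dataclass(frozen=True)
class Result:
    values: tuple   # RESULT val*


@dataclass(frozen=True)
class Throw:
    exnaddr: int    # THROW exnaddr


@dataclass(frozen=True)
class Error:
    pass            # ERROR, essentially a trap


Outcome = Union[Result, Throw, Error]


def describe(outcome: Outcome, exns: dict) -> str:
    """Dispatch on the three outcomes; `exns` maps exnaddr -> Exn."""
    if isinstance(outcome, Result):
        return f"returned {list(outcome.values)}"
    if isinstance(outcome, Throw):
        exn = exns[outcome.exnaddr]   # read the tag and arguments
        return f"threw {exn.tag} with {list(exn.args)}"
    return "trapped"
```

A caller would branch on the outcome the same way, for example `describe(Throw(0), {0: Exn("$e", (7,))})`.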

With stack switching, a fourth outcome would be a suspension. That raises at least two questions: (1) what is its API-level representation, and (2) what can you do with that?

Re (1), note that a contref is only allocated when a handler is found for a suspension. That is because the extent of the continuation is unknown until its delimiter is found, so you cannot construct it yet (you don't know at which stack edge you are suspending).

We could treat embedder calls as having implicit handlers (i.e., being delimiters), such that the fourth case is defined as SUSPEND tagaddr val* contaddr. But that has implications: in particular, if it's a regular contaddr, you could pass it back to Wasm as a contref, which makes it observable that there actually is a delimiter at the boundary.

But our entire design for continuations is such that they correspond 1-to-1 to stacks in the implementation. Hence, AFAICS, if e.g. func_invoke actually was a delimiter that can produce a contref, then that could only be implemented under such a setup if func_invoke was in fact always switching to a new stack when called! That seems highly undesirable from a performance perspective.

So my conclusion from that is that an "unhandled" suspension must be something else, a host-level continuation, which is not interchangeable with actual Wasm continuations. But what exactly is still a bit fuzzy to me, as is (2), what you can do with that. My suspicion is that an implementation may not want to actually reify it, since that would potentially make suspension more expensive, e.g., when suspending across host frames.
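To make the distinction concrete, here is one way the separation could look if a host-level continuation were reified at all, which the paragraph above leaves open. `HostCont`, `Suspend`, and `can_flow_to_wasm` are hypothetical names invented for this sketch, not proposed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContAddr:
    """Address of an ordinary Wasm continuation."""
    addr: int


@dataclass(frozen=True)
class HostCont:
    """Hypothetical host-level continuation for an unhandled suspension."""
    ident: int


@dataclass(frozen=True)
class Suspend:
    """Hypothetical suspension outcome carrying a host continuation
    instead of a regular contaddr."""
    tagaddr: int
    values: tuple
    cont: HostCont


def can_flow_to_wasm(value) -> bool:
    """Boundary check: only real Wasm continuations may become a contref."""
    return isinstance(value, ContAddr)
```

Because `HostCont` is a separate type, the continuation in a `Suspend` outcome cannot be handed back to Wasm as a contref, so no delimiter at the boundary becomes observable from inside Wasm.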

3. Suspending host calls

This is the inverse problem of the previous: how host functions called from Wasm can suspend. We need to be able to model that, too, e.g. to correctly describe sandwich scenarios. One instance of this would be an attempt to describe JSPI without hand-waving, which seems like a desirable goal once we have native Wasm suspension.

What a host function can produce is currently encapsulated in the spec's type result ::= val* | THROW_ADDR exnaddr | TRAP, which mirrors the above. A similar extension is necessary here.

fgmccabe commented 1 week ago

@rossberg ..."one issue to be aware of here is that a contaddr has no self-describing type in an efficient implementation". We may have a requirement for this.

To help mitigate security sandbox issues, we would like to validate that the continuation being resumed is at least of the same type as that expected by the resume instruction. This must be done at run-time: attackers may be able to replace a continuation value with another, unexpected one. We can't necessarily prevent that, but we can at least ensure that the replacement continuation has the correct type (and therefore user code would not result in a type confusion).

There is the continuation type identified when a coroutine suspends. Our current plan is to simply record the index of the continuation type when we generate a continuation. We will record this in the stack resource alongside the counter.

Is this plan doomed to failure?
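For readers following along, the plan could be sketched roughly as below. This is only an illustration under assumptions: the names are invented, the type check is plain equality with no subtyping, and an `int` stands in for whatever type identity the engine would actually record.

```python
from dataclasses import dataclass


class Trap(Exception):
    """Stands in for a Wasm trap."""


@dataclass
class StackResource:
    counter: int      # bumped whenever the stack's continuation is consumed
    type_index: int   # continuation type recorded when a continuation is made


@dataclass
class ContRef:
    stack: StackResource
    version: int      # counter value at the time this reference was created


def make_cont(stack: StackResource, type_index: int) -> ContRef:
    """Record the continuation type in the stack, next to the counter."""
    stack.type_index = type_index
    return ContRef(stack, stack.counter)


def resume(cont: ContRef, expected_type: int) -> None:
    """Run-time checks performed before resuming (the resume itself is elided)."""
    if cont.version != cont.stack.counter:
        raise Trap("continuation already consumed")
    if cont.stack.type_index != expected_type:
        raise Trap("continuation type mismatch")
    cont.stack.counter += 1   # one-shot: invalidates outstanding references


def try_resume(cont: ContRef, expected_type: int) -> str:
    """Helper for experimenting: report the result instead of raising."""
    try:
        resume(cont, expected_type)
        return "ok"
    except Trap as trap:
        return str(trap)
```

In this sketch, resuming a fresh reference with the matching type succeeds once; a second attempt on the same reference, or an attempt with a different expected type, traps.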

rossberg commented 1 week ago

@fgmccabe, that may work for this particular purpose as a (semantically redundant) security measure, with some extra cost and hoops, provided you mean runtime type, since a type index isn't meaningful across module boundaries. (Though I am not sure it buys any extra security when an attacker can already mess with continuations anyway.)

But the harder problem is observable type checks, such as casts or language boundary checks as above. I don't think the implementation you suggest works for that without creating rather dubious semantics, because when doing a cast on a contref, you cannot rely on this type information being in sync with that contref. You could only implement a correct semantics for casts on as-yet unconsumed continuations, and fail in other cases. But that means that a contref would effectively morph its observable type to something weaker (like top) when it is used, which is at odds with all principles of typing and substitutability (by which only the other direction would be okay).

The only correct implementation option I could see for an allocation-free approach would be a fat pointer with the RTT as a third component. But that would presumably be rather costly.
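The difference between the two representations can be illustrated with a small Python model. Everything here is a sketch under assumptions: the names are made up, RTTs are modelled as strings, and casts are exact matches with no subtyping.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    counter: int
    rtt: str   # type of the continuation currently held by this stack


@dataclass
class ThinRef:
    """Two-component reference: (stack, version)."""
    stack: Stack
    version: int


@dataclass
class FatRef:
    """Three-component reference: (stack, version, rtt)."""
    stack: Stack
    version: int
    rtt: str


def cast_thin(ref: ThinRef, target: str) -> bool:
    # The stack's recorded type is only in sync with the reference
    # while the reference is still unconsumed.
    return ref.version == ref.stack.counter and ref.stack.rtt == target


def cast_fat(ref: FatRef, target: str) -> bool:
    # The reference carries its own type, so the answer never changes.
    return ref.rtt == target


def consume(stack: Stack, next_rtt: str) -> None:
    """Model resuming and re-suspending the stack at a different type."""
    stack.counter += 1
    stack.rtt = next_rtt
```

With `stack = Stack(0, "ct1")`, both kinds of reference pass a cast to `"ct1"` at first. After `consume(stack, "ct2")`, the thin reference's cast starts failing while the fat reference's cast still succeeds, which is the "morphing type" problem described above.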

Either way, neither of these is a cost we should carelessly impose on all Wasm implementations.

fgmccabe commented 1 week ago

Funnily enough, I think that the super-fat pointer approach is effectively what we would end up with in V8.

tlively commented 1 week ago

The one issue to be aware of here is that a contaddr has no self-describing type in an efficient implementation. Hence, they could not expose these embedding functions to untrusted users, at least not safely (e.g., in the JS API), because their pre-conditions cannot be checked dynamically. But that seems an acceptable constraint.

This is related to the question of whether contrefs should support casts, but if they don't, then how would you round trip a contref through JS at all and safely recover its type when it flows back into Wasm? It seems that non-castable references should never be able to be passed in from JS to Wasm.

if e.g. func_invoke actually was a delimiter that can produce a contref, then that could only be implemented under such a setup if func_invoke was in fact always switching to a new stack when called! That seems highly undesirable from a performance perspective.

To address this, I've been thinking that we should either have a new version of func_invoke (and any other embedder API that might suspend) that explicitly opts in to suspension (e.g. func_invoke_suspendable) or add a Boolean parameter to func_invoke for opting in. Only when opting in to suspension would func_invoke execute on a new stack. If func_invoke would suspend without the opt-in, the result should be ERROR instead.
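A minimal sketch of the Boolean-parameter variant, in Python. It models only the choice of outcome: running on a fresh stack and capturing a continuation are not modelled, and the `Suspension` exception, the outcome classes, and the `suspendable` parameter name are assumptions for illustration.

```python
from dataclasses import dataclass


class Suspension(Exception):
    """Models a suspend that found no handler inside Wasm."""

    def __init__(self, tag, values):
        super().__init__(tag)
        self.tag = tag
        self.values = values


@dataclass(frozen=True)
class Result:
    values: tuple


@dataclass(frozen=True)
class Suspend:
    tag: str
    values: tuple


@dataclass(frozen=True)
class Error:
    pass


def func_invoke(func, args, suspendable=False):
    """Invoke `func`; surface an unhandled suspension only on opt-in."""
    try:
        return Result(tuple(func(*args)))
    except Suspension as s:
        if suspendable:
            return Suspend(s.tag, tuple(s.values))
        return Error()   # without the opt-in, suspension is reported as ERROR


def add(a, b):
    """Example 'Wasm' function that returns normally."""
    return [a + b]


def yielding():
    """Example 'Wasm' function that suspends without a handler."""
    raise Suspension("$yield", [1])
```

Calling `func_invoke(yielding, ())` gives `Error()`, while `func_invoke(yielding, (), suspendable=True)` gives a `Suspend` outcome carrying the tag and values.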

rossberg commented 1 week ago

@tlively:

It seems that non-castable references should never be able to be passed in from JS to Wasm.

Yes, that's what I meant earlier: we couldn't expose these functions in the JS API (unless we are willing to impose the extra cost on all JS-embedded Wasm engines). We can still add them to the embedder API, which doesn't need to be enforceably safe (and generally isn't).

To address this, I've been thinking that we should either have a new version of func_invoke

So in non-suspendable mode, it does the equivalent of a barrier instruction? That is a possibility, but I imagine there are valid embedding scenarios where neither option is desirable.

tlively commented 1 week ago

It's not just the API functions that shouldn't be exposed to JS, though, it's also things as simple as passing and returning contref as function parameters and results on the Wasm/JS boundary. ToWebAssemblyValue depends on being able to cast reference values to their expected types.

tlively commented 1 week ago

To address this, I've been thinking that we should either have a new version of func_invoke

So in non-suspendable mode, it does the equivalent of a barrier instruction? That is a possibility, but I imagine there are valid embedding scenarios where neither option is desirable.

Yes, that's the idea. What other behavior could we provide?

titzer commented 1 week ago

@fgmccabe I think we should avoid anything that necessitates a triple for the value representation. Fat pointers to implement the counter approach are probably my limit; otherwise it would probably just make sense to box it and have a continuation object. I didn't think through the implications of that at the JS boundary, but I imagine we'd end up in a situation similar to the one that led us to introduce extern.convert_any, and would want to make a possible boxing cost explicit.

I was also thinking about subtyping on continuation types, and I think our current design pressure towards admitting the counter-based approach ends up in exactly the situation Andreas mentioned above; casts would only succeed for unused continuations and a contref's type would be mutable, so engine optimizations that, e.g. remove redundant casts would not be sound.

rossberg commented 1 week ago

@tlively:

It's not just the API functions that shouldn't be exposed to JS, though, it's also things as simple as passing and returning contref

Yes, absolutely. And that's consistent with not exposing exnref to JS.

What other behavior could we provide?

Take the JS world for example, where higher-order functions like forEach are sometimes implemented externally/natively and then call back into JS. I could totally imagine Wasm environments that need to do something similar. With stack switching that results in a sandwich scenario, where suspending in the inner Wasm is not a bug, and simply suspending across the host part is what's wanted (unlike, say, with JSPI). Requiring the host to implement every such callback with multiple stack switches would potentially be prohibitive, so I think the embedding API ought to make that kind of forwarding cheap. How is the question, of course, but distinguishing host continuations from Wasm seems required.
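The control flow of such a sandwich can be loosely modelled with Python generators. This is only an analogy for the semantics: generators are stackless, so it says nothing about cost, and the function names and the `$ask` tag are invented for the example.

```python
def inner_wasm(item):
    """Inner 'Wasm' callback: suspends once with a tag, then finishes."""
    reply = yield ("$ask", item)   # suspend; resumed later with a reply
    return item * reply


def host_for_each(items, callback):
    """Host-implemented higher-order function; it installs no handler."""
    results = []
    for item in items:
        # `yield from` forwards the inner suspension outward unchanged
        # and passes the eventual reply back down to the callback.
        results.append((yield from callback(item)))
    return results


def outer_wasm(items):
    """Outer 'Wasm' handler: answers every $ask suspension with 10."""
    computation = host_for_each(items, inner_wasm)
    try:
        tag, _payload = next(computation)
        while True:
            assert tag == "$ask"
            tag, _payload = computation.send(10)
    except StopIteration as done:
        return done.value
```

Here the host frame in the middle neither handles nor inspects the suspension; it only forwards it, which is the behaviour the embedding API would need to make cheap.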

titzer commented 6 days ago

@rossberg

Wizard implements all host -> Wasm calls as a stack switch now. I'm more optimistic that stack switching cost will be cheap enough that this is viable, especially considering that a host -> Wasm call, at least in this setup, means unpacking metavalues into Wasm's representation. I haven't done extensive measurements, but the actual stack switch mechanism is on the order of 10-15 instructions. (The cost of finding or creating a fresh stack is higher.) I'd like to do more extensive measurements, but I'm at least somewhat optimistic here.