WebAssembly / interface-types

Exception object type as anyref or its subtype #10

Closed aheejin closed 5 years ago

aheejin commented 6 years ago

We have been discussing using anyref as an exception type (or its supertype) recently, as @eholk mentioned in https://github.com/WebAssembly/host-bindings/issues/9#issuecomment-349139419.

The current exception handling proposal poses several difficulties (WebAssembly/exception-handling#30 and WebAssembly/exception-handling#31), and allowing opaque exception objects to be stored in locals and dynamically type-tested can solve most of those problems. (They don't have to be stored in linear memory.) The proposed anyref as a Wasm value type (#9) sounds like it satisfies many of the requirements.

One thing is, we use a 'tagged value' to represent an exception object. An exception object is a pair of a tag and a list of values. Definitions of the related terminology are here. Tags can be used in many ways, possibly to denote types (int, MyException&, ...) or languages (C++, JavaScript, ...). In C++ exception support, we use them to denote languages (we can't do types with C++ because of inheritance and such): for example, one tag can mean C++, another can mean JavaScript, etc.

Can we treat anyref as tagged values in general? It might be possible for most non-exception objects to share the same predefined tag, making them essentially tagless. With this approach, we would also need a match instruction that dynamically tests whether the current object (on the stack) has the specified tag.

Or, can we make the tagged value type a subtype of anyref? In that case we would need an instruction like isinstanceof, as suggested in #4, as well as match to dynamically test the tag. This approach requires introducing a supertype/subtype hierarchy into the type system.
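A minimal sketch of the second option, written in Python purely as a model and not as anything from the proposals. The names AnyRef, Tag, TaggedValue, is_tagged and match are hypothetical; is_tagged stands in for an isinstanceof-style test and match for the tag test described above.

```python
from dataclasses import dataclass
from typing import Any, Tuple


class AnyRef:
    """Stand-in for an opaque anyref value (hypothetical model)."""


@dataclass(frozen=True)
class Tag:
    """An exception tag; here it names a source language, as in the C++ scheme."""
    name: str


@dataclass(frozen=True)
class TaggedValue(AnyRef):
    """A tagged value modelled as a subtype of anyref: a tag plus a list of values."""
    tag: Tag
    values: Tuple[Any, ...]


def is_tagged(ref: AnyRef) -> bool:
    """Models an isinstanceof-style test: is this anyref a tagged value at all?"""
    return isinstance(ref, TaggedValue)


def match(ref: AnyRef, tag: Tag) -> bool:
    """Models the match test: does the value carry the given tag?"""
    return isinstance(ref, TaggedValue) and ref.tag == tag


CPP = Tag("C++")
JS = Tag("JavaScript")

# A C++ exception: the single payload value is the exception object pointer.
exn = TaggedValue(CPP, (0x1000,))
plain = AnyRef()
```

In this model match(exn, CPP) holds while match(exn, JS) and is_tagged(plain) do not, which is the kind of dynamic dispatch a language-tagged catch would rely on.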

cc @eholk @dschuff @KarlSchimpf

rossberg commented 6 years ago

If we allowed tagged values as first-class values then their type would have to be something new. It could probably be made a subtype of anyref, depending on implementation details. You'd need to further distinguish the type of such tagged values from that of caught exception values (sometimes called exception packets), because those are more than just the exception that was thrown. They include context information, for example, an associated stack trace. With resumption, they would in fact include a continuation.

For that reason, I'd be hesitant to make exception packets first-class values that can escape their handler. That would have severe implications for the future, for example, any attempt to later extend exception handling with resumption might become difficult or very costly, because continuations would immediately become first-class as well. That's a scary thing to commit to prematurely.

magcius commented 6 years ago

This would then turn exceptions into GC'd objects, correct? If they can be local values, one can imagine using set_elem to store them in a global table, which means they would need some form of reference management. I don't quite understand how this is supposed to fit together with C++ exceptions.

aheejin commented 6 years ago

@magcius The exception objects are host-created opaque objects and not C++ pointers. A C++ exception object pointer is a value in a {tag, value+} pair.

eholk commented 6 years ago

> ...any attempt to later extend exception handling with resumption might become difficult or very costly, because continuations would immediately become first-class as well.

While supporting resumption is something I'd like to see in the future, it is more important that we design an exception system now that meets the needs of C++. If we can do this and accommodate resumption in the future, great! But making resumption difficult should not stop us from solving problems that C++ has now.

It may be that exceptions with resumption, or more generalized effect handlers, are different enough from the exceptions we want to enable now that we should design them as separate features with completely separate types and instructions.

lukewagner commented 6 years ago

> If we allowed tagged values as first-class values then their type would have to be something new. It could probably be made a subtype of anyref, depending on implementation details.

Agreed

> You'd need to further distinguish the type of such tagged values from that of caught exception values (sometimes called exception packets), because those are more than just the exception that was thrown.

What if we inverted the what-owns-what relationship so that the caught value was the tagged value and its contents were a list of wasm values (which would include anyref)? Additionally:

(This all feels rather symmetric to function import/export rules.) Thus, by default, a wasm throw wouldn't need to eagerly capture a stack or do a GC allocation. If wasm wanted a backtrace, in the EH MVP, it would need to call out to JS to throw new Error(). When we one day added first-class stack-walking to core wasm, presumably that could be used directly as long as there was an associated value type.

> For that reason, I'd be hesitant to make exception packets first-class values that can escape their handler. That would have severe implications for the future, for example, any attempt to later extend exception handling with resumption might become difficult or very costly, because continuations would immediately become first-class as well. That's a scary thing to commit to prematurely.

I would've thought the necessary constraint here was that the resumption value (which could be a new value type whose values would be stored inside the tagged exception value) was invoked once. This could be a dynamic restriction. Dropping the syntactic restriction of resume would, just like rethrow, increase the expressivity of resumption as a feature. So I don't see a problem here; if anything, it's a win.
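The "invoked once" constraint as a dynamic restriction can be sketched as follows. This is a Python model under assumptions, not a proposed design: Resumption and ResumedTwiceError are hypothetical names, and a plain callable stands in for the captured continuation.

```python
class ResumedTwiceError(RuntimeError):
    """Raised when a one-shot resumption value is invoked a second time."""


class Resumption:
    """Hypothetical one-shot resumption value stored inside a tagged exception."""

    def __init__(self, continue_with):
        # A callable standing in for the captured continuation.
        self._continue_with = continue_with
        self._used = False

    def resume(self, value):
        # The dynamic restriction: trap on any invocation after the first.
        if self._used:
            raise ResumedTwiceError("resumption already invoked")
        self._used = True
        return self._continue_with(value)


k = Resumption(lambda v: v + 1)
first = k.resume(41)

try:
    k.resume(0)
    second_trapped = False
except ResumedTwiceError:
    second_trapped = True
```

The check costs one flag test per resume, and nothing in the model restricts where the value may be stored or passed, which is the expressivity gain described above.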

rossberg commented 6 years ago

@lukewagner, yes, one vs multiple invocations of a continuation is probably gonna be a dynamic check. What you might want to know statically, though, is zero vs one. Other implementations, for example, track in the exception type whether it is resumable. That way, a handler can potentially be compiled more cheaply. With only catch-all you perhaps could annotate the handler differently, but if the exception can also escape then you'd need to track this property further through the type system I think. What corner cases could arise and does this impose extra cost because we'll need to assume the worst case in more places?

Agreed with most of your other points. The expressiveness of allowing an exception to escape will be important, but I suspect that case will incur some overhead that we would not want to impose all the time.

@eholk, it would be rather unfortunate to design a complex Wasm feature mainly for the benefit of C++ or a specific compiler. We have successfully avoided that so far.

@aheejin, note that first-class and storable in memory are separate things. References can never be stored in linear memory. They are still first-class if you can pass them to other functions or store them in globals as value types. Statically predictable lifetime is interesting for exception packets insofar as you typically don't even want to materialise them.

aheejin commented 6 years ago

@rossberg

> That way, a handler can potentially be compiled more cheaply. With only catch-all you perhaps could annotate the handler differently, but if the exception can also escape then you'd need to track this property further through the type system I think. What corner cases could arise and does this impose extra cost because we'll need to assume the worst case in more places?

But without the ability to rethrow from outside of a catch block, which is the reason I want to assign exceptions to locals, we may end up having to assume the worst case for most exceptions anyway. So the point I was trying to make in WebAssembly/exception-handling#30 is that it is a very common pattern for a rethrow to be preceded by some code that is reachable from many catches, like below:

block $label0
  try
    ...
  catch i
    br $label0
  end
  ...
  try
    ...
  catch i
    br $label0
  end
end

some common code
rethrow

If we want to rethrow an exception that's caught by either of the catches and don't want to duplicate 'some common code' part, which can be arbitrarily long, there is no way to support this in the current scheme. One thing @eholk suggested offline is we might be able to use resumable exceptions everywhere, so that we can use it like a subroutine to run the common code and come back to a catch block:

try
  try
    ...
  catch i
    ...
    throw j          (1)
    rethrow          (4)
  end
catch                (2)
  some common code
  resume             (3)
end

(Execution order: (1) -> (2) -> (3) -> (4))

Here I showed only a single try-catch pair (the outer try-catch will be inserted by the compiler to make this resumable, so that control flow can come back into the catch block). If there are multiple catch blocks that share some common code, we need that many extra compiler-inserted try-catches. In any case, this scheme looks overly complicated given that it only supports a couple of very plain and simple try-catches, and it ends up using resumable exceptions everywhere just to return to a catch block to rethrow something, because we can't assign exceptions to locals.
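The execution order (1) -> (2) -> (3) -> (4) can be traced with a small Python model. It is only an illustration under a strong simplification: a resumable throw is modelled as calling the outer handler as a subroutine, and returning from that call plays the role of resume. The names ExceptionI, outer_catch, resumable_throw and inner_catch are hypothetical.

```python
trace = []


class ExceptionI(Exception):
    """Stand-in for an exception with tag i."""


def outer_catch():
    # The compiler-inserted outer catch: runs the shared code, then resumes.
    trace.append("(2) outer catch")
    trace.append("some common code")
    trace.append("(3) resume")
    # Returning models `resume`: control goes back to just after the throw.


def resumable_throw(handler):
    """Hypothetical resumable throw: run the handler, then continue after the throw site."""
    handler()


def inner_catch(caught):
    # Body of the inner `catch i` block.
    trace.append("(1) throw j")
    resumable_throw(outer_catch)
    trace.append("(4) rethrow")
    raise caught  # rethrow the exception that `catch i` originally caught


try:
    try:
        raise ExceptionI("original")
    except ExceptionI as exn:
        inner_catch(exn)
except ExceptionI as exn:
    rethrown = exn
```

After running, trace lists the four numbered steps in order with the common code between (2) and (3), and rethrown is the original exception, which shows that the resumable throw is used only as a round trip into shared code.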

Do you suggest any alternatives that can make rethrow happen?

lukewagner commented 6 years ago

@rossberg

> @lukewagner, yes, one vs multiple invocations of a continuation is probably gonna be a dynamic check. What you might want to know statically, though, is zero vs one

Building on the approach I outlined above, just like the exception-with-stack value could be an opaque anyref stored inside the tagged exception value, I imagine we would have a new opaque continuation value type, created by some new resumable_throw opcode, that would be optionally stored in the exception tagged value. I think we need the static throw vs. resumable_throw distinction anyway because of how differently they get compiled locally in the function. Then the catch site either has, or doesn't have, a continuation value just based on the matched exception tag's signature. So I think the zero vs. one would be sufficiently static for an efficient impl, or is there something else?

flagxor commented 6 years ago

@lukewagner @rossberg Pulling out the throw vs resumable_throw distinction seems useful (as does, in general, grounding our choices in what's required in the compiler). It also (hopefully?) lets us disentangle resumption.

Exception handling for C++ with good cross-language interaction seems like it's going to require some kind of mechanism for the exception to make its way outside the scope of the catch. So far we've danced around this with:

rossberg commented 6 years ago

@lukewagner, @flagxor, distinguishing the throw is one end, but I strongly suspect that you'll also want to be able to distinguish on the catch end for optimal code. That looks trickier with a tag-agnostic catch_all, but maybe it can be done.

@aheejin, @flagxor, the other option proposed earlier was a generalised rethrow, which works similarly to a br, but for the exceptional path. Taking @aheejin's example from above:

try $label0
  try
    ...
  catch i
    rethrow $label0  // rethrow from target block
  end
  ...
  try
    ...
  catch i
    rethrow $label0
  end
catch
  some common code
  rethrow
end

In general, rethrow $l terminates the target block with the current exception. If that block happens to be a try body, then this is simply a jump to the respective handler and can be compiled as such. (Omitting here the source label to denote the "current" exception that we already have on rethrow, so in fact it would then have two labels, source and target.)

But you might all be right, and exceptions in locals may still be the nicer option. I agree with @lukewagner that we ultimately may want to have that anyway. I just fear that this might be harder to design and implement properly, and that we end up cutting corners or prematurely pruning the design space for resumption. For example, there is the choice about shallow vs deep handlers, where the latter always resumes inside the handling try, which turns out to have certain advantages (and disadvantages). That option is lost when you allow escaping. I would at least suggest consulting with people who have experience implementing and using such mechanisms.

lukewagner commented 6 years ago

@rossberg Ah, I see your point: when compiling a call from within a try block that can catch continuations, you need to start a new stack segment at the callsite so that the segment can be set aside when executing the handler. If there are already separate throw/resumable_throw and rethrow/resume opcodes, then it also seems natural to have separate catch/catch_continuation opcodes, which should give us all the static info. That still leaves some questions about how the continuation value gets created/passed, but it seems like there are viable options here.

But overall still agreed that the first-class exception value is probably our cleanest solution.

aheejin commented 6 years ago

@rossberg

I don't think that would work, because wrapping parts of the code in an outer enclosing try-catch can introduce other problems. Placing an outer enclosing try-catch around two arbitrary try-catches involves computing the nearest common dominator of the two try-catches in the CFG, which may very well be the entry node, resulting in the new try-catch wrapping the whole function. And there can be other function calls elsewhere that might throw, whose exceptions should propagate to the caller. But now we have an enclosing catch that wraps all those calls, so they would get caught in the new catch, while they should just throw to the caller. Referring to your example,

try $label0

  (1)

  try
    ...
  catch i
    rethrow $label0
  end

  (2)

  try
    ...
  catch i
    rethrow $label0
  end

  (3)

catch
  some common code
  rethrow
end

there can be other calls that might throw in (1), (2), or (3). Their semantics now have changed because in case they throw, they are going to be caught by the new catch. We can insert some more code to make those calls throw to the caller, like, before all those calls we set some local signalling that they are meant to be propagated up to the caller and not be caught by a new catch or something, and in the new catch block we insert a branch to check the value of that local and do different things based on the result. But this is clearly ugly and will contribute to code size as well.

And what is the difference between rethrow label and normal rethrow? In your example code, it doesn't look like they are semantically different. (If we replace the rethrow label with a normal rethrow, it would have the same semantics, I mean)
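The change in semantics for calls in (1), (2), or (3) can be reproduced in a small Python model. This is a sketch under assumptions: WasmException and may_throw are hypothetical stand-ins, and Python's bare raise stands in for both rethrow $label0 and the final rethrow.

```python
class WasmException(Exception):
    """Stand-in for a wasm exception carrying a tag name (hypothetical model)."""

    def __init__(self, tag):
        super().__init__(tag)
        self.tag = tag


def may_throw(tag):
    """Models a call that throws an exception with the given tag."""
    raise WasmException(tag)


def function_with_enclosing_try(log):
    try:                            # try $label0 (the newly inserted wrapper)
        may_throw("unrelated")      # region (1): meant to propagate straight to the caller
        try:
            may_throw("i")
        except WasmException:       # catch i
            raise                   # rethrow $label0
    except WasmException:           # the new enclosing catch
        log.append("some common code")
        raise                       # rethrow


log = []
try:
    function_with_enclosing_try(log)
except WasmException as exn:
    escaped_tag = exn.tag
```

The exception that escapes is the unrelated one from region (1), yet log shows that the common code ran for it, which is exactly the unwanted interception described above.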

eholk commented 6 years ago

> there can be other calls that might throw in (1), (2), or (3). Their semantics now have changed because in case they throw, they are going to be caught by the new catch.

For a second I thought you could work around this by throwing a new exception with a tag that is unused elsewhere, but then you would still need a way to be able to rethrow the original exception.

Still, I think we can make this work with another try block, like this:

try $label1
  try $label0
    (1)

    try
      ...
    catch i
      rethrow $label1
    end

    (2)

    try
      ...
    catch i
      rethrow $label1
    end

    (3)

  catch
    rethrow $label2
  end $label0
catch
  some common code
  rethrow
end $label1

This gives us a stack of try blocks, 2 1 0 (I'm using 2 as an implicit label that means "skip all the try blocks in this function"). We have common code in label 1's catch block that we want to run if any of the inner tries catch an exception. If the other code, such as (1), (2), or (3), throws, the exception is caught by the label 0 catch block. This block simply rethrows, but skips the common code. On the other hand, to run the common code, we rethrow $label1, which skips the handler at label 0.
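The two-level layout can be modelled in Python to check that the common code runs only for exceptions caught by the inner tries. This is an emulation under assumptions, since Python has no labelled rethrow: RethrowTo is a hypothetical wrapper that carries the exception together with its target label, and the handler for label 1 checks that target.

```python
class WasmException(Exception):
    """Stand-in for a wasm exception carrying a tag name (hypothetical model)."""

    def __init__(self, tag):
        super().__init__(tag)
        self.tag = tag


class RethrowTo(Exception):
    """Hypothetical model of `rethrow $label`: carries the exception to the handler of `target`."""

    def __init__(self, target, exn):
        super().__init__(target)
        self.target = target
        self.exn = exn


def run(region1_throws, log):
    """Mirrors the nested-try layout; region1_throws selects whether (1) throws."""
    try:                                          # try $label1
        try:                                      # try $label0
            if region1_throws:
                raise WasmException("unrelated")  # region (1): a call that throws
            try:
                raise WasmException("i")
            except WasmException as e:            # catch i
                raise RethrowTo("label1", e)      # rethrow $label1, skipping label 0's catch
        except WasmException as e:                # catch of $label0
            raise RethrowTo("label2", e)          # rethrow $label2: skip every try here
    except RethrowTo as r:                        # catch of $label1
        if r.target != "label1":
            raise r.exn                           # emulated skip: not addressed to label 1
        log.append("some common code")
        raise r.exn                               # rethrow


log_unrelated, log_caught = [], []
try:
    run(True, log_unrelated)
except WasmException as exn:
    tag_unrelated = exn.tag
try:
    run(False, log_caught)
except WasmException as exn:
    tag_caught = exn.tag
```

In both runs the original exception reaches the caller, but the common code is logged only in the second run, where an inner catch i handled the exception first.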

aheejin commented 6 years ago

@rossberg @eholk

Ah, now I understand what @rossberg meant. It was what you suggested in https://github.com/WebAssembly/exception-handling/issues/29#issuecomment-340012257. Yeah, we actually might be able to use this to solve this. This can incur code size increase, but I don't think that would be significant. I have to check if this can cover all the cases.

This is equivalent to adding a depth argument to rethrow (as in br or br_if). Does that mean we remove the original depth argument of rethrow, which specifies which exception object to rethrow? We don't seem to use it anywhere actually. Or do we keep both arguments?

eholk commented 6 years ago

It seems like having the depth argument be how many blocks out to throw instead of which exception to rethrow is more useful, so I think I'd be in favor of that change.

rossberg commented 6 years ago

@aheejin, @eholk, yes, that's the rough idea. Good point about other throws, though. Code that throws in (1), (2), (3) could either be handled the way @eholk suggests, or by also having a target label on throw itself to skip over the inner try -- which admittedly makes this proposal increasingly less attractive.

As said above, the proposal would imply having two labels on rethrow, because the motivation for having the source label is independent of this use case, it being a general composability argument (e.g. if you need to nest try into handlers).

But I'm also thinking through the first-class exceptions alternative some more. I'm positive that we could come up with an adequate semantics if we give up on deep handlers -- which might not fly anyway in a low-level language. (Unfortunately, I'm off on vacation now, but back in 2 weeks.)

aheejin commented 6 years ago

@rossberg

> Code that throws in (1), (2), (3) could either be handled the way @eholk suggests, or by also having a target label on throw itself to skip over the inner try -- which admittedly makes this proposal increasingly less attractive

The bigger problem is that it is not even going to be a throw - in most cases it's gonna be a call that might throw. I'm not actually against attaching a depth to the throw instruction, but modifying the call instruction so it can have a depth, or creating a second call' instruction that can take a depth, sounds like a bigger and less attractive change.

rossberg commented 6 years ago

@aheejin, right, good point.

aheejin commented 5 years ago

Closing this, since we decided to make except_ref a subtype of anyref.