WebAssembly / exception-handling

Proposal to add exception handling to WebAssembly
https://webassembly.github.io/exception-handling/
Other
161 stars 35 forks source link

Restore rethrow's immediate argument? #126

Closed aheejin closed 3 years ago

aheejin commented 4 years ago

We left out rethrows immediate argument in the last CG meeting's poll because we weren't sure about the use cases, but @rossberg suggested one in https://github.com/WebAssembly/exception-handling/issues/125#issuecomment-693436340.

I think having this functionality is in general good as long as there exist languages that

What do you think?

fgmccabe commented 4 years ago

This seems odd to me. Different languages have different semantics for throwing exceptions in general, including rethrow. It seems pretty unlikely that a single low level mechanism will be cover all the conflicting requirements. A better approach IMO would be to look at how the throw instruction would work in a rethrow situation. TLDR; this feels like the wrong question (should rethrow take an immediate).

aheejin commented 4 years ago

@fgmccabe

A better approach IMO would be to look at how the throw instruction would work in a rethrow situation.

I'm not exactly sure what this means. Do you mean, can we use throw instead for rethrowing, which in turn means "should we even keep rethrow"?

I think having two kinds of throwing instruction, currently throw and rethrow, offers more options to language implementors. Some languages, such as C++, do not need that functionality, but other languages might. And both the current proposal and the newly-decided proposal have it. I didn't mean to re-discuss whether we should keep rethrow or not here. (If someone wants to discuss it, it is a separate discussion.)

Restoring the immediate means whether we should allow rethrow to choose an exception to rethrow other than the topmost (or the recent-most) one, and that was the focus of this issue.

RossTate commented 4 years ago

It's hard to discuss tradeoffs between design alternatives for a feature without known use cases for the feature. I think what @fgmccabe is noting is that, when you consider the potential use cases, you don't find much consistency. For example, Java, Python, and C# each have different semantics for rethrowing exceptions. But maybe those are not meant to line up with the rethrow instruction. So having concrete examples of how rethrow would be used would help facilitate this discussion.

takikawa commented 4 years ago

I read the justification for including this that @rossberg posted, and I agree that making nesting of handlers easier/possible is a natural choice from the language semantics point of view.

However I'm a bit concerned about the complication for implementing this, as I think the immediate argument for rethrow effectively introduces an implicit exception stack that's separate from the normal value stack (the previous design with exnref was more consistent in this regard, with everything being on the value stack). This second stack may need to be explicitly tracked by an implementation, to ensure outer exception references remain live.

tlively commented 4 years ago

Is it correct that rethrow-with-immediate can always be emulated by adding a branch out to the correct catch scope and then doing a rethrow-without-immediate? If so, I'm not convinced by the composability argument since that's a fairly simple transformation. I would be convinced if we had a usecase where rethrow-with-immediate would be used often enough that there would be a code size win from not having to do that transformation, but given our lack of rethrow use cases, I don't think that's a compelling hypothetical, either.

aheejin commented 4 years ago

@takikawa

However I'm a bit concerned about the complication for implementing this, as I think the immediate argument for rethrow effectively introduces an implicit exception stack that's separate from the normal value stack (the previous design with exnref was more consistent in this regard, with everything being on the value stack). This second stack may need to be explicitly tracked by an implementation, to ensure outer exception references remain live.

Don't you need to keep this exception stack anyway even without the immediate argument? As @tlively said, we can still go from the inner catch scope to outer catch scope (by a branch or fallthrough) and rethrow the outer exception from there.

aheejin commented 4 years ago

@tlively

Is it correct that rethrow-with-immediate can always be emulated by adding a branch out to the correct catch scope and then doing a rethrow-without-immediate?

I think so.

but given our lack of rethrow use cases, I don't think that's a compelling hypothetical, either.

For C++ we lack use cases, but for languages that use rethrow functionality like that in https://github.com/WebAssembly/exception-handling/issues/125#issuecomment-693436340, I'm not sure..?

RossTate commented 4 years ago

That comment does not refer to any actual language. We currently have zero use cases. Python and C# both modify their exceptions when they rethrow them, and so they would have to implement their rethrow with throw of a new exception payload. Java could just as easily throw the rethrown exception. So we have multiple languages that would not use a rethrow instruction to implement exception rethrowing, and we have another language that could just as well implement exception rethrowing with throw.

Without knowing what rethrow is useful for, we cannot evaluate its design alternatives. The clever translation @tlively gave may or may not work depending on what rethrow is supposed to be doing. Until we have concrete examples of how it's supposed to be used and what purpose it's supposed to fulfill, we can all picture different variations of what rethrow does and is for and unintentionally speak past each other. We cannot effectively design features without use cases for those features.

takikawa commented 4 years ago

Don't you need to keep this exception stack anyway even without the immediate argument?

@aheejin Thanks, I think you're right about this, nevermind my comment about rethrow then.

rossberg commented 4 years ago

@aheejin, I have a really dumb question: how did you implement a correct semantics of finally or destructors under the previous proposal without using rethrow? As far as I can tell, rethrow is the only way to (re)throw exceptions whose constructor you don't know. It seems that if you don't use rethrow then any such feature will misbehave on foreign exceptions, e.g., JS ones.

aheejin commented 4 years ago

@rossberg (I wrote and deleted a comment just not and please ignore that)

Sorry, I think I said something confusing. Our C++ implementation currently uses rethrow for

  1. When an (both C++ and foreign) exception doesn't match the current catch. The current catch catches everything. With the new proposal decided, we use catch (C++ tag), but that doesn't mean we catch all C++ exceptions (i.e., in some catch we only catch int or something), so we have to still use rethrow if an exception doesn't match. This use case is actually not rethrowing but resuming, and it will be replaced by filter functions/blocks after two-phase unwinding is introduced later. In the current single-phase, rethrowing and resuming aren't different.

  2. After running destructors. This is also a resuming behavior and will be replaced by try-unwind-end. (In the single phase, the exception will be effectively rethrown at the end, even though there's no explicit rethrow). C++ doesn't have finally, but for the languages that have it, clang compiles it away, so it will be handled in a similar way as destructor blocks.

What I tried to say was we don't use rethrow for C++'s throw;, which is C++'s rethrowing keyword, but apparently I said things in a very confusing way. The reason is C++'s throw; apparently doesn't preserve stack traces. Our current C++ implementation prototype's catch (...) doesn't catch foreign exceptions, mainly for implementation simplicity. We may want to change this later.

rossberg commented 4 years ago

@aheejin, that makes sense, thanks for the clarification.

So this seems to imply that we clearly need rethrow, unless we want to introduce all of two-phase unwinding now.

RossTate commented 4 years ago

As far as I can tell, rethrow is the only way to (re)throw exceptions whose constructor you don't know.

The current catch catches everything. With the new proposal decided, we use catch (C++ tag), but that doesn't mean we catch all C++ exceptions (i.e., in some catch we only catch int or something), so we have to still use rethrow if an exception doesn't match.

Notice that this is "rethrowing" an exception with a known tag and available payload. rethrow is not necessary for this. You can just use throw, and the two programs would be semantically equivalent.

aheejin commented 4 years ago

@RossTate

Notice that this is "rethrowing" an exception with a known tag and available payload. rethrow is not necessary for this. You can just use throw, and the two programs would be semantically equivalent.

Without rethrow,

  1. We have to remember the order of extracted values and reconstruct them, and the bookeeping necessary for that may not be easy.
  2. This also increases code size.
  3. Also note that this is different from C++ throw;. C++ throw; doesn't preserve stack traces but this should. This is effectively uncaught exception and should be skipped by a filter function later when we have two-phase, so in the single phase, in case stack traces are embedded in the exception, this shouldn't be lost.

I think we have a very clear use case for rethrow in the current toolchain for all these reasons. And also, I think we agreed on the proposed spec like yesterday...? Not sure if we want to spend more time to discuss taking something out of that agreed spec only after a day. I opened this issue to discuss whether we need the immediate for rethrow.

RossTate commented 4 years ago
  1. You put the payload (which I believe for C++ is currently just an i32) in a local variable. There are likely other reasons to do this anyways (e.g. to access the payload multiple times).
  2. Barely.
  3. Stack traces are not part of the semantics of WebAssembly exceptions. They are a debugging tool, and there are other, more robust ways for engines to track stack traces to this end.

I opened this issue to discuss whether we need the immediate for rethrow.

For me, the answer depends on what rethrow means and how rethrow will be used. That said, it does sound like there's reason to believe that every hypothetical use does not need an immediate, so I'm personally fine with you closing the issue with that answer.

aheejin commented 4 years ago

@RossTate

  1. You put the payload (which I believe for C++ is currently just an i32) in a local variable. There are likely other reasons to do this anyways (e.g. to access the payload multiple times).
  2. Barely.
  3. Stack traces are not part of the semantics of WebAssembly exceptions. They are a debugging tool, and there are other, more robust ways for engines to track stack traces to this end.

Without two-phase unwinding, embedding stack traces or other info in exceptions is a way that we can provide some info. We didn't specify which info we should provide in the spec to maximize each engine's leeway according to their necessity though. We are currently using it, and I think this is sufficient to attest to the "Is there a use case" argument.

For me, the answer depends on what rethrow means and how rethrow will be used. That said, it does sound like there's reason to believe that every hypothetical use does not need an immediate, so I'm personally fine with you closing the issue with that answer.

Not sure exactly what you mean. The objective here is to decide whether we should restore the immediate back or not, and if we decide we should, propose to the CG meeting about the change. Have we made the decision?

rossberg commented 4 years ago

@RossTate:

Notice that this is "rethrowing" an exception with a known tag and available payload.

To implement destructors or finally in a heterogeneous language environment you need to be able to rethrow exceptions whose tag you do not know. For example, when C++-Wasm calls back into JS and JS throws.

RossTate commented 4 years ago

To implement destructors or finally in a heterogeneous language environment you need to be able to rethrow exceptions whose tag you do not know

@rossberg That has been addressed by try/unwind.

Without two-phase unwinding, embedding stack traces or other info in exceptions is a way that we can provide some info.

@aheejin An engine can have a global variable for the stack trace. If a function throws an exception, then the engine can initialize this variable to the empty stack. As it searches for a catch, it can add frames to this stack. If it finds a catch, then it can leave the stack untouched until the catching function returns, at which point it clears the variable, but if the function throws again then it continues to build upon the stack. This will work even for languages that modify exceptions as they go up the stack. It also can be used to get stack traces for traps.

We also want to consider that there are uses of the exception handling proposal that do not want you collecting stack traces because it is a waste of time, i.e. they are using it for dynamic control flow and know the thrown exception will be caught.

Applications that do want a stack trace can always explicitly put one into the payload. I prefer this approach because it takes guesswork out of engines (should the engine trace the stack or not? answer depends on the application).

aheejin commented 4 years ago

@RossTate I didn't say rethrow is the only way that we can provide stack traces. This is a way we are currently using it (with single phase unwinding), and was included in the new spec we agreed on, and we have to change our implementation even further without it, so I'd say we keep it. Of course, it is not specified in the spec what kind of auxiliary info a VM should include, so a VM can choose not to provide any side info, and that's fine.

I answered your "is there a use case?" argument. Given that we implemented it, are using it, and haven't had a major problem with it, I don't think every single aspect of the existing spec should also satisfy "Is this the only way to do this thing?" question.

RossTate commented 4 years ago

Have we made the decision?

My impression is that there's no need for an immediate in any hypothetical use. @rossberg is I think the only person we're waiting for updated thoughts from.

rossberg commented 4 years ago

My point still stands I believe. Implicit naming of nestable constructs is a basic structural design mistake. I gave an example. What would be the advantage of artificially crippling the instruction that way?

aheejin commented 4 years ago

Note that specifying exception to rethrow was capable using exnref in the previous (meaning the spec before this CG meeting) spec, so we are not introducing any new functionalities.

RossTate commented 4 years ago

@rossberg Not having an index makes the instruction size smaller for the extremely common case. And having an index would be redundant because it can easily be encoded through existing constructs.

aheejin commented 4 years ago

@rossberg Not having an index makes the instruction size smaller for the extremely common case.

Barely. And it requires the toolchain to do some unnecessary code transformation in the other cases, increasing complexity and code size there.

rossberg commented 4 years ago

@RossTate:

Not having an index makes the instruction size smaller for the extremely common case.

In this thread you seem to be simultaneously arguing that the entire instruction is useless and that it is gonna be extremely common. I have a hard time reconciling both these claims.

RossTate commented 4 years ago

I believe the use cases for the instruction could just as easily be served by throw. But if we're going to have such an instruction, it sounds like it will be used in a very particular pattern, namely the following:

(catch $exn
   (local.set $exn_payload)
   (br_if $exn_handler (some_test (local.get $exn_payload)))
   (rethrow)
)

If that's the case, then that suggests we should have catch rethrow the exception whenever the end is reached (instead of having a rethrow instruction). Once we do that, then catch and unwind are almost identical except that one is only used for certain exceptions and provides the payload. (This change has the added benefit of making the type annotation on try unnecessary, though actually taking advantage of that is a separate discussion.)

Taking things a step further, languages like Python modify the exception as it goes up the stack. Rather than having catch implicitly store the payload for when the exception gets rethrown at the end, we could instead require the end of the block to provide a new payload. At this point, catch is really doing unwinding with a payload (which is a more advanced form of unwinding), so we might as well rename it to unwind_tagged or something.

Putting this all together, the pair of unwind and unwind_tagged has strictly more functionality than the quadruple of unwind, catch, catch_all, and rethrow while at the same time likely being a more compact encoding of the common patterns. (Sorry, just thought of this idea.)

aheejin commented 4 years ago

@RossTate Umm... Can we please not turn everything upside down again and bikeshed new instructions two days after we passed a new spec...? Brainstorming and bikeshedding language design can be fun, but I think we really need to settle down and ship things.

Also, I think the approach when designing new instructions and changing already existing and implemented instructions should be different. We can't spend endless time bikeshedding things, and I think changing existing things require a stronger argument than designing things first time unless there's a clear problem that people agree on.

(By the way, usage or rethrow is not at the end of end of catch most time, as you suggest; it is only the case for destructor blocks, which will be replaced by try-unwind.)

binji commented 4 years ago

My guess is that rethrow will be a very rare instruction, even in a large program using exceptions everywhere, so I wouldn't worry too much about the code size. If we're concerned that rethrow may need an immediate in the future, then we can do what we've done before here, and provide a required 0 immediate, which can be extended later if necessary.

aheejin commented 4 years ago

@binji As I said in https://github.com/WebAssembly/exception-handling/issues/126#issuecomment-694152985, it's not rare at all; currently it's used all the time. The usage 2 will be replaced by try-unwind though. The usage 1 will remain until we actually have filters and two-phase unwinding.

And both 1 and 2 are not rethrowing but actually resuming. (In single-phase they are effectively the same.) C++ does not use rethrow for actual "rethrowing", which is throw;. But other languages might, and that's what @rossberg's use case suggests.

binji commented 4 years ago

Sorry, I mean rare compared to the total number of instructions. But if you have data that would be useful -- what percentage of the instructions in the code section for a medium-sized module are rethrow?

aheejin commented 4 years ago

@binji I just checked a single file that contains ~35000 instructions and rethrow was a bit less than 1%. But I think this will be reduced by more than 80% I think after we switch to try-unwind, so I wouldn't worry about the code size for it either. But without the immediate I think it is possible that the toolchain has to do some unnecessary code transformation, inserting branches and all. (C++ is not likely to need that though.)

RossTate commented 4 years ago

Clarification request: which of these programs are supposed to be equivalent?

  1. (try
      (call $f)
    catch $exn
      (throw $exn)
    )
  2. (try
      (call $f)
    catch $exn
      (rethrow)
    )
  3. (call $f)
aheejin commented 4 years ago

@RossTate 2 and 3 will have the same throwing site. The two programs are different still, so not sure what the definition of equivalence is though.

RossTate commented 4 years ago

Equivalence informs tools which program transformations/optimizations are valid. Inlining also changes throwing sites and stack traces. So it seems like either inlining functions that might throw exceptions is an invalid optimization, or the above three programs are equivalent. If the three programs are equivalent, then every program with rethrow can be equivalently expressed without rethrow.

aheejin commented 4 years ago

Stack traces are usually used for debugging without optimizations turned on. In your example two throwing sites are very close, but if rethrowing happens later in another caller up in the call stack, the stack traces will be vastly different. Your argument that inlining can make things the same is like, all stack traces that ever exist can be the same if all functions are inlined into main. Not sure if that makes sense.

And as I said, I think now is the time to actually make things work and ship things. We made a new spec two days ago. We can still consider changes, but I think they should satisfy one of these:

  1. The current spec has a serious problem
  2. The change is small and incremental, so it does not require a lot of changes in the existing implementations.
tlively commented 4 years ago

Like @RossTate, I'm not totally convinced that rethrow is necessary. However, I also don't think it's a significant complication to the proposal. AFAIK, the implementation in V8 currently embeds stack traces in thrown exceptions. I know those stack traces do not necessarily match the stack trace semantics of any particular source language, but I think it's plausible that they might be useful for debugging regardless of that. I don't know for sure because we don't have any concrete users of this proposal yet. Since those stack traces are a debugging tool that is not visible in Wasm semantics, in practice it's ok to perform optimizations like inlining that change them, even if in theory it is not.

Given all that, I think a reasonable course of action would be to keep rethrow, taking the risk that it turns out not to be useful. A similar line of reasoning suggests that we should add the immediate index to rethrow as well. Again, it may not turn out to be useful, but it's also not a large cost, so the expected value of adding it just in case and moving on with implementation seems higher to me than the expected value of continuing to disagree about it.

RossTate commented 4 years ago

I feel we need to be explicit about the expectations on stack tracing. When I asked language implementers for advice on exception handling, a common sentiment was to not have stack tracing as a default. It apparently substantially slows down the use of exceptions for dynamic control flow. Most implementations of efficient dynamic control flow either use two-phase approaches (so that they can collect the stack only after they find out the exception is uncaught) or do not collect the stack trace (except for possibly keeping the portion of the stack above the last handler/unwinder in tact).

As an example, for stack switching we think it makes sense to build on this proposal as it provides a mechanism for dynamic control flow. But it would be problematic if doing so causes a stack trace every time, an operation that could easily dominate the time of the stack switch itself.

Plus, as I pointed out above, there are other ways to build and track a stack trace than putting it in the exception payload. The technique I mentioned above would give the same stack trace for all three programs (that I believe should be considered equivalent). Another is for the application to make the stack trace an explicit part of the payload (as an externref) and import a function to collect the stack trace upon building the initial exception (and the application's JS layer could provide a different function to the import depending on whether the application is being debugged or not).

aheejin commented 4 years ago

I think I'm repeating myself at this point, so I'll summarize my answers here one more time:

  1. We don't require VMs to embed stack traces. It is not the default and it is up to them. So your speed argument does not hold.
  2. Your example is very small that the throwing site and the rethrowing sites are so close, but in real program they can be in different functions in a call stack.
  3. I think it's time to make things work and refrain from bikeshedding the spec we made two days ago.

And one more thing... this issue was about the immediate argument and I hoped this issue to focus on that. Whether we need rethrow at all is a much bigger request for change, which we have not considered in all three versions of the spec we made, so while I don't think we need to make big changes anymore unless there's a really good reason, I think it would've been better if you made a separate issue post.

RossTate commented 4 years ago

Fair request. I've moved discussion of whether to have rethrow to #127.