WebAssembly / exception-handling

Proposal to add exception handling to WebAssembly
https://webassembly.github.io/exception-handling/

Forwards-compatibility with low-level primitives #108

Open RossTate opened 4 years ago

RossTate commented 4 years ago

In order to discuss forwards-compatibility, it's probably best to first get us all on the same page as to how the primitives in #105 would likely express the constructs in this proposal.

The first thing we need is a "universally understood" mark: mark try-catch-exnref : [exnref] -> unreachable.

Next, translate try ([ti*] -> [to*]) instr* catch instr* end to the following:

escape $hatch {
    mark try-catch-exnref { // [exnref] -> unreachable
        escape-to $hatch // uses the exnref on the stack
    } within {
        instr* // body of try
    }
} hatch [exnref] {
    instr* // body of catch
} [to*] // output type

Then translate throw $event to the following:

exnref.new $event // put exnref onto stack
call $throw_exnref

where

func throw_exnref : [exnref] -> unreachable {
    stack.walk {
        stack.next-mark try-catch-exnref {
            stack.exec-mark // passing exnref on stack
        }
    }
}

and similarly translate rethrow to simply call $throw_exnref.

br_on_exn remains as is, since that's more about reference types than about stacks or control.
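
To check one's reading of the translation above, it can be modeled outside WebAssembly. The following Python sketch is only an illustration under loud assumptions: `Mark`, `MARKS`, `try_catch`, and `_Escape` are hypothetical names, a Python list stands in for the marks currently on the stack, and a private Python exception stands in for escape-to $hatch. It is not how #105 or any engine would implement these primitives.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


class _Escape(Exception):
    """Hypothetical stand-in for `escape-to $hatch`: a jump to one specific hatch."""

    def __init__(self, hatch_id: object, payload: Any) -> None:
        super().__init__("escape")
        self.hatch_id = hatch_id
        self.payload = payload


@dataclass
class Mark:
    tag: str                      # e.g. "try-catch-exnref"
    code: Callable[[Any], None]   # code run by stack.exec-mark


# Assumption: the innermost mark is the last element of this list.
MARKS: List[Mark] = []


def try_catch(body: Callable[[], Any], catch: Callable[[Any], Any]) -> Any:
    """Model of: escape $hatch { mark try-catch-exnref {...} within { body } } hatch { catch }."""
    hatch_id = object()

    def on_exnref(exnref: Any) -> None:
        raise _Escape(hatch_id, exnref)   # escape-to $hatch, carrying the exnref

    depth = len(MARKS)
    MARKS.append(Mark("try-catch-exnref", on_exnref))
    try:
        return body()
    except _Escape as esc:
        if esc.hatch_id is not hatch_id:
            raise                          # some outer hatch is the target
        payload = esc.payload
    finally:
        del MARKS[depth:]                  # the mark's region of control has ended
    return catch(payload)                  # body of catch runs outside the mark


def throw_exnref(exnref: Any) -> None:
    """Model of $throw_exnref: walk the stack and execute the nearest matching mark."""
    for mark in reversed(MARKS):
        if mark.tag == "try-catch-exnref":
            mark.code(exnref)              # type is [exnref] -> unreachable
    raise RuntimeError("no try-catch-exnref mark on the stack")
```

In this model a rethrow is just another call to `throw_exnref` from inside the catch body, which then finds the next enclosing mark because the inner mark has already been removed.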

One takeaway from this translation is that, theoretically speaking, this proposal is already forwards-compatible with the primitives in #105. However, practically speaking, we care about more than just getting programs to run; we want programs to run correctly, including programs that call other programs or are called by other programs. This is where the importance of stack conventions comes in.

As an example, suppose module A is compiled in the style of #105 whereas module B is compiled in the style of this proposal. If module A calls B, providing B with a callback into A, and B calls that callback, which happens to throw an exception (in A), then we have a situation where B's stack frames are sandwiched between A's throw and (presumably) A's catch but where A's exceptions are not implemented using try-catch. Module A would like to let module B clean up its stack, but module A has its own unwinding state it wants to maintain (e.g. Python building the stack trace as it unwinds the stack). How should module A proceed?

In the current proposal, unwinding code always usurps control and then typically rethrows control to the next try-catch-exnref mark. That is, it assumes try-catch-exnref is the sole way to unwind the stack. If, on the other hand, the current proposal were revised to separate unwinding code (e.g. on_unwind instr* do instr* end) from exception-handling code (still try/catch), then #105 could translate on_unwind to a "universally understood" mark unwinder : [] -> [] and module A's unwinding code could choose to execute unwinder marks it sees and ignore any try-catch-exnref marks. As an added benefit, this would work regardless of how B were compiled (assuming B chose to abide by the unwinder convention), so module A would not need to adjust its implementation strategy to account for B's specific choice of implementation strategy.
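
A rough way to picture the unwinder convention is the Python sketch below. Everything in it is an assumption made for illustration: the function `throw_from_a`, the tag name "a-handler", and the representation of the stack as a list of (tag, code) pairs are all hypothetical. It only shows module A first locating its own mark and then, while unwinding, running unwinder marks and skipping try-catch-exnref marks.

```python
from typing import Callable, List, Tuple

# Assumption: each mark is (tag, code) and the innermost mark is last in the list.
Mark = Tuple[str, Callable[[], None]]


def throw_from_a(marks: List[Mark], handler_tag: str) -> None:
    """Hypothetical two-phase throw used by module A."""
    # Phase 1: walk the stack without unwinding it, looking for A's own mark.
    for index in range(len(marks) - 1, -1, -1):
        if marks[index][0] == handler_tag:
            break
    else:
        raise RuntimeError("no mark tagged " + handler_tag + " on the stack")

    # Phase 2: unwind everything above that mark, innermost first.
    while len(marks) > index + 1:
        tag, code = marks.pop()
        if tag == "unwinder":
            code()   # [] -> []: runs cleanup, then control returns here
        # Any other tag, such as "try-catch-exnref", is ignored.

    _, handler = marks.pop()
    handler()        # A's own handling code


log: List[str] = []
stack: List[Mark] = [
    ("a-handler", lambda: log.append("A handles")),
    ("try-catch-exnref", lambda: log.append("B catch")),   # B's V1-style handler
    ("unwinder", lambda: log.append("B cleanup")),          # B's on_unwind body
    ("unwinder", lambda: log.append("A callback cleanup")),
]
throw_from_a(stack, "a-handler")
```

After the call, `log` holds the two cleanups followed by A's handler, and B's catch body never runs, which is the behavior the convention is meant to allow.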

Hopefully that gives a sense of where the compatibility problem really lies and how the current proposal might be changed to help with forwards-compatibility. It all comes down to what kind of conventions we want to support. More conventions means better compatibility between newer wasm programs and older wasm programs. Fewer conventions means fewer changes, including even no changes. It's possible that the on_unwind separation above is the sweet spot or that the sweet spot is to simply leave the proposal as is. Regardless of what we decide to do, the current proposal is optimized for a particularly common kind of exception semantics, and I think we should and can maintain that optimization for a common case.

aheejin commented 4 years ago

While I have many other questions about your proposal, I'd like to ask a few things first:

If, on the other hand, the current proposal were revised to separate unwinding code (e.g. on_unwind instr* do instr* end) from exception-handling code (still try/catch), then #105 could translate on_unwind to a "universally understood" mark unwinder : [] -> [] and module A's unwinding code could choose to execute unwinder marks it sees and ignore any try-catch-exnref marks.

RossTate commented 4 years ago

I'd like to ask a few things first

Sounds good! It's a big write-up, so I imagine there are a ton of questions.

What does it mean that proposal A is forward-compatible to proposal B?

If at some point in time B is added, for programs using both A and B there is a "good" way for the features of A and B to interact, and for programs using B there is a "good" way to interact with programs written using A.

And what is a mark?

A (stack) mark in my writeup is a tag (e.g. gcrooter), a chunk of code that is executable within a certain stack frame (with the input-output type associated with the tag), and a region of control (i.e. a code range). While execution is within the region of control, the mark is considered to be on the stack. Does that definition make sense?

Also I think there are many words used in #105 without definition. Are those definitions in a specific version of the GC proposal currently discussed?

#105 has no references until it gets to first-class stacks. It's written up to be fairly independent of (engine-managed) GC. Since most of the terminology is new, I tried to demonstrate the meaning of new terms through usage rather than through description. Let me know at any time if there's a term you'd appreciate a descriptive definition of.

What is "unwinding code" in "separating unwinding code from exception-handling code"?

Unwinding code is code that is ideally run when the stack is unwound for whatever reason. The most common cause is an exception being thrown, but that may not be the only cause. (As another example, a finalizer for a first-class stack might unwind the stack to free up resources when the GC realizes the stack is no longer reachable. Similarly, a lightweight thread manager might unwind the stacks of worker threads whose work is no longer needed.)

In the case of C++, destructors are unwinding code, whereas C++ try-catch clauses are exception-handling code.

And #105 does not have a primitive called on_unwind.

My suggestion was to add on_unwind to this proposal. C++ destructors would go into the do body of on_unwind (which is only executed by a forced unwind rather than a standard return, so for older C++ versions this would be the code where exceptions thrown by destructors are caught and converted into a trap). On the other hand, C++ try-catch clauses would use the existing try/catch construct, thereby "separating unwinding code from exception-handling code".

With this separation, #105 would treat on_unwind/do as code with an unwinder : [] -> [] mark, and would treat try/catch as code with a try-catch-exnref : [exnref] -> unreachable mark. If someone does unwinding (through the escape/unwind/hatch construct), they could execute unwinder marks as they unwind the stack. This would have the effect of running the C++ destructors and then returning control back to the original escape/unwind/hatch construct, rather than usurping control.

Hope I managed to be clearer this time!

aheejin commented 4 years ago

Sorry, it is not very easy to understand what you are suggesting. Also, unless you want to replace the whole proposal with #105, I think the details of that can be discussed in the new proposal repo dedicated to #105. It is not my intention to ask all those questions about #105 here.

While it is hard to keep track of how many changes you requested for the proposal in the last three months in this repo and all those emails you sent me, could you explain this change without referring to other primitives in #105? I think the semantics of on_unwind is not well defined in the first place. In on_unwind instr* do instr* end, what is the first set of instr* and what is the second set? What is the semantics of this instruction?

Also, what is the difference between the instructions in a catch block and the ones in on_unwind? Both of them run when the stack is unwound.

RossTate commented 4 years ago

What is the semantics of this instruction?

For on_unwind instr1* do instr2* end, execute instr1*. If during the execution of those instructions an exception is thrown and not caught by those instructions, then execute instr2* before further propagating the exception up the stack. The type of instr2* is always [] -> []. If the type of instr1* is [ti*] -> [to*], then the type of on_unwind instr1* do instr2* end is also [ti*] -> [to*].
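
As a loose analogy only, the semantics just described resemble the following Python helper. The name `on_unwind` is borrowed from the suggestion, but the function itself is hypothetical, and Python exceptions stand in for whatever mechanism unwinds the WebAssembly stack.

```python
from typing import Any, Callable


def on_unwind(body: Callable[[], Any], cleanup: Callable[[], None]) -> Any:
    """Run body (instr1*); if it is unwound by an exception, run cleanup (instr2*) and keep propagating."""
    try:
        return body()       # normal return: cleanup is NOT run
    except BaseException:
        cleanup()           # takes no exception value, mirroring the [] -> [] type
        raise               # continue propagating up the stack


events = []


def failing_body() -> None:
    raise ValueError("boom")


try:
    on_unwind(failing_body, lambda: events.append("cleanup ran"))
except ValueError:
    events.append("exception still propagated")
```

Note the difference from Python's `finally`: the cleanup runs only when the body is unwound, not on a standard return.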

Also, what is the difference between the instructions in a catch block and the ones in on_unwind? Both of them run when the stack is unwound.

With the separation I'm suggesting, catch blocks are not executed when the stack is unwound; they are only executed when an exnref is thrown. That might sound contradictory because currently throwing an exnref is the only way to cause the stack to unwind, but #105 would provide other ways to unwind the stack. Note that the do block for on_unwind does not take an exnref as input, as other ways to unwind the stack might not involve an exnref.

dschuff commented 4 years ago

If I understand correctly, it sounds like: 1) You believe the EH proposal in its current form (let's call it "MVP EH") would not conflict in principle with your ideas for #105, but there could be compatibility issues when programs compiled for MVP EH and those compiled for #105 (in the absence of other mediation methods such as IT) want to run their respective cleanups when unwinding an interleaved stack. 2) adding to MVP EH a way to pull destructor/force-unwinding code into a separate block would allow such modules to cleanly interact with modules for #105.

I think this makes sense and is generally encouraging (at least to me, since I actually do kind of like the general idea behind the low-level stack-usage ideas in #105, but I also don't want to delay an MVP EH scheme).

I'm also not opposed in principle to adding something on to the MVP scheme in order to get better compatibility with a low-level scheme. But I also think that would end up being used in a "V2" C++ exception ABI. For example, even if we could magically agree on exactly the right design for on_unwind (including having enough agreement on #105 that it was worth pursuing in its own right, such that it made any sense to consider compatibility with it), I think we'd probably still go ahead and finish productionizing the toolchain code for the V1 ABI (probably without actually declaring the ABI stable) and get it in the hands of our users, and then follow up with an improved version after that. Actually we may very well end up with that kind of ABI transition/stabilization even in the absence of spec additions. We took that approach with linking and other toolchain capabilities already, and while it means a longer road to the "final"/eventual state, it's a much shorter road to a state that still serves a lot of users well, and we learn a lot on the way that informs the final state.

Accordingly I think it probably makes the most sense not to tack anything onto the MVP EH just yet, but to move along the usual process with #105 (e.g. the early CG stages where we consider the problem we are trying to solve and get consensus that we want to solve it), and keep MVP compatibility as a consideration there. That proposal can include an addition to MVP EH such as on_unwind, and if we want, we can easily peel it into its own proposal that can run ahead of the main one, so we can ship it and users can start compiling interoperable-MVP modules even before the final one ships.

RossTate commented 4 years ago

Your summary of my perspective sounds accurate. Thanks for the thorough analysis! I think I understand your reasoning as well. There's one concern I want to focus on that you might be intending to be addressed by your plan, but I just want to make it explicit. I'll use V1 for this proposal, V2 for this proposal plus some on_unwind, and V3 for #105.

My concern is that by the time V3 comes around, there might be a lot of V1 programs out there that V3 programs need to play nice with. When a V3 module A calls some other module B and gives B a callback, module A doesn't know whether B is V1 or not, yet A is expected to be responsible for cleaning up B's stack. So if the callback throws an exception, it will need to walk up the stack and look for both V3-style handlers and V1-style handlers (and on_unwind, but that's easy). If it sees a V1-style handler, it knows it's going to lose control and so has to wrap up its stack-walking state into an exnref in order to execute the V1 cleanup code. That V1 handler in B will then typically rethrow the exnref, and eventually the exnref will reach the point where module A called out to B. So before calling out to B, module A will have to wrap the call with a V1-style try-catch in order to get the exnref and extract from it the stack-walking state. Then it will initiate a new stack walk looking for V3 handlers and running on_unwind. That's a lot of ugliness to support compatibility with V1 code (which goes away if you only expect compatibility with V2 code because on_unwind does not usurp control and you can ignore V1-style handlers).

So I think what you are suggesting in your plan is that, with appropriate signaling to the community, we can expect enough V1 programs to have been recompiled into V2 programs such that V3 programs won't feel pressured to jump through the above compatibility hoops. You know the C++ WebAssembly community much better than me, so I'm happy to defer to your judgement on that matter. Can you confirm that that's the expectation you had in mind, or clarify if otherwise?

lukewagner commented 4 years ago

@RossTate Because, in general, module B could have a legitimate try { callback() } catch(...) { rethrow; }, it seems like, if module A wants to be maximally interoperable (even if V1 and V2 never existed and we went straight to V3), module A would need to wrap all its calls to imports with try blocks because it would have to be maximally conservative. Yes?

This "wrap every call to an import" situation is similar to a long-standing concern I've had that some modules/languages will simply not be compiled with exceptions enabled, and they will inevitably be composed with modules that do and then bad things will happen (infrequently, in subtle ways). After talking about this in #68, it seems like we can't solve this in pure wasm b/c we don't have enough context to know what's a boundary between languages vs. just different DLLs in the same language. So, true to my idiom, my current thinking is that we should mitigate this problem in Interface Types by having any exception that unwinds into an adapter call turn into a trap. This puts the burden on modules that do use exceptions to catch those exceptions and convert them into something explicit in their interface (say a variant return in the shape of a Result). In the absence of such a design, every non-exception-safe module would need to defensively wrap all calls to imports with a try, which seems unfortunate.
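
To make the shape of that idea concrete, here is a small Python sketch. It is purely illustrative and does not reflect any actual Interface Types design: `Trap`, `adapter_call`, `parse_port`, and `exported_parse_port` are hypothetical names, and a ("ok", value) or ("err", message) tuple stands in for a Result-shaped variant.

```python
from typing import Any, Callable, Tuple


class Trap(BaseException):
    """Hypothetical stand-in for a wasm trap; not catchable by `except Exception`."""


def adapter_call(func: Callable[..., Any], *args: Any) -> Any:
    """Assumed adapter boundary: any exception unwinding into it becomes a trap."""
    try:
        return func(*args)
    except Exception as exc:
        raise Trap("exception crossed an adapter boundary") from exc


def parse_port(text: str) -> int:
    """Module-internal function that may raise."""
    return int(text)


def exported_parse_port(text: str) -> Tuple[str, Any]:
    """Exported wrapper: catches the module's own exceptions and returns a Result-like variant."""
    try:
        return ("ok", parse_port(text))
    except ValueError as exc:
        return ("err", str(exc))


good = adapter_call(exported_parse_port, "8080")
bad = adapter_call(exported_parse_port, "not-a-port")
```

Calling `adapter_call(parse_port, "not-a-port")` directly would raise `Trap` in this model, which is the burden shift described above: the module that uses exceptions is the one that has to make failures explicit in its interface.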

RossTate commented 4 years ago

@lukewagner Can you clarify your example? The filtered catch is not expressible in V1/V2, and in a V1/V2-only world try { callback() } catch(...) { rethrow; } is equivalent to just callback(). In the meantime, I realize I should clarify something about my example that might address your concern.

In my example, module A knows that the exception thrown by the callback cannot be understood by module B (say because it pertains to some unexported stack mark). That's why if A knew that B were V2+, in which try-catch is only used to catch exceptions and not to unwind the stack, then the problem could be solved by ignoring try-catch in B entirely (as br_on_exn will always fail).

I see what you're saying about Interface Types, but research on exceptions (or more generally algebraic effects) has found that this callback pattern respects the "share nothing" principle behind Interface Types. Here is a paper that comes to mind on this topic. I'm happy to discuss that paper and Interface Types, but it probably merits a separate thread. Also, V3 does not have an explicit notion of exception throwing, since that's boiled down into a bunch of other primitive steps, only the last of which is unwinding. I think the construct you'd want in #105 is stack-wall, which would prevent a called module from walking up (and thereby throwing into) the caller's stack.

dschuff commented 4 years ago

So I think what you are suggesting in your plan is that, with appropriate signaling to the community, we can expect enough V1 programs to have been recompiled into V2 programs such that V3 programs won't feel pressured to jump through the above compatibility hoops. You know the C++ WebAssembly community much better than me, so I'm happy to defer to your judgement on that matter. Can you confirm that that's the expectation you had in mind, or clarify if otherwise?

Well, I certainly expect that there will still be V1 programs around for much longer than we'd prefer (certainly that's been my experience on past platforms). The emscripten community has actually been pretty good about uptake of new tools generally, but as the platform matures and more users trust us with giant cross-platform codebases that fund their businesses, I'd expect them to get more conservative on average. But: 1) we do still have plenty of improvements in the pipeline to offer users to incentivize them to upgrade. 2) we are still quite a ways away from cross-language integration that is so seamless that users can just pull random things from npm or whatever and have them wired directly together without e.g. interesting JS code in between. So for now, users are anyway going to have to think carefully about this integration layer as a primary part of their development work, and this kind of issue is probably not even their most pressing one.

So I think we will be able to give our users some kind of help on this when they need it, whether that's IT, another C++ ABI, etc.

lukewagner commented 4 years ago

@RossTate That paper looks promising; I'll take a look! My try { callback() } catch(...) { rethrow; } snippet was just suggestive of code in module B that unconditionally catches exceptions and rethrows them (such that the outer caller A would still need to catch the exnref). But it sounds like you're saying that's not possible?

RossTate commented 4 years ago

@lukewagner Supposing V3 is willing to assume B is V2+, then V2's try-catch clauses would be completely ignored. Now that I understand your intent, though, that doesn't circumvent the problem entirely. That is, B could have an on_unwind clause that defies convention and usurps control. So let's walk through what happens in that scenario. A's callback does a stack walk looking for a stack mark that B does not have, thereby entirely ignoring B's portion of the stack until it gets to a corresponding stack mark in the module A code that called B in the first place. Executing that stack mark will then redirect control to some corresponding place up the stack that then chooses to execute the unwinding marks on the stack. That is, in this example A and B are trying to get along and trust each other to follow conventions (supposing infrastructure like on_unwind is in place for supporting those conventions). Module B could always betray A's trust by having an on_unwind that usurps control, but that's A's fault for misplacing trust in B. The trust/security model in #105 is actually based on an insight I got from @aheejin in https://github.com/WebAssembly/exception-handling/issues/101#issuecomment-589448011, namely that the caller should always have more privileges than the callee. Because A's handler is higher up the stack than B, it gets to choose how to unwind B's stack. By choosing to execute B's unwinders, A is trusting B to follow conventions. Alternatively, A could choose to skip B's unwinders entirely if it has no such trust. Per @tlively's prompt in https://github.com/WebAssembly/exception-handling/issues/101#issuecomment-589443129, we (the SOIL crew) are hoping to (eventually) formalize this security principle, show that it aligns with existing security standards, and prove that it holds for #105. (Sorry, that took longer to explain than I expected. If you have more thoughts, it might be better to follow up elsewhere.)

@dschuff I also definitely expect there to be V1 programs still around. It's the dynamics of community pressure that I'm concerned about. It sounds like you think the dynamics will be that V1 will be pressured to upgrade to V2+ rather than V3 be pressured to jump through hoops to unwind V1 stacks. Your reasoning sounds good to me, and anyways y'all have much more insight into the community with which to make that assessment. So supposing @aheejin is fine with this plan, I'm happy to close this issue. Given that, I'm wondering how y'all would like to proceed with #105 then, but it's probably better to discuss that in #105 instead of here.