WebAssembly / exception-handling

Proposal to add exception handling to WebAssembly
https://webassembly.github.io/exception-handling/

Add exception specifier to function signature #68

Open PoignardAzur opened 5 years ago

PoignardAzur commented 5 years ago

Exception specifiers are a common-enough feature in strongly-typed languages.

These specifiers have a few advantages that warrant integrating them into WebAsm:

More generally, there's an argument to be made that whether or not a function can interrupt the control flow of your program should be a part of its API, and therefore its signature.

tlively commented 4 years ago

I think that this is useful for some languages where exceptions are declared or otherwise part of the type of a function, but would be a showstopper for other languages where it is not possible to determine in general the complete set of exceptions a function may throw after dynamic linking is taken into account. I agree that it would be very useful to be able to reason about thrown exceptions and automatically convert them to other formats when necessary, so this may be a good thing to raise in the interface types proposal.

rossberg commented 4 years ago

Also keep in mind that Wasm isn't a user-facing language and the role of the Wasm type system is not to guide programmers. Its only purpose is to help engines ensure memory safety while enabling efficient compilation. It's not clear what exception annotations would add to that.

AFAICS, the same applies, more or less, to interface types.

aheejin commented 4 years ago

I don't think we can practically analyze the set of types of exceptions a specific function can throw unless every function signature embeds thrown exception signature within it. And changing the function signature format altogether would not be something we want, would it?

Having said that, adding some specifier like noexcept/nothrow can be doable and even compatible with the MVP, because such specifiers are conservative, so we don't need exact analysis, and we can take hints from the source language itself (such as C++'s noexcept). And it's fine that all the MVP functions don't have it; it's a conservative hint anyway. Maybe we can add it if it is shown to be useful for optimizations in the VM. (As @rossberg said, this hint wouldn't be very useful for users.)

aheejin commented 4 years ago

> I think that this is useful for some languages where exceptions are declared or otherwise part of the type of a function, but would be a showstopper for other languages where it is not possible to determine in general the complete set of exceptions a function may throw after dynamic linking is taken into account.

I'm not sure if it would be doable even with compiled languages with static linking. Note that wasm exception types do not correspond with a language's internal types, such as SomeClass* (they are all gonna be i32 in the end for C++, for example)

> I agree that it would be very useful to be able to reason about thrown exceptions and automatically convert them to other formats when necessary, so this may be a good thing to raise in the interface types proposal.

What would it be useful for? Could you provide some examples?

lukewagner commented 4 years ago

I agree that when we take the perspective of "we're just an ISA for a source language" that exception specifications don't add anything that the source-language compiler couldn't do itself. But I think there's a bit more to the story.

First, two observations:

Now let's imagine the future we're slowly moving toward, where a single app can contain wasm code from multiple packages, each compiled by a possibly different toolchain. Suppose we have a module A, which was compiled with -fno-cxx-exceptions (or was compiled before exceptions existed), and A calls an export of module B, which was compiled with -fcxx-exceptions, and B throws an exception: what happens?

I expect the proposal today says that the exception unwinds through A as an exception and, since it's just an exception, some other module (JS or wasm) that called A could catch the exception and expect to call an export of A in the future. A, not supporting exceptions, could be in a corrupt or leaking state, however, so we can say this is simply a bug, an invalid combination of A and B, and the bug will manifest when A crashes because its state is corrupt or it leaks to death. But this seems unfortunate and it would be nice for the ecosystem as a whole if this bug could be caught earlier.

So what if we specify that:

This wouldn't include any static validation rules that non-"throws" functions must wrap calls to "throws" functions in a try block; the enforcement would be dynamic (and I think also "free" during non-exceptional execution, because this can just be a bit on the unwind metadata).
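The dynamic enforcement described above can be modeled in a few lines. This is only a sketch under stated assumptions: `Frame`, `Outcome`, and `unwind` are hypothetical names invented for illustration, and the rule "unwinding out of a frame whose function lacks the effect becomes a trap" is inferred from the surrounding discussion rather than taken from any specification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one call frame; neither field exists in any wasm proposal.
struct Frame {
    bool declares_throws;  // the function's type carries the "throws" effect
    bool has_catch;        // the frame sits inside a try block that catches the exception
};

enum class Outcome { Caught, Trapped, Uncaught };

// Simulates unwinding after the innermost frame (stack.back()) throws.
// The flag is read only while unwinding, never on the normal call path,
// which is why non-exceptional execution pays nothing in this model.
inline Outcome unwind(const std::vector<Frame>& stack) {
    assert(!stack.empty() && "there must be a frame that threw");
    for (std::size_t i = stack.size(); i-- > 0;) {
        const Frame& frame = stack[i];
        if (frame.has_catch) return Outcome::Caught;           // handler found, stop here
        if (!frame.declares_throws) return Outcome::Trapped;   // exception may not leave this frame
    }
    return Outcome::Uncaught;  // escaped every frame in the model
}
```

With a stack of "catching caller, exception-unaware A, throwing B", the exception is converted to a trap at A instead of reaching the caller's handler; if A also declared the effect, the caller would catch it.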

We could also say that "throws" is ignored by type equality/subtyping, so that this "throws" effect doesn't create widespread annoyance (like the need to wrap "throws" functions so that they were importable/call_indirect-able with non-"throws" types). (I'm not positive this is a good idea.)

Without fancy inter-procedural analysis, -fcxx-exceptions would set the "throws" effect for all functions in the module and, since the absence of the effect is the default, pre-exception-handling wasm and -fno-cxx-exceptions wasm would continue to not set the throws flag and would get the early-error behavior if an exception was thrown. Thus one could imagine replacing the "throws" effect on functions with a "supports exceptions" flag on the module as a whole... but thus far we've avoided module flags like this and technically I can imagine use cases where you want the flag per-import.

tlively commented 4 years ago

> > I agree that it would be very useful to be able to reason about thrown exceptions and automatically convert them to other formats when necessary, so this may be a good thing to raise in the interface types proposal.

> What would it be useful for? Could you provide some examples?

For example, perhaps the binding layer could transform a C++ exception thrown from one module into a Rust Result return type in another module. Or more simply, transform a C++ exception into a C error code return. Or more generally transform a language A exception into a language B exception. Of course the bindings layer would have to be very expressive and have its own abstract "Exception" type(s) that real exceptions could be lifted to and lowered from, so this may be a rather complex feature of the interface types proposal. But it would definitely be useful!
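As a rough illustration of the kind of conversion described here, the following sketch turns a thrown C++ exception into either a Rust-like result value or a C-style error code. Everything in it is an assumption made for the example: `Result`, `call_as_result`, and `call_as_error_code` are hypothetical helpers, not part of interface types or any toolchain, and a real bindings layer would operate on wasm values rather than C++ objects.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a Rust-style Result<T, E>; T must be default-constructible here.
template <typename T>
struct Result {
    bool ok;
    T value;            // meaningful only when ok is true
    std::string error;  // meaningful only when ok is false
};

// Calls a function that may throw and lowers any exception into a Result.
template <typename T>
Result<T> call_as_result(const std::function<T()>& throwing_fn) {
    assert(throwing_fn && "adapter needs a callable target");
    try {
        return Result<T>{true, throwing_fn(), ""};
    } catch (const std::exception& e) {
        return Result<T>{false, T{}, e.what()};
    } catch (...) {
        return Result<T>{false, T{}, "unknown exception"};
    }
}

// The simpler variant: collapse any exception into a C-style error code.
inline int call_as_error_code(const std::function<void()>& throwing_fn) noexcept {
    try {
        throwing_fn();
        return 0;   // success
    } catch (...) {
        return -1;  // any exception, including ones the caller does not understand
    }
}
```

The error-code variant loses the payload entirely, which is the trade-off between the "simple" and the "expressive" bindings layer mentioned above.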


> I expect the proposal today says that the exception unwinds through A as an exception and, since it's just an exception, some other module (JS or wasm) that called A could catch the exception and expect to call an export of A in the future. A, not supporting exceptions, could be in a corrupt or leaking state, however, so we can say this is simply a bug, an invalid combination of A and B, and the bug will manifest when A crashes because its state is corrupt or it leaks to death. But this seems unfortunate and it would be nice for the ecosystem as a whole if this bug could be caught earlier.

I agree that if an exception silently bubbles up through A and is caught higher in the call stack, this could cause problems if A required destructors to be run. However, that just means that the toolchain for A should make sure to catch all exceptions, even those it does not understand, and run destructors before rethrowing. This makes modules compiled with and without the exceptions feature incompatible, but that's the kind of problem we already solve in the toolchain, for example by making it a link error to try to unsafely link objects compiled with and without the atomics feature. Since these are tool problems, I'm not sure it's worth the extra spec complexity to do runtime checks in engines as well.

lukewagner commented 4 years ago

@tlively For that to work, I think everyone would have to use the same toolchain and agree on a meta-convention for identifying incompatibilities and reporting them as early errors. Without the aid of JS, I'm not even sure what a pure-wasm meta-convention would be.

> Since these are tool problems, I'm not sure it's worth the extra spec complexity to do runtime checks in engines as well.

Agreed there is some additional complexity, but I think it could have minimal practical implementation complexity if designed as I proposed. Also, as I said before, I don't think there would be any extra dynamic checks required on non-exceptional control flow, and the cost of the extra check during unwinding would be negligible, I expect.

tlively commented 4 years ago

We have the target features section specified in the tool-conventions repo, which already functions as such a meta-convention. It's a feature of the WebAssembly object file format, but it's not tied to relocations or anything so it could also be adopted by any sort of WebAssembly loader, even if it doesn't use object files directly. Basically toolchains have to solve this problem anyway whether or not engines are specified to do any checking, so the additional benefit of having engines trap on errors seems minimal.

rossberg commented 4 years ago

@lukewagner:

> We could also say that "throws" is ignored by type equality/subtyping, so that this "throws" effect doesn't create widespread annoyance

In that case it would be completely meaningless to put it on function types. Instead, it would more adequately be an annotation on function definitions that is simply a shorthand for a catch-all-and-trap around the function's body (in a special case that does not break tail calls).

lukewagner commented 4 years ago

@tlively I think you're thinking in terms of dynamic linking, where toolchains have to collaborate tightly. When combining wasm modules created by different toolchains in the more loosely-coupled context of a package manager (esp. using a more-declarative loader like ESM), I don't think there is a single loader that is in a position to check for mismatches. (How would it be implemented? And outside a JS environment? A conventional custom section could be elevated to the role of a standard that is checked by the engine, but then we're standardizing so it's a question of what's the best to standardize.)

@rossberg What about imports? What matters is the caller's expectation, not the definition of the callee.

tlively commented 4 years ago

@lukewagner If a toolchain wants to defend against untrusted imports throwing exceptions, it can already do that by catching them then trapping or cleaning itself up and rethrowing. No further collaboration is necessary. Your proposal goes further by making the trapping behavior default for MVP modules, and perhaps that's helpful in the short to medium term, but in the long run it won't be necessary. I also worry that the core spec is the wrong layer of abstraction for this problem. Since this is an issue of communication between modules, wouldn't interface adapters be a better place to solve it?
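The two defensive strategies mentioned here (clean up and rethrow, or catch and trap) might look roughly like this at the C++ level. This is a sketch only: `ModuleState`, `call_import_with_cleanup`, and `call_import_or_trap` are hypothetical names, and `std::abort` merely stands in for a wasm trap.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <stdexcept>

// Hypothetical module-local state that must stay consistent across calls.
struct ModuleState {
    int open_handles = 0;
};

// Strategy 1: catch everything, restore the module's invariants, then rethrow
// so the exception keeps unwinding toward whoever can handle it.
inline void call_import_with_cleanup(ModuleState& state,
                                     const std::function<void()>& import_fn) {
    assert(import_fn && "import must be callable");
    ++state.open_handles;  // resource acquired before calling out
    try {
        import_fn();
    } catch (...) {
        --state.open_handles;  // release even for exceptions we do not understand
        throw;
    }
    --state.open_handles;
}

// Strategy 2: treat any exception from the import as fatal.
// std::abort is used here only as a stand-in for trapping.
inline void call_import_or_trap(const std::function<void()>& import_fn) noexcept {
    try {
        import_fn();
    } catch (...) {
        std::abort();
    }
}
```

Either wrapper has to be emitted around each call to an untrusted import, which is the per-call-site cost questioned in the next comment.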

lukewagner commented 4 years ago

It seems like this would cause, in practice, for defensive purposes, every wasm module built with -fno-cxx-exceptions to emit try/catch around every import call. I suppose that's possible, but it seems a bit unfortunate. Is that what Emscripten would do by default?

rossberg commented 4 years ago

@lukewagner:

> What about imports? What matters is the caller's expectation, not the definition of the callee.

Subtyping also applies to imports, so if it can ignore throw annotations, then their presence or absence on imports likewise provides zero information. Also, why would the expectations for calling an import be any more relevant than for calling a funcref?

mstarzinger commented 4 years ago

@lukewagner:

> This wouldn't include any static validation rules that non-"throws" functions must wrap calls to "throws" functions in a try block; the enforcement would be dynamic (and I think also "free" during non-exceptional execution, because this can just be a bit on the unwind metadata).

I agree that this model sounds like it won't introduce any runtime overhead for non-exceptional execution, at least for engines that use stack unwinding without explicit checks at call sites.

@rossberg:

> > What about imports? What matters is the caller's expectation, not the definition of the callee.

> Subtyping also applies to imports, so if it can ignore throw annotations, then their presence or absence on imports likewise provides zero information. Also, why would the expectations for calling an import be any more relevant than for calling a funcref?

Wouldn't that imply that it is essentially impossible to catch an embedder exception (e.g. thrown from JavaScript by an imported JavaScript function) in wasm? Since imported functions cannot be marked as "throws", all exceptions they throw will be converted to traps. I am not arguing for/against this, just want to double-check that I am understanding the implications correctly.

rossberg commented 4 years ago

@mstarzinger, I think whether JS exceptions are mapped to Wasm exns or to traps is a separate question.

When done correctly, effect annotations like "throws" ought to be purely a type-checking mechanism, and as such, should not affect runtime behaviour, only restrict what's a valid program (though Luke seems to suggest some sort of coercive behaviour).

So I'm not implying that exceptions should be converted to traps. I'm merely saying that the type system would not assert anything about their presence.

Because, in fact, if it did, then we would have to require JS functions to only match imports with throws-annotation, since there is no way to validate that they do not throw. That would be a backwards-incompatible change, however.

lukewagner commented 4 years ago

@rossberg My point was that what matters is the caller's expectation and putting a flag on a function definition that was a shorthand for "catch-all-and-trap around the function body" seemed to describe the callee more than the caller. But on second thought, I suppose such a flag describes the caller's expectation as well; the only difference is whether an exception is converted to a trap when unwinding into a frame vs. unwinding out of a frame; and if there isn't a try inside the function body, there's not really a difference. And I do see the point that, if there aren't any static validation rules, it's not really part of the "type".

Another (inverted) framing of the flag could be "this function is exception-safe".

aheejin commented 4 years ago

@PoignardAzur

>   • If a language uses a monadic or state-machine error model (e.g. Rust's Result<T, E> type), exception specifiers would allow it to interface with functions that may throw exceptions, by automatically transforming int fooBar() @mayThrow into Result<int, GenericException> fooBar().

Is this compatible with our exception proposal too? Our EH proposal's try-catch-based exception model does not return to the same place when an error occurs. The control flow is transferred to a catch clause, which may not even be in the same function.

>   • If an interpreter decides to implement a "branch at every call site" strategy for functions that will frequently throw (see also #19), it's very important to be able to tell the interpreter which functions won't ever throw to avoid unnecessary overhead.

Could you elaborate on how we can map languages that use try-catch based exceptions (e.g. C++) to this "branch at every call site" strategy, and why it has less overhead than the stack-unwinding-based scheme when we throw frequently? (I don't know much about this strategy or existing implementations of it, so I'm asking.)

>   • Especially in C++, noexcept specifiers can enable both compiler optimizations (better control flow analysis) and user optimizations (e.g. STL move optimizations).

I don't think adding specifiers to wasm function signatures would benefit language optimizations, such as C++'s noexcept-related optimizations. These happen before we generate wasm instructions and do not rely on the wasm signature. This kind of optimization happens in the frontend.
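To make the "STL move optimization" concrete, here is a small sketch of how a C++ standard library picks between moving and copying based on noexcept, entirely at the source level and before any wasm is generated. The types `NoexceptMove` and `ThrowingMove` and the helper `copies_after_growth` are hypothetical and exist only for this example; the observed counts assume a mainstream standard library (libstdc++, libc++, or MSVC's), which all rely on `std::move_if_noexcept` semantics when a vector reallocates.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Element whose move constructor promises not to throw.
struct NoexceptMove {
    int copies = 0;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove& other) : copies(other.copies + 1) {}
    NoexceptMove(NoexceptMove&& other) noexcept : copies(other.copies) {}
};

// Same element, but the move constructor is not declared noexcept.
struct ThrowingMove {
    int copies = 0;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove& other) : copies(other.copies + 1) {}
    ThrowingMove(ThrowingMove&& other) : copies(other.copies) {}
};

static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value,
              "move is declared noexcept");
static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value,
              "move may throw as far as the compiler knows");

// Forces one reallocation and reports how often the first element was copied.
template <typename T>
int copies_after_growth() {
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back();                  // constructed in place, no copy yet
    assert(v.capacity() >= 1);
    v.reserve(v.capacity() + 1);       // must reallocate and relocate the element
    return v[0].copies;
}
```

On the libraries named above, the noexcept type is moved during reallocation (zero copies) while the other is copied once to preserve the strong exception guarantee. The decision is made from the C++ type, so a wasm-level annotation would arrive too late to influence it.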

aheejin commented 4 years ago

@lukewagner

> Now let's imagine the future we're slowly moving toward, where a single app can contain wasm code from multiple packages, each compiled by a possibly different toolchain. Suppose we have a module A, which was compiled with -fno-cxx-exceptions (or was compiled before exceptions existed), and A calls an export of module B, which was compiled with -fcxx-exceptions, and B throws an exception: what happens?

> I expect the proposal today says that the exception unwinds through A as an exception and, since it's just an exception, some other module (JS or wasm) that called A could catch the exception and expect to call an export of A in the future. A, not supporting exceptions, could be in a corrupt or leaking state, however, so we can say this is simply a bug, an invalid combination of A and B, and the bug will manifest when A crashes because its state is corrupt or it leaks to death. But this seems unfortunate and it would be nice for the ecosystem as a whole if this bug could be caught earlier.

Can't this happen in MVP already? For example, when the call stack is like A (JS) -> B (wasm) -> C (JS) in MVP, when an exception is thrown in C and caught in A, B is in a corrupted state.

> So what if we specify that:

>   • Function types are extended with an optional "throws" effect that simply says "this function may throw". The default (inherited by all MVP wasm code) is that the effect is not present.

This feels more like a 'supports EH' flag than 'throws', then. I think there are also cases in which linking modules with different feature flags doesn't make sense, so we error out in the linker. How would this case be different from those other cases? Maybe cc @tlively

> It seems like this would cause, in practice, for defensive purposes, every wasm module built with -fno-cxx-exceptions to emit try/catch around every import call. I suppose that's possible, but it seems a bit unfortunate. Is that what Emscripten would do by default?

Compiling with -fno-exceptions does not generate try/catch around every call. Code simply does not know about exceptions in that case, like C. Emscripten currently supports EH by basically wrapping every invoke in a wrapper function that calls out to JS code, which throws a JS exception if necessary; this is very slow. This EH feature is disabled by default because it's slow. But with it disabled it behaves the same as the host toolchain: it does not know anything about exceptions. No try-catch.

lukewagner commented 4 years ago

> Can't this happen in MVP already?

Technically, if a JS exception unwinds into wasm today, the core wasm host function call rules say that that is a trap, and, as a general rule, after a trap, a wasm instance should be considered to be in a corrupt state and not reentered, thus, this is not actually an allowed thing today. (Yes, I know that the current impl of EH in Emscripten uses JS exceptions to unwind wasm in exactly this manner, but the JS code in that case is intimately coupled to the wasm code, so it's allowed to play with fire (traps).)

This all changes with the exception-handling proposal, though: presumably JS and wasm exceptions both turn into exceptions, not traps, when they unwind from a cross-instance call. Thus, allowing an exception to unwind from module A into module B is, in general, a valid (non-trapping) thing to do.

That all being said, after some discussion with @fgmccabe, it does seem like this is strictly a concern at (shared-nothing) module interface boundaries, not something one would want to use in a fine-grained manner within a core wasm module, and thus probably the "right" solution is to not have this in core wasm but instead to put some form of exception specification into the module interface type, with the net effect being a default convention that, when you don't support exceptions but call a function that declares it might throw, the call gets wrapped with a try block where the catch traps.

So I'm happy to close this issue; thanks for the discussion.

aheejin commented 4 years ago

> > Can't this happen in MVP already?

> Technically, if a JS exception unwinds into wasm today, the core wasm host function call rules say that that is a trap, and, as a general rule, after a trap, a wasm instance should be considered to be in a corrupt state and not reentered, thus, this is not actually an allowed thing today. (Yes, I know that the current impl of EH in Emscripten uses JS exceptions to unwind wasm in exactly this manner, but the JS code in that case is intimately coupled to the wasm code, so it's allowed to play with fire (traps).)

> This all changes with the exception-handling proposal, though: presumably JS and wasm exceptions both turn into exceptions, not traps, when they unwind from a cross-instance call. Thus, allowing an exception to unwind from module A into module B is, in general, a valid (non-trapping) thing to do.

You're right, and we should change this JS API part too. But I'm thinking that we might not catch RuntimeError, which includes traps, and maybe catch other 'normal' thrown foreign exceptions.

> That all being said, after some discussion with @fgmccabe, it does seem like this is strictly a concern at (shared-nothing) module interface boundaries, not something one would want to use in a fine-grained manner within a core wasm module, and thus probably the "right" solution is to not have this in core wasm but instead to put some form of exception specification into the module interface type, with the net effect being a default convention that, when you don't support exceptions but call a function that declares it might throw, the call gets wrapped with a try block where the catch traps.

I'm not very sure what this means. Do they want a throws specifier not on function signatures but on module interfaces instead? How is it computed? Is it something like 'supports EH', so that all modules compiled with wasm EH will have it? And what is it gonna be used for? And where should we put those try-catch blocks at the module boundary?

lukewagner commented 4 years ago

Sorry for not being more clear: I mean throws specifiers on the functions in a module interface type (which, importantly, are a superset of core wasm function types, and thus can be enriched with a throws specification). It's up to the toolchain how to emit these throws specifications, but I would imagine that, as a minimum, -fno-cxx-exceptions would not declare anything thrown (guaranteeing dynamically that no exceptions were thrown by calling the export), and -fcxx-exceptions would add a throws(...) implying anything could be thrown.

There is actually value in saying something more precise than throws(...) at a module interface boundary: let's say I throw a std::string in module A and wish to catch that string in module B and I'm using shared-nothing-linking. Then it's necessary at the unwind boundary between module A and B to copy the std::string from A's linear memory into B's linear memory. This can be accomplished by declaring throws(runtime_error(string)), allowing the same lifting/lowering of the string exception payload as done for normal params/results. How this is surfaced to the source C++ program is a separate question, of course, especially now that dynamic throws specifications are deprecated in C++.
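The copy at the unwind boundary can be sketched as follows. This is a toy model, not how interface types are specified: linear memories are modeled as byte vectors, and `StringRef`, `lift`, `lower`, and `call_across_boundary` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model: each module's linear memory is a private byte vector.
using LinearMemory = std::vector<std::uint8_t>;

// Exception payload as a module sees it: an offset and length into its OWN memory.
struct StringRef {
    std::uint32_t offset;
    std::uint32_t length;
};

// Lift: read the bytes out of the source memory into an abstract value.
inline std::string lift(const LinearMemory& mem, StringRef ref) {
    assert(static_cast<std::size_t>(ref.offset) + ref.length <= mem.size());
    return std::string(mem.begin() + ref.offset, mem.begin() + ref.offset + ref.length);
}

// Lower: append the abstract value to the destination memory and describe where it landed.
inline StringRef lower(LinearMemory& mem, const std::string& s) {
    StringRef ref{static_cast<std::uint32_t>(mem.size()),
                  static_cast<std::uint32_t>(s.size())};
    mem.insert(mem.end(), s.begin(), s.end());
    return ref;
}

// Adapter at the A -> B boundary: a payload that is valid in A's memory is
// re-thrown as a payload that is valid in B's memory.
template <typename F>
void call_across_boundary(const LinearMemory& mem_a, LinearMemory& mem_b, F&& export_of_a) {
    try {
        export_of_a();
    } catch (const StringRef& in_a) {
        throw lower(mem_b, lift(mem_a, in_a));
    }
}
```

Without the adapter, B would interpret A's offset against its own memory and read unrelated bytes, which is why the payload type has to be visible in the interface.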

I think the high-order bit is that, particularly with shared-nothing-linking, exceptions are a very meaningful part of a module's interface.

rossberg commented 4 years ago

The fundamental problem with throws-annotations -- which essentially are a form of effect type system -- is that they are largely impractical in the presence of anything higher-order, like objects or function references, unless you also introduce (first-class) effect polymorphism. That's the problem languages like C++ and Java kept bumping into and the reason why they introduced ad-hoc escape hatches that ultimately made the whole thing more or less pointless and unloved. I would question that it's worth going there, even for interface types.

fgmccabe commented 4 years ago

That amounts to allowing a function signature

All a,b,e (a)=> b throws e

If I understand correctly. So, why not?
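For readers more familiar with C++ than with effect systems, conditional noexcept is a loose compile-time analogue of such an effect-polymorphic signature. The sketch below is an analogy only and says nothing about how Wasm would do it; `call_with`, `Twice`, and `Checked` are hypothetical names. It also shows the higher-order problem raised above: once the callee is type-erased (here via `std::function`, standing in for something like a funcref), the effect information is gone.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// "Throws whatever f throws": the noexcept status is computed from the callee.
template <typename F, typename A>
auto call_with(F&& f, A&& a) noexcept(noexcept(std::forward<F>(f)(std::forward<A>(a))))
    -> decltype(std::forward<F>(f)(std::forward<A>(a))) {
    return std::forward<F>(f)(std::forward<A>(a));
}

struct Twice {
    int operator()(int x) const noexcept { return 2 * x; }
};

struct Checked {
    int operator()(int x) const {
        if (x < 0) throw std::invalid_argument("negative");
        return x;
    }
};

// The effect propagates through the higher-order function at compile time...
static_assert(noexcept(call_with(Twice{}, 1)), "non-throwing callee, non-throwing call");
static_assert(!noexcept(call_with(Checked{}, 1)), "throwing callee, throwing call");

// ...but is lost once the callee is type-erased.
static_assert(!noexcept(call_with(std::declval<std::function<int(int)>&>(), 1)),
              "type erasure discards the noexcept information");

// Small runtime check that the wrapper forwards correctly.
inline int demo() {
    std::function<int(int)> erased = Twice{};
    assert(call_with(erased, 4) == 8);
    return call_with(Twice{}, 21);
}
```

This works in C++ only because templates are instantiated per callee; a first-class function value needs the polymorphism to be carried at run time or in its type, which is the harder requirement.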

On Tue, Oct 22, 2019 at 10:09 PM Andreas Rossberg notifications@github.com wrote:

The fundamental problem with throws-annotations -- which essentially are a form of effect type system -- is that they are largely impractical in the presence of anything higher-order, like objects or function references, unless you also introduce (first-class) effect polymorphism. That's the problem languages like C++ and Java kept bumping into and the reason why they introduced ad-hoc escape hatches that ultimately made the whole thing more or less pointless and unloved. I would question that it's worth going there, even for interface types.

rossberg commented 4 years ago

Well, so far polymorphism has not been on the table for Wasm (though I think GC types will eventually necessitate it). Let alone first-class polymorphism, which is what you would need for funcrefs.

Furthermore, how would you enforce these annotations in the funcref case? You'd potentially need to wrap every funcref at the interface boundaries into a function inserting the appropriate try handler.

lukewagner commented 4 years ago

The intention is already that funcrefs are wrapped at interface boundaries, producing a semantically distinct funcref value on the other side. This is essential because the adaptations performed on params and results are highly effectful/visible (not just enforcing type contracts, as in gradually typed coercion calculi), so the adapted function is fundamentally a different function. Given that, it's easy to add in the dynamic throws-specification checking.

rossberg commented 4 years ago

Okay, that's interesting. :)

How will that work if two modules share a mutable funcref global or a table? A funcref can tunnel through those without giving the interface layer any chance of wrapping it. Or is the intention that interface types do not support stateful im/exports?

Similarly, one can tunnel a function through type anyref and then (e.g. with the GC proposal) downcast on the other end. How would that be handled or prevented?

Edit: Answering the second question myself, that case may not be a problem, at least none specific to functions. I suppose interface types simply do not say, promise, or prevent anything regarding the ability to downcast.

fgmccabe commented 4 years ago

The interface types would not allow tunneling without wrapping. On the other hand, tunneling with wrapping is very much on the agenda.

On Wed, Oct 23, 2019 at 7:49 AM Andreas Rossberg notifications@github.com wrote:

Okay, that's interesting. :)

How will that work if two modules share a mutable funcref global or a table? A funcref can tunnel through those without giving the interface layer any chance of wrapping it. Or is the intention that interface types do not support stateful im/exports?

Similarly, one can tunnel a function through type anyref and then (e.g. with the GC proposal) downcast on the other end. How would that be handled or prevented?

rossberg commented 4 years ago

But the question is: how would they prevent it? The only way I can see is by not allowing higher-order state in an interface, i.e., no mut globals or tables of function type.

lukewagner commented 4 years ago

As currently described, specifying an interface adapter does not force you to ensure any sort of impermeable membrane, so if you want to export a (global (mut funcref)) directly from the adapted module, go nuts, there is no adaptation provided or assumed; if a funcref ultimately gets passed taking i32 memory offsets to the "wrong" memory, that's on you. But, the convention established by toolchain defaults should be that you're not exporting shared-mutable anything (memories, tables, globals), which is, after all, implied by the name "shared-nothing linking" (which is I think a good basis for an interoperable multi-language/toolchain package ecosystem).

aheejin commented 4 years ago
fgmccabe commented 4 years ago

On Thu, Oct 24, 2019 at 8:17 PM Heejin Ahn notifications@github.com wrote:

> Where would interface types take that throws info from then, if not from function types? As I said earlier, it is not practical to transitively scan the whole call graph to compute that throws signature for every function. If it can be conservative, we can attach throws to every function compiled with the EH feature though.

It is a misconception that interface types must come from the source language. Just as likely is that the source library is viewed as an implementation of the signature expressed as interface types.

> About specifying exact types to throws in the interface types like throws(runtime_error(std::string)), many languages don't throw the raw data itself. For example, C++ throws an i32 pointer, which is __cxa_exception* or something, which points to the buffer that contains the payload. I think other languages have their own exception class for that. Are we gonna make adapter functions for these language-specific exception types? And can we do that for pointers to those types too? If we're doing that, the adapter should include translation rules for the buffer contents, which might contain basically anything. Anyway, the only type any C++ function can throw is currently i32.

See above. And if I understand correctly: yes, thrown values will also need lifting and lowering, just like any value. More generally, module interfaces represent an ownership boundary, including the potential for a change of language. If a C++ exception is to survive the boundary, it will need to be copied in any case: the memory spaces of two modules cannot be assumed to be the same.
