WebAssembly / exception-handling

Proposal to add exception handling to WebAssembly
https://webassembly.github.io/exception-handling/

Q: Accessing wasm exception contents externally #193

Open bkotsopoulossc opened 2 years ago

bkotsopoulossc commented 2 years ago

I played around with the current wasm exception handling implementation in the latest Chrome, using C++, Emscripten and JS. I was able to throw an exception in C++ and catch it in JS, but it's unclear to me how to access the contents of the C++ exception, like the message string.

Example:

// In C++

#include <stdexcept>

void foo() {
  throw std::runtime_error("hello world");
}

// In JS

try {
  Module.foo()
} catch (e) {
  if (e instanceof WebAssembly.Exception) {
    console.log(e)
  }
}

The log will print something like this, with the exception cause hardcoded to wasm exception:

WebAssembly.Exception: wasm exception
  at someFunc (URL)
  at someOtherFunc (URL)
  ...

I see there is the getArg API on the exception object, where I can pass in a WebAssembly.Tag, but I'm not exactly sure what this API is for.

Is there any way in the current spec, or in any upcoming additions, to get access to the underlying C++ exception like this from JS, to be able to do some simple operations on it like log it?

I imagine some API as follows:

  if (e instanceof WebAssembly.Exception) {
    // get access to a pointer on the wasm heap that represents the C++ exception object
    const wasmExceptionPtr = e.getArg(new WebAssembly.Tag({ parameters: ["some_special_syntax"] }), 0)

    // call some custom hook back into wasm to access the underlying exception object based on the pointer
    const message = Module.getExceptionMessage(wasmExceptionPtr)

    console.log(message)
  }

I've read through some of the related issues, like https://github.com/WebAssembly/exception-handling/issues/183, https://github.com/WebAssembly/exception-handling/issues/184, https://github.com/WebAssembly/exception-handling/issues/189, https://github.com/WebAssembly/exception-handling/issues/190 but didn't find an answer.

Thanks in advance.

aheejin commented 2 years ago

What getArg returns is an i32 value, which is a pointer to a C++ exception, which is returned by __cxa_allocate_exception in libc++abi.

The real C++ value thrown (what you can get from catch (someclass e)) is obtained by calling __cxa_begin_catch with the pointer later in Wasm code. This is the case in other native platforms like x86 too. The difference between native platforms and Wasm is how the personality function is called, but I'm not sure if this detail is relevant here.

So in short, std::runtime_error or a field within it is a C++-internal thing and Wasm doesn't have any knowledge about the structure of std::runtime_error. And I don't think it is possible to add something to the Wasm EH spec that can access the internal of a specific language's specific class. But we can support a similar thing from the toolchain level. I think emscripten-core/emscripten#6330 is a relevant discussion. One of the things suggested there was, the toolchain can add what() of a C++ exception object to the JS object's optional message field. But we haven't decided on anything yet.

bkotsopoulossc commented 2 years ago

What getArg returns is an i32 value, which is a pointer to a C++ exception, which is returned by __cxa_allocate_exception in libc++abi.

Great, as long as we can get the pointer, that should be sufficient. We can then call back into wasm with the pointer, and cast it to access the underlying object. But it's a little unclear to me what parameters need to be used for the tag - is this an implementation detail or a convention?

I am assuming maybe wasmException.getArg(new WebAssembly.Tag({ parameters: ["i32"] }), 0) but that is just a guess.

But we can support a similar thing from the toolchain level. I think emscripten-core/emscripten#6330 is a relevant discussion. One of the things suggested there was, the toolchain can add what() of a C++ exception object to the JS object's optional message field.

In general that discussion sounds awesome. Being able to map the C++ message into the JS error and vice versa would be quite useful.

aheejin commented 2 years ago

It looks like this test gives examples of getArg's usage: https://github.com/v8/v8/blob/17a99fec258bcc07ea9fc5e4fabcce259751db03/test/mjsunit/wasm/exceptions-api.js#L202-L225

thibaudmichaud commented 2 years ago

I am assuming maybe wasmException.getArg(new WebAssembly.Tag({ parameters: ["i32"] }), 0) but that is just a guess.

More accurately it should look like wasmException.getArg(instance.exports.tag, 0), where tag is exported by the wasm module and is the tag that was used by the throw instruction. Tags have identity, so your example would be a type error.
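The identity rule can be checked without any wasm module, because the JS API also lets JS construct tags and exceptions directly. The following is a minimal sketch, assuming a runtime that ships WebAssembly.Tag and WebAssembly.Exception; the helper name readPayload and the payload value 1234 are made up for illustration:

```javascript
// Two tags with the same signature are still two distinct tags.
const thrownTag = new WebAssembly.Tag({ parameters: ["i32"] });
const lookalikeTag = new WebAssembly.Tag({ parameters: ["i32"] });

// Stand-in for an exception that wasm code would throw with an i32 payload.
const exn = new WebAssembly.Exception(thrownTag, [1234]);

// Hypothetical helper: returns the payload at `index`, or undefined when
// `e` is not a wasm exception carrying exactly this tag.
function readPayload(e, tag, index = 0) {
  if (!(e instanceof WebAssembly.Exception) || !e.is(tag)) {
    return undefined;
  }
  return e.getArg(tag, index);
}

console.log(readPayload(exn, thrownTag));    // payload is readable
console.log(readPayload(exn, lookalikeTag)); // tag mismatch, so undefined
```

Checking e.is(tag) first avoids the TypeError that getArg raises when the tag does not match the exception.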

Alternatively, the tag can be created from JS as let tag = new WebAssembly.Tag(...) and imported in the wasm module. Then tag can be used as the first argument of getArg. (This is how I did it in the test linked by @aheejin).
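A rough sketch of this second approach, where the tag is created in JS and imported by the module. The module bytes are hand-assembled for illustration only (they are not Emscripten output), and the import names "m" and "tag", the export name "f", and the payload 42 are all assumptions of this sketch:

```javascript
// Tag created on the JS side; the module below imports it as "m" "tag".
const tag = new WebAssembly.Tag({ parameters: ["i32"] });

// Hand-assembled module, roughly equivalent to:
//   (module
//     (import "m" "tag" (tag (param i32)))
//     (func (export "f") (throw 0 (i32.const 42))))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic, version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0a, 0x01, 0x01, 0x6d, 0x03, 0x74, 0x61, 0x67,       // import "m" "tag"
  0x04, 0x00, 0x00,                                           //   kind tag, type 0
  0x03, 0x02, 0x01, 0x01,                                     // one function of type 1
  0x07, 0x05, 0x01, 0x01, 0x66, 0x00, 0x00,                   // export "f" = func 0
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x08, 0x00, 0x0b, // body: i32.const 42; throw 0
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), {
  m: { tag },
});

// Calls the throwing export and extracts the i32 payload using the same
// tag object that was handed to the module at instantiation time.
function callAndGetPayload() {
  try {
    instance.exports.f();
  } catch (e) {
    if (e instanceof WebAssembly.Exception && e.is(tag)) {
      return e.getArg(tag, 0);
    }
    throw e; // not ours, so let it propagate
  }
  return undefined;
}

console.log(callAndGetPayload());
```

With the export-based approach described above, the only change on the JS side would be to pass instance.exports.tag instead of the locally created tag.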

I'm not sure what emscripten does here. The former I think, but then the question is whether tags are (or can be) exported.

bkotsopoulossc commented 2 years ago

wasmException.getArg(instance.exports.tag, 0)

Oh wow! This is exactly what I was looking for - this is what I was confused about. I didn't realize this was exported nicely or that it was using reference equality under the hood. Makes much more sense now, thanks :)

bkotsopoulossc commented 2 years ago

What getArg returns is an i32 value, which is a pointer to a C++ exception, which is returned by __cxa_allocate_exception in libc++abi.

So the integer returned is an offset into the wasm heap? Is that the right way to interpret it?

aheejin commented 2 years ago

Yes, it is a regular memory address.
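To illustrate what "a regular memory address" means on the JS side, here is a small sketch that reads a NUL-terminated string from linear memory at a given offset. Note the assumptions: the pointer returned by getArg refers to the C++ exception object, not to its message, so this only applies to a char pointer obtained some other way (for example from a user-written export that calls what()). The names readCString and writeCString, the offset 1024, and the standalone WebAssembly.Memory are all hypothetical stand-ins:

```javascript
// Stand-in for the module's linear memory (1 page = 64 KiB).
const memory = new WebAssembly.Memory({ initial: 1 });

// Test helper: places a NUL-terminated UTF-8 string at `ptr`.
function writeCString(mem, ptr, text) {
  const encoded = new TextEncoder().encode(text);
  const heap = new Uint8Array(mem.buffer);
  heap.set(encoded, ptr);
  heap[ptr + encoded.length] = 0;
}

// Reads a NUL-terminated UTF-8 string starting at byte offset `ptr`.
// Throws RangeError instead of silently reading past linear memory.
function readCString(mem, ptr) {
  const heap = new Uint8Array(mem.buffer); // re-create the view: memory can grow
  if (!Number.isInteger(ptr) || ptr < 0 || ptr >= heap.length) {
    throw new RangeError(`pointer ${ptr} is outside linear memory`);
  }
  let end = ptr;
  while (end < heap.length && heap[end] !== 0) {
    end++;
  }
  return new TextDecoder().decode(heap.subarray(ptr, end));
}

writeCString(memory, 1024, "hello world");
console.log(readCString(memory, 1024));
```

An i32 arrives in JS as a signed number, so an address at or above 2 GiB would show up negative and need `ptr >>> 0` before being used as an offset.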

bkotsopoulossc commented 2 years ago

So there is a technique in the Emscripten exception handling mechanism, where the C++ exception bubbles up from wasm to JS as a number, and JS catches it, and can then follow this pattern to call back into wasm, cast the number to a pointer, and use it to access the underlying exception object.

Yes, it is a regular memory address.

Based on this info, I tried the same technique with wasm EH, but got a "memory access out of bounds" error when dereferencing the pointer in wasm. It sounds like the technique should work here, so I'm trying to figure out what is different.