WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

Recovering from traps/non-trapping operations #878

Closed pipcet closed 7 years ago

pipcet commented 7 years ago

https://github.com/WebAssembly/design/blob/master/JS.md#javascript-api says:

Because JavaScript exceptions can be handled, and JavaScript can continue to call WebAssembly exports after a trap has been handled, traps do not, in general, prevent future execution.

That's true, of course, but I think it is necessary for usable wasm implementations to actually recover from traps (integer overflow traps in particular). And as far as I can see, that isn't possible because local variables and the expression stack will be lost irretrievably after a trap. Saving all state to linear memory before executing each potentially-trapping opcode would, I think, have a huge performance impact.

Am I very confused about all this? (I'm working on an asm.js/wasm gcc backend at https://github.com/pipcet/asmjs; asm.js mostly works, but it currently looks like I'll have to convert the wasm backend to use wrapper functions that make sure not to trap for each potentially-trapping operation).

I think it would be best to have non-trapping versions of arithmetic operators, even in the MVP.
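
For illustration, a wrapper of the kind described above might look like the following minimal C sketch. The names `safe_div_i32` and `safe_rem_i32` are hypothetical, and the results chosen for the would-be trapping cases (0 for a zero divisor, wrap-around for `INT32_MIN / -1`, which is what asm.js's `(a / b) | 0` yields) are assumptions, not something the thread specifies.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): signed 32-bit division that
 * never reaches a trapping division instruction. */
int32_t safe_div_i32(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;           /* assumption: same result as asm.js (a / 0) | 0 */
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;   /* assumption: wrap around instead of overflowing */
    return a / b;           /* every remaining case is well defined in C */
}

/* Hypothetical remainder counterpart; INT32_MIN % -1 is undefined in C. */
int32_t safe_rem_i32(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;           /* assumption: same result as asm.js (a % 0) | 0 */
    if (b == -1)
        return 0;           /* x % -1 is mathematically always 0 */
    return a % b;
}
```

A backend taking this approach would emit a call to such a helper wherever it would otherwise emit the raw division, at the cost of two extra comparisons per operation.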

jfbastien commented 7 years ago

Very cool that you're working on a backend for GCC!

We've discussed adding this as a future feature, but I strongly doubt we'd add it for the MVP.

"it currently looks like I'll have to convert the wasm backend to use wrapper functions that make sure not to trap for each potentially-trapping operation"

I'm not sure I understand this, can you explain? Why do you have to handle traps in a certain manner? The LLVM backend can generate code without this just fine.

If I understand, you'd like to be able to handle traps, but that's separate from a GCC backend, right?

pipcet commented 7 years ago

I should have been more specific: I'd like to run existing C code, which tends to assume integer operations either do not trap or throw SIGFPE. That assumption is very widespread, including for things like Perl (which tests for it during the build process) and the GCC test suite.

The comment in JS.md really reads to me like the thinking was that SIGFPE could be simulated, and execution resumed, without too much of a performance impact. At least that's what I thought at first, but now that I've tried it there seems to be no way.

So it's true that the GCC backend will work just fine for strict standard C, but it will also break really common assumptions about what happens when you cast a double to an integer.

This feels like a regression from the asm.js backend (which is able to build Perl with only a few minor modifications).

jfbastien commented 7 years ago

Do you have example code which breaks with these assumptions? At a minimum we want to regress this knowingly. Our approach for e.g. integer division was that the cost of checking was very low, and almost no code assumes UB continues executing.

I'd like to understand which other trap cases we may have gotten wrong, as you seem to hint.

pipcet commented 7 years ago

To be honest, I might have overestimated the impact of this. The first thing I tried was Perl, and that broke right away. I'll try fixing that and seeing how far I get.

You're right that no portable code can actually continue executing after a SIGFPE signal, so maybe we could get away with performing a setjmp-to-linear-memory when a SIGFPE handler is installed and calling the handler in a loop when the signal hits...

jfbastien commented 7 years ago

Maybe @kripken had Perl running and could chime in?

kripken commented 7 years ago

No Perl, but this topic has been discussed a lot in asm2wasm here. See in particular the last few comments with proposals for how to move forward; I expect we'll go with one of those, i.e.,

pipcet commented 7 years ago

So some of the traps I've been seeing turned out to be false positives in SpiderMonkey's wasm code: https://bugzilla.mozilla.org/show_bug.cgi?id=1321189. For the others, though, I'll probably go with the wrapper function approach that @kripken also uses for two of three modes.

sunfishcode commented 7 years ago

Do you have any examples where real-world code is catching SIGFPE and intending to continue executing after an integer division by zero?

I'd also like to learn more about real-world cases where traps arise from using plain wasm operators.

pipcet commented 7 years ago

"Continue executing" as in actually resuming execution at the instruction after the division by zero? No, I don't have that, since such code would necessarily depend on the architecture.

What I do know happens is that the SIGFPE handler terminates execution in some special way, as in the Perl example: https://github.com/pipcet/perl/blob/5a512245befd495e146a9d77d78da38918555f18/Configure#L11786. That's something we can support half-way, by calling the right handler, but we can't fill out the siginfo meaningfully for debugging.
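
A rough sketch of "calling the right handler" for the float-to-integer case, under stated assumptions: the wrapper name `checked_f64_to_i32`, the example handler `on_sigfpe`, and the fallback value 0 are all hypothetical. Out-of-range input is reported with the standard `raise(SIGFPE)` instead of reaching a trapping conversion, so a handler installed with `signal()` gets to run. As noted above, nothing meaningful is filled into a siginfo here.

```c
#include <signal.h>
#include <stdint.h>

/* Example handler (hypothetical): just records that SIGFPE was delivered. */
static volatile sig_atomic_t sigfpe_seen = 0;

static void on_sigfpe(int sig)
{
    (void)sig;
    sigfpe_seen = 1;
}

/* Hypothetical wrapper: double -> int32_t that raises SIGFPE for inputs
 * whose truncated value does not fit, instead of trapping. */
int32_t checked_f64_to_i32(double x)
{
    /* Truncation fits exactly when -2147483649 < x < 2147483648.
     * NaN fails both comparisons, so it takes the error path too. */
    if (!(x > -2147483649.0 && x < 2147483648.0)) {
        raise(SIGFPE);  /* runs the installed handler, if any */
        return 0;       /* assumption: fallback value if the handler returns */
    }
    return (int32_t)x;
}
```

A program would install the handler with `signal(SIGFPE, on_sigfpe)` before calling the wrapper. A handler that terminates the program, as in the Configure test, works the same way.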

It appears not to be standards-compliant to longjmp() from the SIGFPE handler (MSDN has an example that does this at https://msdn.microsoft.com/en-us/library/aa272905(v=vs.60).aspx, but it's, well, MSDN); we could support that, at a price, or with a special invocation of setjmp().

What we can't support, as far as I can see, is turning the trapping operation into a C++ exception.

Again, I was very wrong about the scale of this problem. Since the operators it affects are already somewhat slow and unlikely to be a performance bottleneck, I see no real problem anymore with wrapping them in safe functions.

sunfishcode commented 7 years ago

Thanks for the link to that perl code. I'm interested in examples like that because in practice, converting a large float to integer gives different values on different platforms (INT_MIN on x86, INT_MAX on arm, etc.), so naive uses of it would often already lead to observable bugs. In perl's case, it's in the Configure script, and it appears perl itself doesn't actually need that bit of code, because it doesn't use the CASTI32 config variable that gets set as a result.
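
To make the platform difference concrete, code that wants the same answer everywhere has to check the range itself before casting. Below is a minimal sketch; the name `sat_f64_to_i32` and the saturating policy (NaN becomes 0, out-of-range values clamp to the nearest bound) are assumptions chosen for illustration, not something the thread prescribes.

```c
#include <stdint.h>

/* Hypothetical helper: double -> int32_t with one defined result per input,
 * regardless of what the hardware conversion would produce. */
int32_t sat_f64_to_i32(double x)
{
    if (x != x)
        return 0;               /* NaN: assumed to map to 0 */
    if (x <= -2147483648.0)
        return INT32_MIN;       /* clamp below */
    if (x >= 2147483647.0)
        return INT32_MAX;       /* clamp above */
    return (int32_t)x;          /* in range: plain truncation toward zero */
}
```

With a helper like this, the cast itself is only ever executed on values where C defines the result, so the x86 versus ARM difference never becomes observable.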

Since you mention standards-compliance, it's worth noting float-to-int conversion overflow is full-fledged undefined behavior in both C and C++, so the code in that Configure script is officially buggy, for what that's worth.

That MSDN link is also interesting. That code won't work on wasm, because wasm has no _control87 nor any equivalent (though there are some ideas about providing more complete IEEE 754 functionality in the future).

pipcet commented 7 years ago

"it appears perl itself doesn't actually need that bit of code"

There's an identical test that sets CASTFLAGS & 2, which is actually used, if I'm reading the code correctly.

I also found http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1318.htm very interesting, because it implies that it's not just an issue of the standard omitting to make SIGFPE+longjmp() safe, it actually wouldn't work on real products.

sunfishcode commented 7 years ago

Ok, I see CASTFLAGS & 2 in numeric.c. Interestingly, it looks like it's using that to test whether it's safe to cast floating-point values in [0x80000000,UINT32_MAX] to uint32_t, which is fully defined in the C standard. That would mean it's only being used to work around buggy compilers. So it appears the perl code itself isn't actually depending on any undefined behavior here, only the Configure code is, which I find interesting :-).
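
A small probe in the spirit of that check could look like the sketch below. The function name `high_u32_cast_ok` is hypothetical and this is not the actual Configure or numeric.c code. It only exercises values whose integral part fits in `uint32_t`, which is the case the C standard defines.

```c
#include <stdint.h>

/* Hypothetical probe: returns 1 if doubles in [0x80000000, UINT32_MAX]
 * convert to uint32_t correctly, which a conforming compiler must do. */
int high_u32_cast_ok(void)
{
    /* volatile keeps the compiler from folding the casts at compile time,
     * so the generated conversion code is what actually gets tested. */
    volatile double lo = 2147483648.0;   /* 0x80000000 */
    volatile double hi = 4294967295.0;   /* UINT32_MAX */

    uint32_t a = (uint32_t)lo;
    uint32_t b = (uint32_t)hi;

    return a == 2147483648u && b == 4294967295u;
}
```

A result of 0 would point to a compiler or backend bug rather than to undefined behavior in the program.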

pipcet commented 7 years ago

Oops, looks like one of those buggy compilers is mine :-) Hopefully fixed now.

I don't understand the Perl code's logic; it sets CASTFLAGS to 7 for wasm, but there's not really a reason for that, as you point out.

jfbastien commented 7 years ago

Can we close this?

pipcet commented 7 years ago

Yes! Thanks again for convincing me that what wasm provides is good enough for portable software.