bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.25k stars 1.29k forks source link

Does Cranelift honour the requirement of WebAssembly that NaN results of most float ops must be arithmetic? #8967

Closed primoly closed 2 months ago

primoly commented 2 months ago

NaNs in IEEE floats are encoded just like infinities, except that at least one of the mantissa bits is set to one. Wasm NaNs in results of float ops (except fabs, fneg and fcopysign) have the additional requirement that the most significant bit of the mantissa must be one (arithmetic NaN). Is this actually honoured by Cranelift codegen? If NaNs are always canonicalised then this is guaranteed (as the canonical NaN is also an arithmetic one), but if NaNs are simply propagated then this could be an issue.

https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation https://webassembly.github.io/spec/core/syntax/values.html#syntax-float https://webassembly.github.io/spec/core/syntax/values.html#arithmetic-nan

alexcrichton commented 2 months ago

I believe in theory all Cranelift operations/backends/etc all do what wasm wants. That being said I also wouldn't preclude the existence of a bug somewhere. I haven't read up on the context of #8954 but is there a specific case you're wondering about?

cfallin commented 2 months ago

I think this is a question of auditing the backends to ensure that each ISA generates the appropriate kind of NaN. I'm not familiar enough with the FP subsets of our supported ISAs to say for sure; @abrown, @afonso360, @uweigand, others, thoughts on x64, aarch64, riscv64, s390x? Do they always generate arithmetic NaNs?

cfallin commented 2 months ago

(As an additional note, I'd be surprised if Wasm were spec'd in such a way that e.g. a f32.add required more than a bare FP add instruction on common ISAs -- that would seem to go against the "predictable performance" design, and I haven't seen "ensure it's an arithmetic NaN" logic in other engines -- but this is good to double-check.)

primoly commented 2 months ago

(As an additional note, I'd be surprised if Wasm were spec'd in such a way that e.g. a f32.add required more than a bare FP add instruction on common ISAs -- that would seem to go against the "predictable performance" design, and I haven't seen "ensure it's an arithmetic NaN" logic in other engines -- but this is good to double-check.)

I think you can only observe the exact bits of a float with certain instructions (memory storing, reinterpret bits as int, others?), so if those instructions do some kind of canonicalisation guaranteeing arithmetic NaNs, that would likely be enough. For stores to memory the performance impact should be negligible, because stores are so expensive, the additional conversion to arithmetic NaNs wouldn’t add much.

Edit: I don’t actually know much about the performance of CPU instructions, maybe with the speed of L1 caches stores are not expensive as I wrote above.

fitzgen commented 2 months ago

(As an additional note, I'd be surprised if Wasm were spec'd in such a way that e.g. a f32.add required more than a bare FP add instruction on common ISAs -- that would seem to go against the "predictable performance" design, and I haven't seen "ensure it's an arithmetic NaN" logic in other engines -- but this is good to double-check.)

+1 to this; I'd also be very surprised.

afonso360 commented 2 months ago

For RISC-V the behavior is fairly straightforward. (See §21.3. "NaN Generation and Propagation" of the ISA Unprivileged Manual)

Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000

AFAIK the exceptions to this are fmin/fmax and the sign injection instructions.

So I think this is compliant with the WASM Spec directly, and for fmin/fmax we do some corrections.

uweigand commented 2 months ago

The s390x ISA does not use the term "arithmetic NaN". However, the definition of this term in the WebAssembly spec (most significant payload bit is 1) corresponds exactly to what the s390x ISA calls "quiet NaN":

There are two types of NaNs, signaling and quiet. A signaling NaN (SNaN) is distinguished from the corresponding quiet NaN (QNaN) by the leftmost fraction bit: zero for the SNaN and one for the QNaN. A special QNaN is supplied as the default result for an IEEE-invalid-operation exception; it has a plus sign and a leftmost fraction bit of one, with the remaining fraction bits being set to zeros.

The ISA guarantees that the result of any floating-point instruction (except those only manipulating the sign bit) will always be a quiet NaN (if it is a NaN). More specifically if the output is a NaN, it is either a copy of one of the input operands (possibly transformed from a signaling NaN to a quiet NaN), or else the default NaN described in the quote above (which corresponds to what WebAssembly calls the "canonical NaN").

Given that behavior of the ISA, it seems unlikely that code generated by the wasmtime backend would ever be able to violate the WebAssembly NaN propagation rules on s390x (although I haven't explicitly verified all patterns).

primoly commented 2 months ago

Thanks everyone, especially @afonso360 and @uweigand for digging up the RISC-V and s390x specifications. Now that I know of the difference between quiet and singling NaN it became clear to me why Wasm made that choice about the “arithmetic NaN” (quiet NaN). I haven’t checked how this is handled in x86-64 and AArch64 but I presume they either return the canonical NaN (as RISC-V) or some arithmetic NaN (as s390x).