Closed primoly closed 2 months ago
I believe in theory all Cranelift operations/backends/etc all do what wasm wants. That being said I also wouldn't preclude the existence of a bug somewhere. I haven't read up on the context of #8954 but is there a specific case you're wondering about?
I think this is a question of auditing the backends to ensure that each ISA generates the appropriate kind of NaN. I'm not familiar enough with the FP subsets of our supported ISAs to say for sure; @abrown, @afonso360, @uweigand, others, thoughts on x64, aarch64, riscv64, s390x? Do they always generate arithmetic NaNs?
(As an additional note, I'd be surprised if Wasm were spec'd in such a way that e.g. a f32.add
required more than a bare FP add instruction on common ISAs -- that would seem to go against the "predictable performance" design, and I haven't seen "ensure it's an arithmetic NaN" logic in other engines -- but this is good to double-check.)
(As an additional note, I'd be surprised if Wasm were spec'd in such a way that e.g. a
f32.add
required more than a bare FP add instruction on common ISAs -- that would seem to go against the "predictable performance" design, and I haven't seen "ensure it's an arithmetic NaN" logic in other engines -- but this is good to double-check.)
I think you can only observe the exact bits of a float with certain instructions (memory storing, reinterpret bits as int, others?), so if those instructions do some kind of canonicalisation guaranteeing arithmetic NaNs, that would likely be enough. For stores to memory the performance impact should be negligible, because stores are so expensive, the additional conversion to arithmetic NaNs wouldn’t add much.
Edit: I don’t actually know much about the performance of CPU instructions, maybe with the speed of L1 caches stores are not expensive as I wrote above.
(As an additional note, I'd be surprised if Wasm were spec'd in such a way that e.g. a
f32.add
required more than a bare FP add instruction on common ISAs -- that would seem to go against the "predictable performance" design, and I haven't seen "ensure it's an arithmetic NaN" logic in other engines -- but this is good to double-check.)
+1 to this; I'd also be very surprised.
For RISC-V the behavior is fairly straightforward. (See §21.3. "NaN Generation and Propagation" of the ISA Unprivileged Manual)
Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000
AFAIK the exceptions to this are fmin
/fmax
and the sign injection instructions.
So I think this is compliant with the WASM Spec directly, and for fmin
/fmax
we do some corrections.
The s390x ISA does not use the term "arithmetic NaN". However, the definition of this term in the WebAssembly spec (most significant payload bit is 1) corresponds exactly to what the s390x ISA calls "quiet NaN":
There are two types of NaNs, signaling and quiet. A signaling NaN (SNaN) is distinguished from the corresponding quiet NaN (QNaN) by the leftmost fraction bit: zero for the SNaN and one for the QNaN. A special QNaN is supplied as the default result for an IEEE-invalid-operation exception; it has a plus sign and a leftmost fraction bit of one, with the remaining fraction bits being set to zeros.
The ISA guarantees that the result of any floating-point instruction (except those only manipulating the sign bit) will always be a quiet NaN (if it is a NaN). More specifically if the output is a NaN, it is either a copy of one of the input operands (possibly transformed from a signaling NaN to a quiet NaN), or else the default NaN described in the quote above (which corresponds to what WebAssembly calls the "canonical NaN").
Given that behavior of the ISA, it seems unlikely that code generated by the wasmtime backend would ever be able to violate the WebAssembly NaN propagation rules on s390x (although I haven't explicitly verified all patterns).
Thanks everyone, especially @afonso360 and @uweigand for digging up the RISC-V and s390x specifications. Now that I know of the difference between quiet and singling NaN it became clear to me why Wasm made that choice about the “arithmetic NaN” (quiet NaN). I haven’t checked how this is handled in x86-64 and AArch64 but I presume they either return the canonical NaN (as RISC-V) or some arithmetic NaN (as s390x).
NaNs in IEEE floats are encoded just like infinities, except that at least one of the mantissa bits is set to one. Wasm NaNs in results of float ops (except
fabs
,fneg
andfcopysign
) have the additional requirement that the most significant bit of the mantissa must be one (arithmetic NaN). Is this actually honoured by Cranelift codegen? If NaNs are always canonicalised then this is guaranteed (as the canonical NaN is also an arithmetic one), but if NaNs are simply propagated then this could be an issue.https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation https://webassembly.github.io/spec/core/syntax/values.html#syntax-float https://webassembly.github.io/spec/core/syntax/values.html#arithmetic-nan