WebAssembly / design

WebAssembly Design Documents

What is the motivation for `NaN` canonicalization? #1463

Open penzn opened 2 years ago

penzn commented 2 years ago

I am wondering whether there are any uses of Wasm that depend on NaN value canonicalization.

For some time I thought that our NaN semantics were identical to ECMAScript's, but it looks like the latter (quoted below with minor formatting changes, emphasis mine) does not distinguish between NaN bit patterns:

The Number type has exactly 18,437,736,874,454,810,627 values, representing the double-precision 64-bit format IEEE 754-2019 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic, except that the 9,007,199,254,740,990 distinct “Not-a-Number” values of the IEEE Standard are represented in ECMAScript as a single special NaN value. (Note that the NaN value is produced by the program expression NaN.) In some implementations, external code might be able to detect a difference between various Not-a-Number values, but such behaviour is implementation-defined; to ECMAScript code, all NaN values are indistinguishable from each other.

while in Wasm a non-canonical NaN cannot be returned from an operation consuming a canonical NaN:

When the result of a floating-point operator other than fabs, fneg, or fcopysign is a NaN, then its sign is non-deterministic and the payload is computed as follows:

  • If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.

  • Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is 1 and all others are unspecified.

This is a much stricter property, and it comes with a cost for operations that might otherwise change NaN bits (fmin and fmax on x86, for example). However, this distinction is completely ignored by most programming languages, which on the Web means Wasm code has to perform a somewhat costly conversion that the rest of the browser is completely oblivious to. Also, for the canonical/non-canonical distinction to be practically useful, non-canonical NaNs would need to be produced and consumed as a form of metadata; however, the spec does not mandate that their values are preserved, which makes the distinction at least partially moot.

What are the use cases that practically depend on this behavior? It is of course possible to construct a test that depends on an exact bit pattern, but what language/library uses that in practice (i.e., where the distinction is not 'implementation defined' or 'undefined')?

sunfishcode commented 2 years ago

The use case that motivated this was NaN-boxing at the wasm level. When doing NaN-boxing, it's common to interpret the canonical NaN bitpattern as an actual floating-point NaN, and non-canonical NaN bitpatterns as other types of data. When doing NaN boxing, at the point where one is doing a floating-point arithmetic operation, one has already ensured that the operands are either non-NaN or canonical NaN (because otherwise they're not floats and floating-point arithmetic doesn't make sense). Wasm's NaN rule means that the result in that case will also be either non-NaN or canonical NaN, so the wasm code doesn't need to add extra instructions to manually canonicalize it.
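To make the scheme concrete, here is a minimal NaN-boxing sketch; the tag layout and helper names are purely illustrative, not SpiderMonkey's actual encoding:

```rust
// Minimal NaN-boxing sketch. The tag layout is hypothetical.
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000; // payload MSB set, rest zero
const TAG_INT: u64 = 0x0001_0000_0000_0000;       // hypothetical tag bits in the payload

/// Real floats (including a canonical NaN) are stored as their own bits.
fn box_f64(x: f64) -> u64 {
    x.to_bits()
}

/// Non-float data hides in the payload of a non-canonical NaN.
fn box_i32(i: i32) -> u64 {
    CANONICAL_NAN | TAG_INT | (i as u32 as u64)
}

/// A boxed word is a float iff, ignoring the sign bit, it is at most the
/// canonical NaN pattern. Wasm's rule is what keeps arithmetic results on
/// this side of the test without an explicit re-canonicalization step.
fn is_f64(bits: u64) -> bool {
    (bits & !(1u64 << 63)) <= CANONICAL_NAN
}

fn main() {
    assert!(is_f64(box_f64(1.5)));
    assert!(is_f64(CANONICAL_NAN)); // a float NaN stays classified as a float
    assert!(!is_f64(box_i32(42)));  // a boxed int is never mistaken for a float
}
```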

A real-world user of NaN boxing in wasm is the SpiderMonkey JS engine compiled to wasm.

This form of NaN propagation is automatically provided by instructions that implement IEEE 754's NaN propagation recommendation, as well as by hardware that only produces canonical NaNs, which covers most instructions on popular architectures.

x86's min and max instructions indeed do not have this behavior, and sometimes require extra instructions to implement wasm's min and max instructions, though the extra instructions can sometimes be omitted or coalesced inside expression trees, because the NaN bits only matter when they're observed.

An IEEE 754-2019 minimum and maximum implementation would support an efficient implementation of Wasm's min and max instructions, as well as JavaScript's Math.min and Math.max functions and Java's java.lang.Math.min and java.lang.Math.max functions. I encourage x86 designers to add instructions which implement the IEEE 754-2019 minimum and maximum operations.
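To illustrate the extra work involved, here is a rough sketch of Wasm's `f64.min` semantics written out explicitly (not any particular engine's lowering); the NaN and `-0.0` branches are the parts that x86's `minsd` alone does not provide, and always returning a canonical NaN is one of the behaviors the spec permits:

```rust
/// Sketch of Wasm `f64.min` semantics, spelled out as an engine or
/// interpreter might when the native min instruction doesn't match.
fn wasm_f64_min(a: f64, b: f64) -> f64 {
    if a.is_nan() || b.is_nan() {
        // Returning the canonical NaN here is one allowed choice.
        f64::from_bits(0x7FF8_0000_0000_0000)
    } else if a == b {
        // min(-0.0, +0.0) must be -0.0, which `<` alone cannot distinguish.
        if a.is_sign_negative() { a } else { b }
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    assert!(wasm_f64_min(f64::NAN, 1.0).is_nan());
    assert!(wasm_f64_min(-0.0, 0.0).is_sign_negative());
    assert_eq!(wasm_f64_min(2.0, 3.0), 2.0);
}
```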

penzn commented 2 years ago

The question is not really about the benefits of IEEE 754, but rather about the choices we make when interpreting the standard. This is not the only choice that has been made; for example, we don't support signaling NaN values.

The use case that motivated this was NaN-boxing at the wasm level. When doing NaN-boxing, it's common to interpret the canonical NaN bitpattern as an actual floating-point NaN, and non-canonical NaN bitpatterns as other types of data.

The spec defines how operations treat NaN data within a floating-point context; what happens when non-floating-point data is hidden inside a NaN is not relevant to that. Within the boxing technique, feeding non-FP boxed values into FP operations does not make sense; moreover, we don't preserve non-canonical NaN data on x86, and boxing still works, as you point out. Hiding other data in NaN bits is essentially a more efficient form of a C union, and it does not have much to do with how we define floating-point operations.

An IEEE 754-2019 minimum and maximum implementation would support an efficient implementation of Wasm's min and max instructions, as well as JavaScript's Math.min and Math.max functions and Java's java.lang.Math.min and java.lang.Math.max functions.

Some previous discussions for background. A couple of issues from this org that highlight that JavaScript _does not_ have a notion of a canonical `NaN`:

- https://github.com/WebAssembly/spec/issues/1496
- https://github.com/WebAssembly/interface-types/issues/134

We are following an interpretation of 754-2019 with a specific canonical NaN rule. Neither Java nor JavaScript does that. This makes the Wasm spec stricter than both of those, which in the case of the Web means Wasm is stricter than the rest of the runtime that surrounds it. What I am trying to understand is what this buys us, as it mandates a costlier lowering of FP operations compared to JS, for example.

sunfishcode commented 2 years ago

Wasm is designed to be a compilation target for other languages.

NaN boxing is used in some popular language implementations.

Almost all hardware, including almost all x86 instructions, already has the specific behavior needed for NaN boxing.

Wasm's design here exposes that behavior, for the benefit of these language implementations.

penzn commented 1 year ago

I am not sure I understand your response; I am having trouble connecting boxing to the semantics of FP operations. Non-FP data within a NaN payload in NaN-boxing is never intended to be passed into FP operations, so the technique would work regardless of what the FP ops do with NaNs. It is relatively easy to see if you follow how (for example) adding two boxed integers would work - it would not be done by the corresponding floating-point operation, as the boxed values are not numbers as far as the operation is concerned (see the sketch below).
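As a schematic illustration, with a plain enum standing in for the bit-level boxed representation, the integer path never touches a floating-point instruction:

```rust
// Schematic interpreter dispatch for `a + b` over boxed values. A plain enum
// stands in for the bit-level NaN-boxed representation; tag checks happen
// before any arithmetic, so non-float payloads never reach an FP op.
enum Value {
    Int(i32),
    Float(f64),
}

fn add(a: Value, b: Value) -> Value {
    match (a, b) {
        // Integer fast path: a plain integer add, no floating-point involved.
        (Value::Int(x), Value::Int(y)) => Value::Int(x.wrapping_add(y)),
        // Float path: both operands are genuine floats here.
        (Value::Float(x), Value::Float(y)) => Value::Float(x + y),
        // Mixed cases would coerce or take a slow path in a real VM.
        _ => todo!("slow path"),
    }
}

fn main() {
    match add(Value::Int(2), Value::Int(3)) {
        Value::Int(n) => assert_eq!(n, 5),
        _ => unreachable!(),
    }
}
```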

754-2019 introduces min/max behavior with respect to the sign of zero; the Wasm and JS specs both follow that. Canonical NaNs are a bit more ambiguous. To illustrate, there are three options for canonical NaN behavior (from more strict to less strict):

I can see situations where behavior w.r.t. the sign of zero is observable and desired; for example, it can be an indication of underflow, and it would be very desirable to preserve the sign. From that point of view it makes sense to pay the price of having to use more instructions. In contrast, I am still trying to understand what we are getting in return for this rule (and conversely, what JS is missing).

rossberg commented 1 year ago

@penzn, under NaN-boxing, proper float NaNs must be canonical, otherwise they would be confused with pointers, e.g., by a GC. Hence, you want all float ops to produce canonical NaNs (provided they aren't fed non-canonical ones, which indeed should never happen).

penzn commented 1 year ago

Yes, it seems that if a NaN is passed to an op that breaks its canonical property, the runtime will think it became a different data type. I was focusing on non-FP data and missed that case.

My question then is: what do interpreter implementers do when they build for native targets? This must have been solved there, as those targets are not going to preserve canonical NaNs.

sunfishcode commented 1 year ago

Interpreter code that I've seen does seem to rely on native targets preserving NaN payloads for add/sub/mul/div/sqrt, and most native targets do propagate NaNs for those in practice. For min/max, interpreters seem to have explicit NaN tests, so they determine the NaN result themselves.

KloudKoder commented 1 year ago

TL;DR but this is perhaps related: the Release 2.0 draft of the WebAssembly Specification says in one place:

"There is no undefined behavior, i.e., the execution rules cover all possible cases that can occur in a valid program, and the rules are mutually consistent."

Then it says elsewhere:

"Some operators are non-deterministic, because they can return one of several possible results (such as different NaN values)."

I wonder what other "some operators" there are which might "return one of several possible results". Even seemingly trivial undefined behavior is potentially dangerous.

On the "so what" end of the spectrum, I might be able to approximate the proportion of processors you have from whichever vendor. But at the shopstopper end, distributed computing could be brought to a halt because consensus fails to crystalize (or takes really long due to sporadic disagreement among outputs). Theoretically, this could even serve as a de facto DDoS attack vector by creating superfluous cross-verification traffic.

Not that fixing this would be trivial, however. Unless of course it's merely a doc bug.

sunfishcode commented 1 year ago

The only arithmetic operations that can return "one of several possible results" are floating-point operations which can return NaN, and the only permitted variations are in the NaN bits. The vast majority of code in the world never looks at NaN bits, so this rarely matters in practice.

But indeed, rarely is not never, so I am also now in the process of proposing a fully-deterministic mode for NaNs, for environments that want to have it. There is some discussion here: https://github.com/WebAssembly/relaxed-simd/issues/108

penzn commented 1 year ago

Interpreter code that I've seen does seem to rely on native targets preserving NaN payloads for add/sub/mul/div/sqrt, and most native targets do propagate NaNs for those in practice. For min/max, interpreters seem to have explicit NaN tests, so they determine the NaN result themselves.

That's why having a canonical NaN rule strikes me as an odd choice - the rare use cases that require it can just define stricter operations on top of the ones specified. Realistically we don't even get raw min/max in the standard library, as that favors numbers (but that is a topic for a different discussion).

Also, are there any SIMD users of NaN-boxing? I understand that the canonicalization has been added long before SIMD.

sunfishcode commented 1 year ago

That's why having a canonical NaN rule strikes me as an odd choice - the rare use cases that require it can just define stricter operations on top of the ones specified.

That would add overhead though. The motivation for the feature was to avoid this overhead.

Realistically we don't even get raw min/max in standard library, as that favors numbers (but that is a topic for a different discussion).

Historically, Wasm's design has not prioritized the convenience of interpreter implementors.

Also, are there any SIMD users of NaN-boxing? I understand that the canonicalization has been added long before SIMD.

I'm not aware of any today. But in a program that mixes SIMD and scalar on the same data, it could end up being important for SIMD operations to preserve the canonical NaN property to prevent the scalar operations from interpreting non-canonical NaNs as boxed values.

rossberg commented 1 year ago

@KloudKoder:

the Release 2.0 draft of the WebAssembly Specification says in one place:

"There is no undefined behavior, i.e., the execution rules cover all possible cases that can occur in a valid program, and the rules are mutually consistent."

Then it says elsewhere:

"Some operators are non-deterministic, because they can return one of several possible results (such as different NaN values)."

To clarify, this is not a contradiction, as undefined behaviour ≠ non-deterministic behaviour. In particular, the former quote is carefully phrased to say that the "rules cover all possible cases", which doesn't exclude the possibility of multiple allowed cases in some places, as per the latter quote. Which ones those are is still fully defined.

Undefined behaviour is much worse: it implies that anything could happen, including launching missiles. In fact, the usual interpretation of undefined behaviour as e.g. defined in the C world is even worse than that, since it implies that the entire execution is undefined, i.e., anything could have happened even before execution reached an undefined point.

I wonder what other "some operators" there are which might "return one of several possible results".

The standard example is anything related to threading, i.e., accesses to shared memory can yield multiple possible results, dependent on timing or possible reordering in the CPU or engine. Unlike C, Wasm still fully defines all allowed behaviours.

KloudKoder commented 1 year ago

@sunfishcode Thanks, I read your other NaN thread. I think the relaxed SIMD thing would be a good idea, in case the programmer sees no risk in hardware-dependent outputs and just wants maximum speed.

"But indeed, rarely is not never"

Yeah, that's why I wanted to point out the risks, and I'm happy that you're already aware. I understand that the question before you is: what to do about all the WASM modules already out there which didn't contemplate this? I really wonder if the canonicalization penalty is that high. I think the most common affected instruction would be sqrt, which is slow enough that adding a NaN inspection after the fact would barely move the needle on overall performance. (And most CPUs have status bits that you can inspect in order to do this faster. And branch prediction would hide most of the cost even so.)

@rossberg I know what you mean and that's a good explanation, but if a WASM module can behave like a random oracle, however biased, there's still a potential attack surface.

penzn commented 1 year ago

That would add overhead though. The motivation for the feature was to avoid this overhead.

I think it adds overhead either way - currently it adds it to (x86) implementations of fmin and fmax for all code using those operations, whereas if it weren't in the spec, the overhead would only be added to the code that uses NaN-boxing. This rule makes regular, non-NaN-boxing users incur the overhead due to a design choice that doesn't apply to them.

It is possible to say that it adds the overhead to x86 execution only, while the alternative would've forced NaN-boxing users to add it on all platforms. This doesn't make much sense to me either, given that NaN-boxing is relatively rare, while the hardware this penalizes is rather common.

Not to nitpick, but I find your last two statements slightly at odds with each other 😉 If we are not prioritizing NaN-boxing interpreters, then we should not be worried about purely hypothetical cases that would affect them, and vice versa.

thomcc commented 1 year ago

Sorry if I missed it, but what was the motivation for keeping the non-determinism around the sign-bit of NaNs?

This is unfortunate, since it means code that sorts floats via the IEEE 754-2019 totalOrder predicate will non-deterministically have NaNs either at the very front or the very end (as that ordering considers -NaN to be smaller than all other floats, and +NaN to be larger).

Concretely, this is kinda problematic for the Rust stdlib, since we've stabilized f{32,64}::total_cmp, which is a bit of a footgun due to this. We also have functions like slice::sort_floats, which we'd like to stabilize, and ideally would behave consistently with total_cmp (although we have not yet stabilized it so we have some leeway there).
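To make the footgun concrete, here is a minimal sketch with both NaN signs constructed explicitly; under Wasm, which sign an arithmetic NaN result carries is exactly the non-deterministic part:

```rust
// Where a NaN lands under totalOrder depends only on its sign bit.
fn main() {
    let pos_nan = f64::from_bits(0x7FF8_0000_0000_0000);
    let neg_nan = f64::from_bits(0xFFF8_0000_0000_0000);

    let mut v = [pos_nan, 1.0, -1.0, 0.0];
    v.sort_by(f64::total_cmp);
    // Positive NaN sorts after every number: [-1.0, 0.0, 1.0, NaN]
    assert!(v[3].is_nan());

    let mut w = [neg_nan, 1.0, -1.0, 0.0];
    w.sort_by(f64::total_cmp);
    // Negative NaN sorts before every number: [NaN, -1.0, 0.0, 1.0]
    assert!(w[0].is_nan());
}
```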

It's difficult for us to justify investing time into fixing the non-determinism on other platforms (currently it tends to vary between debug and release builds on several non-wasm targets for various reasons) due to the fact that WebAssembly considers it non-deterministic.

To be clear, there's a lot of leeway for how this is specified from our point of view. We'd just like that it be deterministic based on the inputs -- the status quo is pretty bad for us, and ends up causing some bugs in programs that don't particularly care about NaN beyond not expecting it to non-deterministically sort in a different location (well, to be more precise, to non-deterministically have values which sort into different locations).

See https://github.com/rust-lang/unsafe-code-guidelines/issues/237 for some more discussion here, and I'm sorry if this was the wrong place to bring this up (I can file another issue if that would be better).

workingjubilee commented 1 year ago

I kind of barely understand "A canonical NaN is a floating-point value ±nan(canonₙ) where canonₙ is a payload whose most significant bit is 1 while all others are 0" tbh. Is that supposed to imply both signs for a canonical NaN should be considered canonical NaNs? I can derive that looking at other parts of the formal syntax as well, but it seems to be in conflict with various remarks on the operational semantics.

sunfishcode commented 1 year ago

@penzn

Ah, I misunderstood your question then. Wasm hasn't historically prioritized the convenience of interpreter-based wasm engines, but that's different from interpreters compiled to Wasm.

The idea at the time was that it's only min and max and not the entire instruction set, and in some cases, optimizers can avoid the overhead, such as when one of the operands is constant, which is a fairly common case for min and max. And x86 designers would seem to have several reasons for adding IEEE 754-2019 minimum and maximum instructions, and while it's well known that it takes time to get to shipping new hardware and more time before widespread use, we hope Wasm will be around for a much longer time than even that. Reasonable people can disagree with this reasoning, but that was the reasoning for the choice made in that time.

@thomcc

The main motivation for non-determinism of the sign bit of NaNs is that x86 and ARM differ. When x86 generates a NaN from non-NaN operands (e.g. 0.0/0.0), it sets the sign bit to 1; when ARM does it, it sets the sign bit to 0. This behavior isn't based on the inputs at all, and neither x86 nor ARM hardware provides any way to make it be based on the inputs. We could add extra instructions to canonicalize NaNs, but at the time the rule was made, it was guessed that that would add too much overhead to be worth doing.
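The divergence is easy to observe natively; a small sketch (the per-architecture behavior is as described above, and `black_box` just keeps the division from being constant-folded so the NaN really comes from the target's divide at run time):

```rust
use std::hint::black_box;

// Prints the sign bit of a freshly generated NaN: typically 1 on x86 and 0 on
// ARM, and under Wasm the spec allows either.
fn main() {
    let nan = black_box(0.0_f64) / black_box(0.0_f64);
    println!("sign bit of 0.0/0.0: {}", nan.to_bits() >> 63);
}
```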

One might ask: why not have a rule saying "it's either 0 or 1, but it doesn't change within a run of a program"? At the time the rule was made, the spec didn't have that kind of "nondeterministic but consistent within a run" concept. Today, with the relaxed-simd proposal, there are discussions about adding this kind of concept; however, even if we made such a rule today, existing wasm engines conforming to older versions of the spec wouldn't be guaranteed to follow any new NaN restrictions. And there are people doing live migration of wasm programs between x86 and ARM today.

I have a proposal to add a mode where NaNs are deterministic, however the current design is such that toolchains like Rust wouldn't be permitted to depend on this behavior, so I don't think that helps with the issue here.

So I don't see any easy answers here.

@workingjubilee

Yes, the terminology is confusing; the wasm spec uses the term "canonical NaN" to mean a NaN with either sign, however outside the spec in other contexts, "canonical NaN" often means a NaN with a zero sign bit.

conrad-watt commented 1 year ago

To be clear, there's a lot of leeway for how this is specified from our point of view. We'd just like that it be deterministic based on the inputs -- the status quo is pretty bad for us, and ends up causing some bugs in programs that don't particularly care about NaN beyond not expecting it to non-deterministically sort in a different location (well, to be more precise, to non-deterministically have values which sort into different locations).

@thomcc one possible strengthening we could investigate for our NaN semantics would be to make their results deterministic based on inputs across the lifetime of a single cross-module "run" - i.e. the result of fp_op(x,y) may still differ depending on the browser/hardware combination, but the observed result for a given input would always be the same for the lifetime of the program (transitive across all inter-connected modules). Would this be strong enough for Rust?

For Wasm folks, I'm essentially suggesting that we could strengthen NaN results to respect the relaxed SIMD proposal's list non-determinism, even in relaxed mode. We'd need to open up the conversation about what this would mean for (e.g.) the code migration ideas floated in the last CG, though.

penzn commented 1 year ago

I kind of barely understand "A canonical NaN is a floating-point value ±nan(canonₙ) where canonₙ is a payload whose most significant bit is 1 while all others are 0" tbh. Is that supposed to imply both signs for a canonical NaN should be considered canonical NaNs? I can derive that looking at other parts of the formal syntax as well, but it seems to be in conflict with various remarks on the operational semantics.

@workingjubilee you are exactly right, both signs are canonical.
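In concrete f64 bit patterns, that gives exactly two canonical NaN encodings; a quick check based on the definition quoted above:

```rust
// The two f64 canonical NaN encodings under the definition quoted above:
// either sign bit, payload MSB set to 1, all other payload bits 0.
fn is_canonical_nan(bits: u64) -> bool {
    bits == 0x7FF8_0000_0000_0000 || bits == 0xFFF8_0000_0000_0000
}

fn main() {
    assert!(is_canonical_nan(0x7FF8_0000_0000_0000));
    assert!(is_canonical_nan(0xFFF8_0000_0000_0000));
    assert!(!is_canonical_nan(0x7FF8_0000_0000_0001)); // arithmetic but non-canonical
    assert!(!is_canonical_nan(f64::INFINITY.to_bits()));
}
```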

@thomcc - as @sunfishcode pointed out, the choices are due to differences in hardware behavior, and it would be expensive to produce only one NaN value on all hardware. However, Wasm is not the odd one out; NaN bits are non-deterministic for most/all source languages and compilation targets. It is possible to shrink that non-determinism with some extra code (as the discussion of NaN-boxing mentions), but vanilla C/C++/Java/Fortran/JS/etc. don't guarantee exact NaN values.

Speaking of which, how does Rust handle this? I am no expert on it, but it looks like the docs say this about NaN constants:

Note that IEEE 754 doesn’t define just a single NaN value; a plethora of bit patterns are considered to be NaN. [...] This constant isn’t guaranteed to equal to any specific NaN bitpattern, and the stability of its representation over Rust versions and target platforms isn’t guaranteed.

This makes me think that it is mostly in line with other languages, in the sense that you can't expect (depend on) exact NaN bit patterns, but I could be wrong. What I mean is that if you can't depend on all operations returning the same NaN bits on all platforms, then it doesn't help that total ordering is always correct. @thomcc and @workingjubilee, can you please shed some light on this?

workingjubilee commented 1 year ago

I don't think that's what nondeterminism means to us, really. It is not correct to call the results of NaN inputs on x86 or Arm non-deterministic. It's also not correct for the float constant f64::NAN: it's going to be a single value known at compilation time. Instead, they are arbitrary: for a given instruction on these machines, there are only X+1 possible NaN outputs, where X is the number of "arguments" to the instruction:

For the same machine state, the same instruction, and the same inputs, it will produce the same output, so you can always predict the output value. Thus, the output is fully dependent on the inputs... just in a trivial way, because in the absence of NaNs to propagate, all values map to the same output.

This is not true, as far as I understand it, for wasm, because nondeterminism means that it is acceptable if certain bits are filled in from a randomness function. Thus the same inputs to the same instruction and with the same machine state may have different output.

This matters because Rust allows compile-time function evaluation. Thus, as far as Rust is concerned, the difficulty of implementation is also the difficulty of using it as a compilation target. This is not uniquely true just of this case: it's true of any language that attempts to have real float semantics instead of arbitrarily kowtowing to "whatever the hardware does". Most nonconformant float implementations at least allow shimming their behavior, but true nondeterminism makes even that harder.

This, of course, is what you might think of as a nonsense implementation: "Surely no one would ever do that." However, wasm does not actually rule it out, as I understand it.

conrad-watt commented 1 year ago

@workingjubilee things are deterministic at the level of platform asm, but Wasm, Rust, and LLVM need to specify behaviours that encompass all the possible observable outcomes from different compilation targets (e.g. see the comment above about divergence between x86 and Arm). The "easy" way to solve this problem is to over-approximate things at the source level through some kind of limited non-determinism - with the added concern that Rust needs to care not only about over-approximating platform behaviours, but to some extent the solutions chosen by LLVM and Wasm, since it's targeting those directly.

It looks like Rust has not yet defined a policy here, but interesting discussions are happening in the issue @thomcc linked above, with a possible stop-gap solution being something like Wasm's non-determinism.

workingjubilee commented 1 year ago

@conrad-watt Yes, I am quite aware and have been part of many of those discussions. Again, that is not "non-determinism as she is spoke". If you generate 5 floating-point numbers, say, 0.0, -0.0, 0.5, -0.5, and what the hell, Infinity, and then generate 5 NaNs on a given platform, then sort them according to IEEE754's totalOrder predicate, there are two possible results:

[NaN, NaN, NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf]
[-0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN, NaN, NaN]

Except on wasm, there are instead also these possible results, because of NaN sign nondeterminism:

[NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN, NaN]
[NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN]
[NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN]
[NaN, NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN]

RalfJung commented 1 year ago

Related discussion in other venues:

Speaking of which, how does Rust handle this? I am no expert on it, but it looks like the docs say this about NaN constants:

The rules documented for Rust currently are not the rules we want; they are the rules we have because LLVM and wasm don't let us implement the rules we want.

For a given instruction on these machines, there are only X+1 possible NaN outputs, where X is the number of "arguments" to the instruction:

Someone mentioned on the LLVM thread that a possible implementation of the IEEE spec would be to embed some instruction pointer information in the NaN, I guess to let people figure out where it was generated. Not sure if any hardware actually does that though. It would already be in conflict with wasm's "canonical" NaN spec.

conrad-watt commented 1 year ago

@workingjubilee apologies, I think we may be talking past each other somewhat - the non-determinism I'm referring to in my comment above is expressed at a semantic/spec level. We can certainly investigate whether it would be feasible to strengthen Wasm's specification so that only the "first two" outcomes are allowed - from a spec point of view this would still be a (more constrained) kind of non-determinism, although I agree potentially more helpful.

penzn commented 1 year ago

The rules documented for Rust currently are not the rules we want, they are the rules we have because LLVM and wasm don't let us implement the rules we want.

I take this as: the Rust rules currently don't require strict behavior. I assume there is some collective decision-making process involved, which means that changing the rules is not a certainty yet. As an aside, I'm not sure how LLVM and Wasm prevent you from changing the rules though - wouldn't it be possible to generate IR conformant with the desired behavior?

according to IEEE754's totalOrder predicate, there are two possible results:

[NaN, NaN, NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf]
[-0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN, NaN, NaN]

Except on wasm, there are instead also these possible results, because of NaN sign nondeterminism:

[NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN, NaN]
[NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN, NaN]
[NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN, NaN]
[NaN, NaN, NaN, NaN, -0.5, -0.0, 0.0, 0.5, inf, NaN]

What functional difference can be made out from the NaN position in the total order? I understand that totalOrder allows "comparing" NaNs to numbers, but they are not really comparable: NaN > 0.0 == NaN < 0.0 == -NaN > 0.0 == -NaN < 0.0 == false. For practical purposes all NaNs are the same (with the weird twist of not being equal to each other or even themselves); their 'sign' is just an artifact of the floating-point format, and there is no such thing as a "negative NaN" or "positive NaN" really. I am not sure why controlling this should be the job of the runtime. Those behaviors are technically valid not only in Wasm, but also in JavaScript and Java (JVM), and likely in some more cross-platform environments.

jar398 commented 1 year ago

Sorry, the comment I deleted did not belong here