erkyrath / glulxe

The Glulx VM reference interpreter
http://eblong.com/zarf/glulx/
MIT License
113 stars 21 forks source link

Glulxe's treatment of NaN payloads violates IEEE 754 #40

Closed dfoxfranke closed 1 week ago

dfoxfranke commented 1 month ago

IEEE754-2019 defines the MSB of a NaN's mantissa to be its signaling bit; a signaling bit of 1 represent a quiet NaN (qNaN), while a bit of 0 represents a signaling NaN (sNaN); all arithmetic operations which return NaNs must return qNaNs. All other bits of the mantissa represent the NaN's payload. qNaNs can carry arbitrary payloads, while the payload of a sNaN must be non-zero because a zero payload would be interpreted as an infinity.

Arithmetic operations are required to preserve NaN payloads. Specifically, per §6.2.3:

An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.

If two or more inputs are NaN, then the payload of the resulting NaN should be identical to the payload of one of the input NaNs if representable in the destination format. This standard does not specify which of the input NaNs will provide the payload.

When glulxe's encode_float function returns a NaN, it always returns a qNaN with a payload of 0, and encode_double always returns a qNaN with a payload of 1. Therefore, these functions violate the above requirements.

Please hold off on fixing this and wait for a PR from me, which I will have validated against wasm2glulx's test suite. WebAssembly goes beyond IEEE754 and imposes an additional requirement on NaN payloads, which is that if a NaN result is not propagated from a NaN input (e.g. in the case of division by zero), then the payload must be 0. I want to make sure this gets patched in a way that complies with WASM's requirements as well as IEEE754's.

erkyrath commented 1 month ago

Okay...

hanna-kruppe commented 2 weeks ago

I believe that always canonicalizing any NaN result into a qNaN with payload 0 (i.e., setting the MSB of the significand field and clearing the lower bits) conforms to the IEEE 754 and Webassembly standards. Specifically:

  1. encode_float is fine as-is for both standards.
  2. encode_double already conforms to IEEE 754-2019. The only problem for Webassembly is that the low half of a NaN result is set to 1 instead of 0.

No other changes should be needed to support Webassembly, within the limits of what portable C without reliance on Annex F can guarantee.

The rest of this comment is just spec lawyering and can be ignored by anyone not interested in that.


First: the "sign bit operations" (copy, negate and copySign from §5.5.1 in IEEE 754-2019, spelled fNN.{abs,neg,copysign} in Webassembly) are required to operate on bit strings and thus propagate NaN encodings unchanged, except modifying the sign bit. However, Glulx doesn't provide these operations at all. They're required for wasm2glulx, but they can and must be implemented with integer bitwise operations, which side-steps encode_{float,double} entirely. The same is true for simply copying floating point values around (Glulx doesn't have separate "copy float" or "copy double" opcodes anyway).

Second, §6.2.3 of IEEE 754-2009 does not place any requirements on NaN payload propagation, it only makes a recommendation. The quoted section only says should, not shall. In §1.6, should is defined as:

In this standard three words are used to differentiate between different levels of requirements and optionality, as follows: [...]

  • should indicates that among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain course of action is deprecated but not prohibited (“should” means “is recommended to”).

Third, Webassembly also calls this a recommendation and does not require that Webassembly implementations follow it:

Floating-point arithmetic follows the IEEE 754 standard, with the following qualifications: [...]

  • Following the recommendation that operators propagate NaN payloads from their operands is permitted but not required.

Fourth, Webassembly places the following (weaker) requirement on NaN payload propagation:

When the result of a floating-point operator other than fneg, fabs, or fcopysign is a NaN, then its sign is non-deterministic and the payload is computed as follows:

  • If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.
  • Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is and all others are unspecified.

Leaving aside fneg, fabs and fcopysign (which Glulx doesn't have opcodes for), this requirement is fulfilled if every floating point operation that returns a NaN returns the canonical NaN of either sign (the second clause permits but doesn't require this when some inputs were non-canonical NaNs). Here, "canonical NaN" means a qNaN with payload 0. Unlike IEEE 754-2019, Webassembly considers the "quiet bit" part of the "payload" so they define the canonical NaN as "MSB of the payload is set, other bits are 0". Note that this definition of "canonical" is unrelated to the term "canonical encoding" from IEEE 754. In any case, wasm's "canonical NaN" is exactly the encoding that encode_float prefers and the one encode_double would return if it set the low half to 0 instead of 1.

Finally: Glulxe propagates the sign bit of NaNs, but this is not required by Webassembly ("the sign is non-deterministic" in the previous quote) or by IEEE 754 (see §6.3).

dfoxfranke commented 2 weeks ago

The only problem for Webassembly is that the low half of a NaN result is set to 1 instead of 0.

Correct: I found that changing this was necessary and sufficient for all the relevant tests to pass.

First: the "sign bit operations" (copy, negate and copySign from §5.5.1 in IEEE 754-2019, spelled fNN.{abs,neg,copysign} in Webassembly) are required to operate on bit strings and thus propagate NaN encodings unchanged, except modifying the sign bit. However, Glulx doesn't provide these operations at all. They're required for wasm2glulx, but they can and must be implemented with integer bitwise operations, which side-steps encode_{float,double} entirely.

Also right. Wasm2Glulx does implement these by way of bitwise operations. Doing so is already simple and efficient so there's no need for Glulx to change anything to support these ops.

Second, §6.2.3 of IEEE 754-2009 does not place any requirements on NaN payload propagation, it only makes a recommendation. The quoted section only says should, not shall.

Ah, well spotted; I missed that. Glulx is nonetheless in violation of the "should" but you're right that it doesn't break any requirements. So maybe we should just make the one-line change to change the double payload to a zero and the rest isn't worth the trouble.

hanna-kruppe commented 2 weeks ago

It might be nice if all the hardware vendors, language specifications, and virtual machines (like Webassembly) agreed on some way of propagating NaN payloads, but many don't and software doesn't seem to care. Using payloads to distinguish different kinds of errors is rarely attempted and doesn't work reliably for many reasons. That leaves two use cases where the bits of NaN values matter:

  1. Enabling NaN boxing (e.g., qNaN with payload 0 is a proper float value, other payloads encode integers or object references). This only requires Webassembly's weaker rule, which was specifically designed to enable NaN boxing without sacrificing performance on hardware that doesn't propagate payloads (e.g., RISC-V).
  2. Ensuring bit-identical results across different platforms, even if NaNs are fed into sign bit operations or have their encoding interpreted as integer. The IEEE recommendations are insufficient for this because it leaves many things undefined (sign bit, how to choose the payload to propagate if there's multiple NaN inputs, how to choose payloads when there's no NaN inputs).

Always canonicalizing NaN results into a positive qNaN with payload 0 is easier to implement in software and covers both use cases. So I think that behavior is preferable over emulating payload propagation.

dfoxfranke commented 2 weeks ago

Okay, I'm convinced: payload propagation just isn't worth it. I linked a one-line PR which just changes the payload to zero. It leaves existing logic in place to preserve signs.