Closed penzn closed 9 months ago
While it makes sense that NaN canonicalization in the middle of SIMD loops has substantial overhead, this seems less likely to be the case at component/host boundaries, where the cost is amortized by a number of other factors. And the root motivation for this is that many languages and hosts (including browsers) will non-deterministically canonicalize NaNs anyway, so the goal is to avoid a subtle source of non-portability. That being said, I don't think this particular design choice is set in stone, so if we did find compelling performance data showing meaningful overhead, then it would be worth reconsidering.
It depends on the output, I suppose. For a component returning a large array of floats the whole buffer needs to be processed element by element, even if there are no NaNs in it. Most programming environments don't care about exact NaN bits, notably JS and host languages that would compile to wasm, because as far as arithmetic ops are concerned non-signaling NaN values are equivalent (yes, there is NaN-boxing, but that won't be applicable to interface boundaries).
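To make the per-element cost concrete, here is a minimal sketch of what mandatory canonicalization of a returned float buffer would involve. It assumes little-endian `f32` data and a canonical NaN bit pattern of `0x7FC00000`; the function names are hypothetical and not taken from the Canonical ABI text.

```python
import struct

# Assumed canonical pattern: positive quiet NaN with a zero payload.
CANONICAL_F32_NAN = 0x7FC00000


def is_f32_nan_bits(bits: int) -> bool:
    """A 32-bit pattern is a NaN if the exponent is all ones and the mantissa is non-zero."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def canonicalize_f32_buffer(buf: bytearray) -> int:
    """Rewrite every non-canonical NaN in place and return how many were rewritten.

    Every element is inspected, even when the buffer contains no NaNs at all,
    which is the cost being discussed for large float arrays.
    """
    rewritten = 0
    for offset in range(0, len(buf) - len(buf) % 4, 4):
        (bits,) = struct.unpack_from("<I", buf, offset)
        if is_f32_nan_bits(bits) and bits != CANONICAL_F32_NAN:
            struct.pack_into("<I", buf, offset, CANONICAL_F32_NAN)
            rewritten += 1
    return rewritten
```

The sketch works on raw bit patterns rather than Python floats so that NaN payloads are not altered by float conversions along the way.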
> notably JS
JS ends up caring a fair amount since several JS engines internally use the NaN-boxing value representation, which causes them to have to canonicalize all incoming float values. That being said, that could happen as part of the JS-specific bindings, instead of as part of the Canonical ABI, so that's not a hard requirement or anything. But it is a notable example where non-canonical NaNs are a problem.
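A rough sketch of why NaN-boxing forces canonicalization of incoming floats: the tag layout below is entirely hypothetical (real engines each use their own encoding), but it shows how a foreign NaN payload could collide with a boxed value unless it is canonicalized on the way in.

```python
# Hypothetical NaN-boxing layout: the top 16 bits select the kind of value.
TAG_MASK = 0xFFFF_0000_0000_0000
INT_TAG = 0xFFF9_0000_0000_0000          # made-up tag for boxed 32-bit integers
CANONICAL_F64_NAN = 0x7FF8_0000_0000_0000


def box_int(n: int) -> int:
    """Store a 32-bit integer inside the payload of a NaN bit pattern."""
    return INT_TAG | (n & 0xFFFF_FFFF)


def is_boxed_int(word: int) -> bool:
    """Check whether a 64-bit word carries the (hypothetical) integer tag."""
    return (word & TAG_MASK) == INT_TAG


def is_f64_nan_bits(bits: int) -> bool:
    """A 64-bit pattern is a NaN if the exponent is all ones and the mantissa is non-zero."""
    return (bits & 0x7FF0_0000_0000_0000) == 0x7FF0_0000_0000_0000 and (
        bits & 0x000F_FFFF_FFFF_FFFF
    ) != 0


def import_f64_bits(bits: int) -> int:
    """Canonicalize an incoming double so it can never be mistaken for a boxed value."""
    return CANONICAL_F64_NAN if is_f64_nan_bits(bits) else bits
```

Under this layout, a double arriving from outside with the bits `0xFFF9_0000_0000_002A` is a valid NaN, yet it would be read as the boxed integer 42 unless `import_f64_bits` is applied first.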
Thinking about this some more, it does seem like the cost for a `list<f32>` (and perhaps, one day, `list<f16>`) could likely end up being significant, whereas the practical benefit is fairly limited to just languages doing NaN-boxing, which are used to canonicalizing at external boundaries anyways. So maybe we should remove this mandatory canonicalization.
@sunfishcode had a good idea though: instead of forcing canonicalization, we could specify that, at component-to-component boundaries, canonicalization might (non-deterministically) happen, and thus interfaces must not rely on specific NaN payloads being faithfully transmitted across component boundaries. That would allow language implementations to canonicalize if they need to (which JS would). This seems like a pretty good compromise.
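One way to read the "might canonicalize" rule is that a consumer should treat all NaN bit patterns as interchangeable. The sketch below models both permitted boundary behaviors with a hypothetical `lift_f32` helper and a payload-insensitive comparison; the names and the `0x7FC00000` canonical pattern are assumptions for illustration.

```python
CANONICAL_F32_NAN = 0x7FC00000  # assumed canonical quiet NaN


def is_f32_nan_bits(bits: int) -> bool:
    """Exponent all ones and non-zero mantissa means NaN."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def lift_f32(bits: int, canonicalize: bool) -> int:
    """Model a boundary crossing: the host may or may not canonicalize NaNs."""
    if canonicalize and is_f32_nan_bits(bits):
        return CANONICAL_F32_NAN
    return bits


def f32_equivalent(a: int, b: int) -> bool:
    """Compare two f32 bit patterns without depending on NaN payloads."""
    return a == b or (is_f32_nan_bits(a) and is_f32_nan_bits(b))


def survives_boundary(bits: int) -> bool:
    """True if the value is equivalent under both permitted host behaviors."""
    return all(f32_equivalent(bits, lift_f32(bits, c)) for c in (False, True))
```

An interface test written against `f32_equivalent` passes whether or not a given host canonicalizes, whereas a bit-exact comparison would pass only on hosts that happen to preserve payloads.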
I filed #260 to make this change.
Came across what appears to be a mandate to sanitize all `NaN` values loaded from memory to a specific value:
https://github.com/WebAssembly/component-model/blob/673d5c43c3cc0f4aeb8996a5c0931af623f16808/design/mvp/CanonicalABI.md?plain=1#L466-L488
https://github.com/WebAssembly/component-model/blob/673d5c43c3cc0f4aeb8996a5c0931af623f16808/design/mvp/Explainer.md?plain=1#L554-L555
A presentation in the SIMD subgroup admitted that this actually has a substantial overhead on x86 (see slide 7 in the PDF). Can we get a bit more clarity on this: is this proposed to be implemented in browsers when they support the Component Model? Is it going to be required in WASI?