WebAssembly / relaxed-simd

Relax the strict determinism requirements of SIMD operations.

Floating point vs integer and fixed point #104

Open penzn opened 1 year ago

penzn commented 1 year ago

A bit of backstory for the discussion. Some of this is opinion, but hopefully it is at least somewhat helpful.

I think it is useful to think about the operations as belonging to two categories: one dealing with floating point semantics and the other with other platform specifics (mostly integer). What this allows is separating questions regarding acceptable floating point output from other, arguably less tricky ones, like encoding invalid values when converting floats to ints. This division is somewhat subjective, but might become clearer with more concrete examples below.

Relaxed versions of existing 'integer' SIMD operations

Swizzle, laneselect, and float-to-int conversions in the existing SIMD spec have Arm semantics; the new operations match them on Arm while producing different output on x86. Unlike floating point, the differences here are much more subjective (for example, should the invalid value be all zeros or all ones). It might even be possible to imagine a world where both flavors coexist. Emulating such operations is likely to be less tedious than trying to emulate an operation with better FP accuracy, plus they generally don't deviate from semantics already established for scalar operations.
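To make the swizzle case concrete, here is a minimal sketch of the two flavors on a 16-byte vector. The function names `swizzle_wasm` and `swizzle_x86` are hypothetical, and plain Python lists stand in for SIMD registers; the x86 side models PSHUFB as I understand it, so treat it as an illustration rather than a reference.

```python
def swizzle_wasm(a, s):
    """Existing i8x16.swizzle (Arm-like): any index >= 16 selects 0."""
    return [a[i] if i < 16 else 0 for i in s]


def swizzle_x86(a, s):
    """PSHUFB-like: a lane is zeroed only when bit 7 of the index is set;
    otherwise only the low 4 bits of the index are used."""
    return [0 if i & 0x80 else a[i & 0x0F] for i in s]


a = list(range(100, 116))                # 16 source bytes: 100..115
s = [0, 15, 16, 31, 128, 255] + [1] * 10  # 16 indices, some out of range

print(swizzle_wasm(a, s)[:6])
print(swizzle_x86(a, s)[:6])
```

The two agree for indices 0..15 and for indices with the top bit set; they differ only for indices 16..127, which is where a relaxed operation would leave the result implementation-defined.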

Relaxed versions of existing floating point SIMD operations

The gist is that x86 operations, unlike Arm operations, "short circuit" on NaN and disregard the sign of zero.

Code that cannot rule out NaN inputs would likely expect more symmetric variants than what x86 provides natively, and there are well-known instruction sequences that bring the behavior up to, say, the C++ spec or one or the other IEEE standard. Obviously, the proposed operations have vastly better performance on x86 than the strict ones, but for code that doesn't rule out NaNs there needs to be some mitigation (along the lines of what native libraries do), which still might be worth it from a performance point of view.
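As a rough scalar illustration of the "short circuit" behavior, the sketch below contrasts an x86-style min (return the second operand unless the first is strictly smaller) with a strict, symmetric min that propagates NaN and orders -0.0 below +0.0. The names `min_x86` and `min_strict` are hypothetical, and Python doubles stand in for f32 lanes.

```python
import math


def min_x86(a, b):
    """MINPS-like: yields b whenever the comparison a < b is false,
    which covers NaN in either operand and zeros of either sign."""
    return a if a < b else b


def min_strict(a, b):
    """Symmetric variant: NaN if either input is NaN, and -0.0 < +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        # Either zero being negative makes the result negative zero.
        negative = math.copysign(1.0, a) < 0 or math.copysign(1.0, b) < 0
        return -0.0 if negative else 0.0
    return a if a < b else b


print(min_x86(math.nan, 1.0), min_x86(1.0, math.nan))  # depends on operand order
print(min_x86(-0.0, 0.0), min_strict(-0.0, 0.0))       # sign of zero differs
```

The x86-style result depends on operand order, which is the asymmetry that code unable to rule out NaNs would need to mitigate.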

New operations

Just to summarize:

I think in general those have the same FP vs non-FP considerations as above, with a few extras (like single rounding FMA). The fact that those are new may not be an advantage.

penzn commented 1 year ago

This is a partial answer to @titzer's question about what the alternatives to the "union" approach are. I haven't looked into the newer operations as closely as the old ones.

penzn commented 1 year ago

Looked into this as a side effect of a different project.

FMA

True FMA can only be emulated via integer ops: the inputs need to be broken up into components, both operations performed, then the result needs to be rounded and stored back into a float. It should take about 5 additions and 5 multiplications to get the result. This is expensive, though some existing SIMD instructions have even worse lowering (unsigned int conversions, for example).
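To show why the single rounding matters, here is a small sketch that computes the product and sum exactly and rounds once, next to the ordinary two-rounding `a * b + c`. It uses Python's `fractions.Fraction` (exact rational arithmetic on integers) as a stand-in for the manual decomposition described above, works on doubles rather than f32 lanes, and assumes finite inputs whose result does not overflow; the function names are hypothetical.

```python
from fractions import Fraction


def fma_single_rounding(a, b, c):
    """Compute a*b + c exactly, then round to a float once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def fma_unfused(a, b, c):
    """Ordinary multiply then add: the product is rounded before the add."""
    return a * b + c


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused version loses the low-order term entirely.
print(fma_unfused(a, b, c))
print(fma_single_rounding(a, b, c))
```

With these inputs the unfused form yields zero while the single-rounding form keeps the `-2**-60` term, which is the kind of difference a relaxed FMA would expose between platforms.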

Floating-point min and max

Edit: removed a couple paragraphs describing emulation of x86 floating-point min and max, since we already have those in the standard. Thanks to @abrown for pointing this out.

We have both deterministic variants in the spec already: