WebAssembly / relaxed-simd

Relax the strict determinism requirements of SIMD operations.

Relaxed Rounding Q-format Multiplication #40

Open Maratyszcza opened 3 years ago

Maratyszcza commented 3 years ago

What are the instructions being proposed?

I propose a relaxed version of the Saturating Rounding Q-format Multiplication i16x8.q15mulr_sat_s introduced in WebAssembly/simd#365. I suggest i16x8.q15mulr_s as the tentative name for the relaxed instruction.

What are the semantics of these instructions?

i16x8.q15mulr_sat_s implements multiplication of fixed-point numbers in Q15 format (see WebAssembly/simd#365 for details). The multiplication overflows if and only if both inputs are INT16_MIN, and the x86 SSSE3 and ARM NEON instructions handle this case differently: the x86 version wraps around, while the ARM version saturates. The WebAssembly SIMD instruction i16x8.q15mulr_sat_s standardized on the ARM overflow semantics, which requires additional overflow checks on x86. However, since the case of both inputs being INT16_MIN is rare, and can often be ruled out by the higher-level structure of an algorithm, a relaxed version that allows either overflow behavior would help performance on x86.

The proposed i16x8.q15mulr_s Relaxed SIMD instruction computes the lane-wise rounded multiplication of Q15 numbers, and allows for either saturation or wrap-around behavior in the overflow case (where both inputs are INT16_MIN).
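As a rough illustration of the semantics described above, here is a scalar, per-lane sketch in Python. The function names are hypothetical and not part of any specification. It assumes the rounded Q15 product is computed as `(a * b + 0x4000) >> 15`, and models the relaxed instruction as returning either the saturated or the wrapped result.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def _rounded_product(a: int, b: int) -> int:
    """Rounded Q15 product (a * b + 2^14) >> 15, before narrowing to 16 bits."""
    return (a * b + 0x4000) >> 15


def q15mulr_sat(a: int, b: int) -> int:
    """Saturating variant: clamp the rounded product to the int16 range."""
    return max(INT16_MIN, min(INT16_MAX, _rounded_product(a, b)))


def q15mulr_wrap(a: int, b: int) -> int:
    """Wrapping variant: keep the low 16 bits and reinterpret them as signed."""
    r = _rounded_product(a, b) & 0xFFFF
    return r - 0x10000 if r >= 0x8000 else r


def relaxed_q15mulr_results(a: int, b: int) -> set:
    """Set of per-lane results a relaxed implementation may return (hypothetical helper)."""
    return {q15mulr_sat(a, b), q15mulr_wrap(a, b)}
```

For every input pair except (INT16_MIN, INT16_MIN) the set has a single element, so the relaxed instruction is deterministic there. For that one pair it contains both INT16_MAX and INT16_MIN.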

How will these instructions be implemented?

x86/x86-64 processors with AVX instruction set

x86/x86-64 processors with SSSE3 instruction set

x86/x86-64 processors with SSE2 instruction set

ARM64 processors

ARMv7 processors with NEON instruction set

Reference lowering through the WAsm SIMD128 instruction set

How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

When both inputs are INT16_MIN, x86/x86-64 will produce INT16_MIN, while ARM/ARM64 will produce INT16_MAX. x86/x86-64 can already be distinguished from ARM/ARM64 based on NaN behavior, so this instruction doesn't add any new fingerprinting surface.
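The divergence can be checked with a small scalar model. The sketch below is illustrative only: the helper names are hypothetical, and the per-lane formulas are my reading of the documented behavior of x86 PMULHRSW (round, then truncate to 16 bits) and ARM SQRDMULH (double, round, then saturate). It sweeps a handful of boundary values for one operand against the full int16 range for the other, rather than all 2^32 pairs.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def pmulhrsw_lane(a: int, b: int) -> int:
    """Model of one x86 PMULHRSW lane: (((a * b) >> 14) + 1) >> 1, truncated to 16 bits."""
    r = ((((a * b) >> 14) + 1) >> 1) & 0xFFFF
    return r - 0x10000 if r >= 0x8000 else r


def sqrdmulh_lane(a: int, b: int) -> int:
    """Model of one ARM SQRDMULH lane: (2 * a * b + 2^15) >> 16, saturated to int16."""
    r = (2 * a * b + 0x8000) >> 16
    return max(INT16_MIN, min(INT16_MAX, r))


def differing_pairs() -> list:
    """Input pairs from a boundary sweep where the two models disagree."""
    boundary = (INT16_MIN, INT16_MIN + 1, -1, 0, 1, INT16_MAX)
    return [
        (a, b)
        for a in boundary
        for b in range(INT16_MIN, INT16_MAX + 1)
        if pmulhrsw_lane(a, b) != sqrdmulh_lane(a, b)
    ]
```

Under these assumptions, `differing_pairs()` contains only the pair (INT16_MIN, INT16_MIN), which matches the single divergent case described above.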

What use cases are there?

ngzhian commented 2 years ago

Instruction LGTM, please leave comments or a thumbs up. I will add this to the overview some time next week.