Closed Maratyszcza closed 3 years ago
Strong support, I'm adding this to Highway as well. It would be much harder for users to emulate this, especially if we do not add sign select nor i64 gt_s.
Adding a preliminary vote for the inclusion of the i64x2.abs operation in the SIMD proposal below. Please vote with:
👍 For including i64x2.abs
👎 Against including i64x2.abs
I do have an issue with examples here - they seem to be all wrapper libraries. It isn't surprising that wrapper libraries would have all sorts of operations, but this isn't the same as an app somebody could run.
Fixed a bug in suggested lowering on SSE2 and ARM NEON (thanks @ngzhian for reporting).
Introduction
This is a proposal to add a 64-bit variant of the existing abs instruction. ARM64 and x86 with AVX512 natively support this instruction, and on earlier instruction sets it can be emulated with 3-5 instructions.
Applications
Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instruction can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
x86/x86-64 processors with AVX512F and AVX512VL instruction sets
y = i64x2.abs(x) is lowered to VPABSQ xmm_y, xmm_x
x86/x86-64 processors with AVX instruction set
y = i64x2.abs(x) (x is not y) is lowered to:
VPXOR xmm_y, xmm_y, xmm_y
VPSUBQ xmm_y, xmm_y, xmm_x
VBLENDVPD xmm_y, xmm_x, xmm_y, xmm_x
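As a rough illustration of why this three-instruction sequence yields an absolute value, the per-lane behaviour can be modelled in scalar C. This is only a sketch under stated assumptions: the function names are hypothetical, each 64-bit lane is treated as a uint64_t so that the subtraction wraps, and the blend is reduced to a test of bit 63 of its mask operand.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of VPXOR + VPSUBQ + VBLENDVPD.
 * Hypothetical helper for illustration, not part of any real API. */
static uint64_t abs_negate_blend(uint64_t x) {
    uint64_t neg = (uint64_t)0 - x;   /* VPXOR zeroes y, VPSUBQ computes 0 - x (wrapping) */
    int x_is_negative = (int)(x >> 63); /* VBLENDVPD only inspects bit 63 of the mask (x) */
    return x_is_negative ? neg : x;   /* negative lanes take 0 - x, others keep x */
}

/* Spot checks, including the most negative value, which wraps to itself. */
static void abs_negate_blend_demo(void) {
    assert(abs_negate_blend(7) == 7);
    assert(abs_negate_blend((uint64_t)-7) == 7);
    assert(abs_negate_blend(UINT64_C(1) << 63) == (UINT64_C(1) << 63));
}
```

The most negative input maps to itself, which is consistent with the wrapping behaviour of the existing narrower abs instructions.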
x86/x86-64 processors with SSE4.1 instruction set
y = i64x2.abs(x) (x is not y, and neither x nor y is in xmm0) is lowered to:
PXOR xmm0, xmm0
PSUBQ xmm0, xmm_x
MOVDQA xmm_y, xmm0
BLENDVPD xmm_y, xmm_x
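The SSE4.1 sequence differs from the AVX one in which value acts as the blend mask: BLENDVPD implicitly reads xmm0, which here holds 0 - x rather than x. A minimal scalar sketch of one lane, with hypothetical function names and uint64_t standing in for a lane, suggests why the result is still the absolute value:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of PXOR + PSUBQ + MOVDQA + BLENDVPD.
 * Hypothetical helper for illustration only. */
static uint64_t abs_blend_on_negated(uint64_t x) {
    uint64_t neg = (uint64_t)0 - x; /* PXOR + PSUBQ: xmm0 = 0 - x (wrapping) */
    uint64_t y = neg;               /* MOVDQA xmm_y, xmm0 */
    if (neg >> 63) {                /* BLENDVPD: mask is bit 63 of xmm0, i.e. of 0 - x */
        y = x;                      /* 0 - x is negative, so x itself is the result */
    }
    return y;
}

/* Spot checks: positive, negative, zero, and the most negative value. */
static void abs_blend_on_negated_demo(void) {
    assert(abs_blend_on_negated(5) == 5);
    assert(abs_blend_on_negated((uint64_t)-5) == 5);
    assert(abs_blend_on_negated(0) == 0);
    assert(abs_blend_on_negated(UINT64_C(1) << 63) == (UINT64_C(1) << 63));
}
```

For the most negative input, 0 - x wraps back to the same bit pattern, its bit 63 is set, and the blend picks x, so this variant agrees with the AVX one on that edge case.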
x86/x86-64 processors with SSE2 instruction set
y = i64x2.abs(x) is lowered to:
PSHUFD xmm_tmp, xmm_x, 0xF5
MOVDQA xmm_y, xmm_x
PSRAD xmm_tmp, 31
PXOR xmm_y, xmm_tmp
PSUBQ xmm_y, xmm_tmp
x = i64x2.abs(x) is lowered to:
PSHUFD xmm_tmp, xmm_x, 0xF5
PSRAD xmm_tmp, 31
PXOR xmm_x, xmm_tmp
PSUBQ xmm_x, xmm_tmp
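SSE2 has no 64-bit arithmetic right shift, so the sequences above appear to build the 64-bit sign mask from 32-bit pieces: PSHUFD with 0xF5 copies the high dword of each lane over its low dword, and PSRAD by 31 fills both dwords with the sign bit. The scalar sketch below models one lane under those assumptions; the function names are hypothetical and uint64_t stands in for a lane.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of PSHUFD 0xF5 + PSRAD 31 + PXOR + PSUBQ.
 * Hypothetical helper for illustration only. */
static uint64_t abs_sign_mask(uint64_t x) {
    uint32_t hi = (uint32_t)(x >> 32);              /* PSHUFD 0xF5: high dword replicated */
    uint32_t m32 = (hi >> 31) ? 0xFFFFFFFFu : 0u;   /* PSRAD 31: dword filled with its sign bit */
    uint64_t mask = ((uint64_t)m32 << 32) | m32;    /* both dwords equal: all ones or all zeros */
    return (x ^ mask) - mask;                       /* PXOR, then PSUBQ (wrapping) */
}

/* Spot checks: positive, negative, and the most negative value. */
static void abs_sign_mask_demo(void) {
    assert(abs_sign_mask(5) == 5);
    assert(abs_sign_mask((uint64_t)-5) == 5);
    assert(abs_sign_mask(UINT64_C(1) << 63) == (UINT64_C(1) << 63));
}
```

When the mask is all ones, the XOR flips every bit and subtracting the mask adds one, which is two's-complement negation; when the mask is zero, both steps leave x unchanged.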
ARM64 processors
y = i64x2.abs(x) is lowered to ABS Vy.2D, Vx.2D
ARMv7 processors with NEON instruction set
y = i64x2.abs(x) is lowered to:
VSHR.S64 Qtmp, Qx, #63
VEOR Qy, Qx, Qtmp
VSUB.I64 Qy, Qy, Qtmp