Maratyszcza commented 3 years ago

Introduction

This is a proposal to add a 64-bit variant of the existing abs instruction. ARM64 and x86 with AVX512 support this instruction natively, and on earlier instruction sets it can be emulated with 3-5 instructions.
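For reference, the intended per-lane behavior can be sketched in scalar C. This is only an illustrative model, not part of the proposal text: the helper names `abs_lane_i64` and `i64x2_abs_ref` are hypothetical, and the sketch assumes the same wrapping behavior as the existing abs instructions (the absolute value of INT64_MIN stays INT64_MIN), which is also what the lowerings below produce.

```c
#include <stdint.h>

/* Hypothetical helper: scalar model of a single 64-bit lane of i64x2.abs. */
static int64_t abs_lane_i64(int64_t x) {
    /* Negate in unsigned arithmetic so that INT64_MIN wraps around
       instead of triggering signed-overflow undefined behavior. */
    uint64_t ux = (uint64_t)x;
    uint64_t r = (x < 0) ? (uint64_t)0 - ux : ux;
    /* The conversion back is implementation-defined only for the
       INT64_MIN case; two's-complement targets keep the bit pattern. */
    return (int64_t)r;
}

/* Hypothetical helper: apply the lane model to both lanes of a vector. */
static void i64x2_abs_ref(const int64_t x[2], int64_t y[2]) {
    for (int i = 0; i < 2; i++) {
        y[i] = abs_lane_i64(x[i]);
    }
}
```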

Applications

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX512F and AVX512VL instruction sets

i64x2.abs
- y = i64x2.abs(x) is lowered to VPABSQ xmm_y, xmm_x

x86/x86-64 processors with AVX instruction set

i64x2.abs
- y = i64x2.abs(x) (x is not y) is lowered to:
  - VPXOR xmm_y, xmm_y, xmm_y
  - VPSUBQ xmm_y, xmm_y, xmm_x
  - VBLENDVPD xmm_y, xmm_x, xmm_y, xmm_x
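The AVX sequence above is a negate-and-blend: compute 0 - x, then use the sign bit of x to choose between x and its negation. A minimal scalar sketch of one lane follows, assuming two's-complement wraparound; the function name `abs_negate_blend` is hypothetical.

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane of the VPXOR / VPSUBQ / VBLENDVPD sequence. */
static int64_t abs_negate_blend(int64_t x) {
    uint64_t ux = (uint64_t)x;
    /* VPXOR zeroes the destination, VPSUBQ computes 0 - x with wraparound. */
    uint64_t neg = (uint64_t)0 - ux;
    /* VBLENDVPD takes the second source (the negated value) in lanes where
       the mask (x itself) has its sign bit set, and the first source (x)
       otherwise. */
    uint64_t out = (ux >> 63) ? neg : ux;
    return (int64_t)out;
}
```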

x86/x86-64 processors with SSE4.1 instruction set

i64x2.abs
- y = i64x2.abs(x) (x is not y and x/y is not in xmm0) is lowered to:
  - PXOR xmm0, xmm0
  - PSUBQ xmm0, xmm_x
  - MOVDQA xmm_y, xmm0
  - BLENDVPD xmm_y, xmm_x
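In the SSE4.1 sequence the blend mask is implicit in xmm0, which holds the negated value rather than x, so the selection is inverted: x is chosen where -x is negative. A scalar sketch of one lane, assuming two's-complement wraparound (the function name `abs_blend_on_negated` is hypothetical):

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane of the PXOR / PSUBQ / MOVDQA / BLENDVPD sequence. */
static int64_t abs_blend_on_negated(int64_t x) {
    uint64_t ux = (uint64_t)x;
    /* PXOR + PSUBQ: xmm0 = 0 - x, wrapping. */
    uint64_t neg = (uint64_t)0 - ux;
    /* MOVDQA: y starts out as the negated value. */
    uint64_t y = neg;
    /* BLENDVPD with the implicit xmm0 mask: where the sign bit of -x is set
       (x was positive, or x was INT64_MIN), overwrite y with x. */
    if (neg >> 63) {
        y = ux;
    }
    return (int64_t)y;
}
```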

x86/x86-64 processors with SSE2 instruction set

i64x2.abs
- y = i64x2.abs(x) is lowered to:
  - PSHUFD xmm_tmp, xmm_x, 0xF5
  - MOVDQA xmm_y, xmm_x
  - PSRAD xmm_tmp, 31
  - PXOR xmm_y, xmm_tmp
  - PSUBQ xmm_y, xmm_tmp
- x = i64x2.abs(x) is lowered to:
  - PSHUFD xmm_tmp, xmm_x, 0xF5
  - PSRAD xmm_tmp, 31
  - PXOR xmm_x, xmm_tmp
  - PSUBQ xmm_x, xmm_tmp
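SSE2 has no 64-bit arithmetic shift, so the sequences above build a per-lane sign mask from the high 32-bit halves and then compute (x XOR mask) - mask. The sketch below models the register as four 32-bit elements to show what the 0xF5 shuffle immediate does. It is an illustration only, and the function name `abs_sse2_model` is hypothetical.

```c
#include <stdint.h>

/* Scalar model of PSHUFD 0xF5 / PSRAD 31 / PXOR / PSUBQ.
   x and y hold a 128-bit register as four little-endian 32-bit elements. */
static void abs_sse2_model(const uint32_t x[4], uint32_t y[4]) {
    uint32_t tmp[4];

    /* PSHUFD: two immediate bits per destination element select the source.
       0xF5 decodes to [1, 1, 3, 3], duplicating each lane's high half. */
    for (int i = 0; i < 4; i++) {
        tmp[i] = x[(0xF5 >> (2 * i)) & 3];
    }

    /* PSRAD by 31: every element becomes all ones if its sign bit was set. */
    for (int i = 0; i < 4; i++) {
        tmp[i] = (tmp[i] >> 31) ? 0xFFFFFFFFu : 0u;
    }

    /* PXOR then PSUBQ on each 64-bit lane: (x ^ mask) - mask. */
    for (int lane = 0; lane < 2; lane++) {
        uint64_t v = ((uint64_t)x[2 * lane + 1] << 32) | x[2 * lane];
        uint64_t m = ((uint64_t)tmp[2 * lane + 1] << 32) | tmp[2 * lane];
        v = (v ^ m) - m;
        y[2 * lane] = (uint32_t)v;
        y[2 * lane + 1] = (uint32_t)(v >> 32);
    }
}
```

When the mask is all ones, the XOR inverts every bit and subtracting -1 adds one, which is two's-complement negation; when the mask is zero, both steps leave the value unchanged.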

ARM64 processors

i64x2.abs
- y = i64x2.abs(x) is lowered to ABS Vy.2D, Vx.2D

ARMv7 processors with NEON instruction set

i64x2.abs
- y = i64x2.abs(x) is lowered to:
  - VSHR.S64 Qtmp, Qx, #63
  - VEOR Qy, Qx, Qtmp
  - VSUB.I64 Qy, Qy, Qtmp

jan-wassenberg commented 3 years ago

Strong support, I'm adding this to Highway as well. It would be much harder for users to emulate this, especially if we add neither sign select nor i64 gt_s.

dtig commented 3 years ago

Adding a preliminary vote for the inclusion of i64x2.abs operation to the SIMD proposal below. Please vote with -

👍 For including i64x2.abs
👎 Against including i64x2.abs

penzn commented 3 years ago

I do have an issue with the examples here - they all seem to be wrapper libraries. It isn't surprising that wrapper libraries would have all sorts of operations, but this isn't the same as an app somebody could run.