WebAssembly / simd

Branch of the spec repo scoped to discussion of SIMD in WebAssembly

i64x2.eq instruction #381

Closed Maratyszcza closed 3 years ago

Maratyszcza commented 3 years ago

Introduction

This is a proposal to add a 64-bit variant of the existing eq instruction. ARM64 and x86 (since SSE4.1) natively support this instruction, and on ARMv7 NEON and SSE2 it can be efficiently emulated with 3-4 instructions.
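
For reference, the intended lane-wise behavior can be sketched in scalar C. This is only an illustrative model and assumes the instruction follows the same convention as the existing eq instructions (all ones in a lane on equality, all zeros otherwise); the type `v128_i64x2` and the function `i64x2_eq_ref` are hypothetical names introduced for this sketch.

```c
#include <stdint.h>

/* Hypothetical two-lane vector type, used only for this sketch. */
typedef struct { uint64_t lane[2]; } v128_i64x2;

/* Scalar model of lane-wise 64-bit equality: each result lane is
   all ones when the input lanes match and all zeros otherwise. */
static v128_i64x2 i64x2_eq_ref(v128_i64x2 a, v128_i64x2 b) {
    v128_i64x2 r;
    for (int i = 0; i < 2; ++i) {
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT64_MAX : 0;
    }
    return r;
}
```

The result is a mask, so it can feed directly into bitwise select operations.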

Applications

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. However, these patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

x86/x86-64 processors with SSE4.1 instruction set

x86/x86-64 processors with SSE2 instruction set
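
One commonly used way to emulate a 64-bit equality compare with only 32-bit compares is: compare 32-bit lanes, swap the two halves of each 64-bit element, and AND the two masks. The portable C model below is a sketch of that idea, not necessarily the listing from the original proposal; the type `v128_u32x4` and all function names are hypothetical, and the comments name the SSE2 instructions each step is assumed to correspond to.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit register as four 32-bit lanes;
   lanes 0-1 form the low 64-bit element, lanes 2-3 the high one. */
typedef struct { uint32_t lane[4]; } v128_u32x4;

/* Step 1: 32-bit lane-wise equality (the operation PCMPEQD computes). */
static v128_u32x4 cmpeq32(v128_u32x4 a, v128_u32x4 b) {
    v128_u32x4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0;
    }
    return r;
}

/* Step 2: swap the 32-bit halves inside each 64-bit element
   (the permutation PSHUFD performs with immediate 0xB1). */
static v128_u32x4 swap_halves(v128_u32x4 v) {
    v128_u32x4 r = {{ v.lane[1], v.lane[0], v.lane[3], v.lane[2] }};
    return r;
}

/* Step 3: AND the mask with its swapped copy (PAND). A 64-bit element
   ends up all ones only if both of its 32-bit halves compared equal. */
static v128_u32x4 i64x2_eq_emulated(v128_u32x4 a, v128_u32x4 b) {
    v128_u32x4 m = cmpeq32(a, b);
    v128_u32x4 s = swap_halves(m);
    v128_u32x4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = m.lane[i] & s.lane[i];
    }
    return r;
}
```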

ARM64 processors

ARMv7 processors with NEON instruction set

ngzhian commented 3 years ago

Any reason why we specifically want only i64x2.eq? In #101 we decided that the set of i64x2 instructions we would keep did not include any of the comparisons.

Maratyszcza commented 3 years ago

@ngzhian Of course, I'd rather have a full set of compare instructions, but ordered comparisons are hard to emulate in the absence of hardware support. On the other hand, emulating a 64-bit equality compare is trivial, and it is in our baseline ISAs (SSE4.1 and ARM64 NEON).

ngzhian commented 3 years ago

It looks incomplete that we only have i64x2.eq, and no other i64x2 comparisons.

How useful will adding only this instruction be? Are there use cases that this instruction alone is sufficient to unlock?

Maratyszcza commented 3 years ago

I don't have any use-cases in mind, just trying to orthogonalize the instruction set.

lemaitre commented 3 years ago

I have no code to present, but a use case for that is when vectorizing code that mixes doubles and integers: in order to limit the number of shuffles (going back and forth to 32-bit elements), one would use 64-bit integers. There, it would be nice to have an i64x2.eq.
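
To illustrate the pattern described above (this is a hypothetical sketch, not code from the commenter), the scalar C model below keeps integer keys at 64 bits so that the equality mask has the same lane width as the doubles it selects between. The names `eq_mask64` and `select_by_key` are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Model of one lane of a 64-bit equality compare: all ones or all zeros. */
static uint64_t eq_mask64(uint64_t a, uint64_t b) {
    return (a == b) ? UINT64_MAX : 0;
}

/* For each of two lanes: out[i] = (key[i] == wanted[i]) ? x[i] : y[i].
   The mask is as wide as a double, so it is applied to the raw bits
   directly, with no narrowing or widening of the integer lanes. */
static void select_by_key(const int64_t key[2], const int64_t wanted[2],
                          const double x[2], const double y[2],
                          double out[2]) {
    for (int i = 0; i < 2; ++i) {
        uint64_t m = eq_mask64((uint64_t)key[i], (uint64_t)wanted[i]);
        uint64_t xb, yb, rb;
        memcpy(&xb, &x[i], sizeof xb);   /* reinterpret double bits */
        memcpy(&yb, &y[i], sizeof yb);
        rb = (xb & m) | (yb & ~m);       /* bitwise select */
        memcpy(&out[i], &rb, sizeof rb);
    }
}
```

In a vectorized version the loop body would map to one compare and one bitwise select per vector, which is where a native 64-bit equality compare would help.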

tlively commented 3 years ago

This has been prototyped in LLVM (but not Binaryen) as __builtin_wasm_eq_i64x2. It should be usable from tot Emscripten in a few hours as long as you don't use optimization flags at link time.

omnisip commented 3 years ago

Any reason why we specifically want only i64x2.eq? In #101 we decided that the set of i64x2 instructions we would keep did not include any of the comparisons.

Ditto on the same question. When a question about pcmpgtq was posted to Stack Overflow, a response was provided that produced a high-quality result for both SSE2 and ARMv7+NEON.

Maratyszcza commented 3 years ago

Added examples of applications

abrown commented 3 years ago

I actually think this would be nice to add if it didn't have orthogonality implications. Could we just merge this without i64x2.ne and the i64x2 comparisons?