Are shuffle's lane indices dynamic?

gnzlbg commented 6 years ago

Reading the spec:

Shuffle lanes

v8x16.shuffle(a: v128, b: v128, s: LaneIdx32[16]) -> v128

it appears that the lane indices argument for the shuffle intrinsic is a non-immediate mode argument.

If my memory doesn't fail me:

LLVM's shufflevector requires compile-time indices.
Most architectures do not support shuffling vector lanes of arbitrary width with dynamic indices:
- x86/x86_64 SSSE3's pshufb (and AVX-512 variants) support shuffling bytes with non-immediate mode indices
- ARM 32/64 vtbl/vtbx family of instructions support shuffling bytes with non-immediate mode indices

That is, if the user is shuffling a v8x16, we can use those instructions directly to shuffle with dynamic indices without issues.

If the user attempts to shuffle a v16x8, v32x4, etc. what's the best machine code that we can generate for dynamic indices?

If I remember correctly, generating the appropriate sequence of shuffle bytes instructions requires knowing the lane indices during machine code generation. If these aren't known, the only thing a code generator could do AFAICT is fall back to scalar code, that is, copy the vectors to memory, do a scalar shuffle there e.g. by copying into a third vector, and copy the result of this third vector back to a vector register. This sounds like a huge performance cliff to me.

What am I missing? How can we generate efficient vector shuffles from dynamic lane indices?

gnzlbg commented 6 years ago

Duh...

What I was missing is that LaneIdx32 is an immediate mode operand.

Honestly, I think the spec should be a bit more idiot proof here, by:

renaming LaneIdx32 to ImmLaneIdx32 (for consistency with ImmByte)
renaming the s argument in v128.shuffle to sImm, imm, or similar

gnzlbg commented 6 years ago

So it appears that LaneIdx are not always immediate mode operands (e.g. in the extract_lanes instructions), which would mean that my original interpretation of the spec is correct, and that the vector shuffles accept dynamic run-time indices.

So I am re-opening this for clarification.

gnzlbg commented 6 years ago

@billbudge how does V8 implement these?

gnzlbg commented 6 years ago

Duh again. So yeah, the lane indices are dynamic, but that's ok because the shuffle only operates on v8x16 and one can generate efficient code for that on pretty much every architecture.

gnzlbg commented 6 years ago

So... no... the https://github.com/WebAssembly/simd/blob/master/proposals/simd/BinarySIMD.md document says that these operands are immediate mode arguments...

And that the operands for extract lanes are immediates as well...

I'm leaving this open until someone clarifies and the spec is made idiot proof. To check whether v8x16 shuffle takes immediate arguments or not, I read the definition in the spec, which doesn't mention it and doesn't call the argument imm, like other places in the spec do. Then I have to jump to the definition of the LaneIdx types, which do not start with Imm_ like ImmBytes do, and then I have to look at the binary encoding doc which does state that these are immediates. . .

I mean, is it me, or is the spec currently unnecessarily ambiguous ?

If instead of all that I would just read v8x16.shuffle(..., imm: ImmLaneIdx[16]) and had a comment next to it saying that imm is an immediate mode operand, the spec would be crystal clear.

penzn commented 6 years ago

They are immediate mode arguments, and so are the arguments to hardware shuffle operations (at least on some popular platforms). I don't know why the wording in the spec is the way it is, but you can definitely try submitting a PR to correct it :)

billbudge commented 5 years ago

V8's implementation treats the lane index vector as an immediate value. However, there's a strong case to add the dynamic (dynamic shuffle held in a vector register) shuffle. See #24

gnzlbg commented 5 years ago

For what it's worth, there has been a PR that closes this issue open for a long time (https://github.com/WebAssembly/simd/pull/34), who should I ping to figure out what's required for it to make progress ?

However, there's a strong case to add the dynamic (dynamic shuffle held in a vector register) shuffle.

There has also been a PR open to address that for a long time and that hasn't seen much progress: https://github.com/WebAssembly/simd/pull/30 .

binji commented 5 years ago

who should I ping to figure out what's required for it to make progress ?

@aretmr @PeterJensen are the SIMD champions, perhaps they have thoughts about this.

dtig commented 5 years ago

I've left comments on #34, and opened issue #58 to handle triage of the older issues/PRs.

gnzlbg commented 5 years ago

IIRC this was fixed by #34 , thank you all!

WebAssembly / simd

Are shuffle's lane indices dynamic? #33