Closed gnzlbg closed 5 years ago
Duh...
What I was missing is that LaneIdx32
is an immediate mode operand.
Honestly, I think the spec should be a bit more idiot proof here, by:
LaneIdx32
to ImmLaneIdx32
(for consistency with ImmByte
) s
argument in v128.shuffle
to sImm
, imm
, or similarSo it appears that LaneIdx
are not always immediate mode operands (e.g. in the extract_lanes
instructions), which would mean that my original interpretation of the spec is correct, and that the vector shuffles accept dynamic run-time indices.
So I am re-opening this for clarification.
@billbudge how does V8 implement these?
Duh again. So yeah, the lane indices are dynamic, but that's ok because the shuffle
only operates on v8x16
and one can generate efficient code for that on pretty much every architecture.
So... no... the https://github.com/WebAssembly/simd/blob/master/proposals/simd/BinarySIMD.md document says that these operands are immediate mode arguments...
And that the operands for extract lanes are immediates as well...
I'm leaving this open until someone clarifies and the spec is made idiot proof. To check whether v8x16
shuffle takes immediate arguments or not, I read the definition in the spec, which doesn't mention it and doesn't call the argument imm
, like other places in the spec do. Then I have to jump to the definition of the LaneIdx
types, which do not start with Imm_
like ImmBytes
do, and then I have to look at the binary encoding doc which does state that these are immediates. . .
I mean, is it me, or is the spec currently unnecessarily ambiguous ?
If instead of all that I would just read v8x16.shuffle(..., imm: ImmLaneIdx[16])
and had a comment next to it saying that imm
is an immediate mode operand, the spec would be crystal clear.
They are immediate mode arguments, and so are the arguments to hardware shuffle operations (at least on some popular platforms). I don't know why the wording in the spec is the way it is, but you can definitely try submitting a PR to correct it :)
V8's implementation treats the lane index vector as an immediate value. However, there's a strong case to add the dynamic (dynamic shuffle held in a vector register) shuffle. See #24
For what it's worth, there has been a PR that closes this issue open for a long time (https://github.com/WebAssembly/simd/pull/34), who should I ping to figure out what's required for it to make progress ?
However, there's a strong case to add the dynamic (dynamic shuffle held in a vector register) shuffle.
There has also been a PR open to address that for a long time and that hasn't seen much progress: https://github.com/WebAssembly/simd/pull/30 .
who should I ping to figure out what's required for it to make progress ?
@aretmr @PeterJensen are the SIMD champions, perhaps they have thoughts about this.
I've left comments on #34, and opened issue #58 to handle triage of the older issues/PRs.
IIRC this was fixed by #34 , thank you all!
Reading the spec:
it appears that the lane indices argument for the
shuffle
intrinsic is a non-immediate mode argument.If my memory doesn't fail me:
LLVM's
shufflevector
requires compile-time indices.Most architectures do not support shuffling vector lanes of arbitrary width with dynamic indices:
That is, if the user is shuffling a
v8x16
, we can use those instructions directly to shuffle with dynamic indices without issues.If the user attempts to shuffle a
v16x8
,v32x4
, etc. what's the best machine code that we can generate for dynamic indices?If I remember correctly, generating the appropriate sequence of shuffle bytes instructions requires knowing the lane indices during machine code generation. If these aren't known, the only thing a code generator could do AFAICT is fall back to scalar code, that is, copy the vectors to memory, do a scalar shuffle there e.g. by copying into a third vector, and copy the result of this third vector back to a vector register. This sounds like a huge performance cliff to me.
What am I missing? How can we generate efficient vector shuffles from dynamic lane indices?