Open ppenzin opened 3 months ago
From my brief look at this spec, it looks like we'd use `dup_odd` to pattern match a pairwise operation. So, IIUC, pattern matching with flexible vectors should be a bit easier than the current SIMD shuffles.
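To make the pattern concrete, here is a minimal scalar sketch in Python of what a runtime would have to recognise. It assumes that `dup_odd` copies each odd lane into the even lane below it; that lane semantics, and the helper names `add` and `pairwise_sums_in_even_lanes`, are assumptions for illustration and are not taken from the spec text.

```python
def dup_odd(v):
    """Assumed semantics: each odd lane is duplicated into the even lane below it."""
    out = list(v)
    for i in range(0, len(v) - 1, 2):
        out[i] = v[i + 1]
    return out


def add(a, b):
    """Lane-wise add of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def pairwise_sums_in_even_lanes(v):
    """The shuffle + add pair a runtime would need to match as one pairwise add."""
    return add(v, dup_odd(v))


v = [1, 2, 3, 4, 5, 6, 7, 8]
r = pairwise_sums_in_even_lanes(v)
print(r[0::2])  # even lanes hold v[2i] + v[2i+1]: [3, 7, 11, 15]
```

Under this model the useful results sit only in the even lanes, so a matcher would also have to see how those lanes are consumed before it could pick a hardware pairwise add.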
But we would still have the trouble of choosing a canonical form that matches well to hardware and for all the runtimes to perform the matching. For instance, the current shuffle approach in LLVM would map to `concat_lower_upper` and that, again, isn't useful for the horizontal FP instructions that I'm aware of.
So, I would definitely be in favour of having dedicated wasm instruction(s), for both fixed and flexible!
Arm hardware-wise, Neon includes `faddp` for floats, which are chained for a full reduction, and `addv` is used for integer reduction. SVE includes `faddv`, which performs a recursive pairwise reduction on floats, but I'm not sure what we use for integers.
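As a rough illustration of the chaining mentioned above, this Python sketch models a vector pairwise add as "concatenate the two operands, then sum adjacent pairs" and applies it repeatedly to one register. The function names are hypothetical and the model ignores everything except lane arithmetic; it assumes a power-of-two lane count.

```python
def faddp(a, b):
    """Scalar model of a vector pairwise add: sum adjacent pairs of a ++ b."""
    cat = list(a) + list(b)
    return [cat[i] + cat[i + 1] for i in range(0, len(cat), 2)]


def reduce_by_chained_faddp(v):
    """Full reduction by chaining pairwise adds; needs log2(len(v)) steps."""
    n = len(v)
    while n > 1:
        v = faddp(v, v)  # each step halves the number of distinct partial sums
        n //= 2
    return v[0]          # after the last step every lane holds the total


print(faddp([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # [3.0, 7.0, 3.0, 7.0]
print(reduce_by_chained_faddp([1.0, 2.0, 3.0, 4.0]))      # 10.0
```

A four-lane float vector therefore takes two pairwise adds to reduce, which is the shape a dedicated wasm reduction instruction could lower to directly.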
We have the `SADDV` and `UADDV` instructions to deal with integers (signed and unsigned respectively) in SVE. BTW another option for floating-point values is `FADDA`, which is strictly ordered, but realistically has a performance cost.
There are also pairwise operations, i.e. `ADDP` and `FADDP`.
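The choice between a strictly ordered and a pairwise float reduction is visible in results, not only in performance, because floating-point addition is not associative. The Python sketch below is a scalar model under stated assumptions: `ordered_sum` accumulates left to right from 0.0, and `pairwise_sum` sums adjacent lanes in a tree. The exact association order used by any particular hardware instruction is not claimed here.

```python
def ordered_sum(v):
    """Strictly ordered reduction: ((((0 + v0) + v1) + v2) + ...)."""
    acc = 0.0
    for x in v:
        acc += x
    return acc


def pairwise_sum(v):
    """Tree reduction over adjacent lanes; assumes a power-of-two length."""
    v = list(v)
    while len(v) > 1:
        v = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    return v[0]


v = [1e100, 1.0, -1e100, 1.0]
print(ordered_sum(v))   # 1.0: the large terms cancel before the final 1.0 is added
print(pairwise_sum(v))  # 0.0: each 1.0 is absorbed by a large term first
```

So any wasm reduction instruction would need to say which ordering it guarantees, or state explicitly that the result may differ between implementations.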
I vaguely remember horizontal ops were not great on x86. @rrwinterton, do you have any thoughts?
Originally thought to be a post-MVP feature: https://github.com/WebAssembly/simd/issues/20
There is a PR to LLVM to introduce shuffle patterns that, combined with other instructions, would translate to horizontal additions. That is motivation to restart the conversation on horizontal SIMD ops, or at least to provide a way to disseminate this among the runtimes.
@sparker-arm as the author of the patch, sorry to put you on the spot.