Open elliottslaughter opened 7 months ago
I put in an initial change to just shut off vectorization, for now:
https://gitlab.com/StanfordLegion/legion/-/merge_requests/1205
I can't find good documentation on the ARM instruction set so I've been looking through random LLVM tests trying to find the names of the intrinsics. I see:
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/ARM/vpadd.ll#L64
But this is a 64-bit (?!) vector and I see no options for larger vector widths or for 64-bit floating point.
I merged the change to disable the vectorizer so that at least we're not generating invalid vector intrinsics for ARM now.
> I can't find good documentation on the ARM instruction set
This is pretty good: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html
For example, here are the addition ones: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#addition
Since we were discussing it, here is the 2-wide 64-bit integer add intrinsic: https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddq_s64
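For reference, a minimal C sketch of how that intrinsic is used. The function name `add2_s64` and the scalar fallback are made up for illustration (so the snippet also compiles on non-ARM hosts); only `vld1q_s64`, `vaddq_s64` and `vst1q_s64` come from `arm_neon.h`.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in {0, 1}. */
void add2_s64(const int64_t a[2], const int64_t b[2], int64_t out[2]) {
#if defined(__ARM_NEON)
    /* Load two 64-bit lanes into 128-bit registers, add lane-wise, store. */
    int64x2_t va = vld1q_s64(a);
    int64x2_t vb = vld1q_s64(b);
    vst1q_s64(out, vaddq_s64(va, vb));
#else
    /* Scalar fallback so the sketch still builds where NEON is unavailable. */
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```

The 128-bit `int64x2_t` type here is the "q" (quad-word) form, as opposed to the 64-bit vectors in the LLVM test linked earlier.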
Does anyone know where the LLVM versions of these intrinsics are defined? Grepping through the LLVM 17.0.5 source, I cannot find any evidence that these instructions are exposed as intrinsics. Presumably they must be, but I can't find them.
@pmccormick?
They use TableGen to generate the .h header files that contain all the intrinsics. For NEON, for example, the .td file (the input to TableGen) is at https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/arm_neon.td
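One possible reason the grep comes up empty, as far as I can tell: for simple lane-wise operations like add, the Clang header does not appear to call a target intrinsic at all, it just applies the ordinary `+` operator to vector types, which should lower to a plain vector `add` in LLVM IR. A rough sketch of the same idea using the GCC/Clang `vector_size` extension (the type and function names below are made up, and this is an assumption about how the add case is handled, not a statement about every NEON operation):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 2 x i64 vector type via the GCC/Clang vector extension. */
typedef int64_t v2i64 __attribute__((vector_size(16)));

/* Lane-wise add written with the ordinary operator; no intrinsic call. */
v2i64 add_v2i64(v2i64 a, v2i64 b) {
    return a + b;
}

/* Convenience wrapper working on plain arrays (hypothetical helper). */
void add2_generic(const int64_t a[2], const int64_t b[2], int64_t out[2]) {
    v2i64 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = add_v2i64(va, vb);
    memcpy(out, &vr, sizeof vr);
}
```

If that holds, the code generator may only need target-specific intrinsics for operations with no generic IR equivalent, and could emit ordinary vector arithmetic for the rest.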
We're missing the case for ARM here:
https://gitlab.com/StanfordLegion/legion/-/blob/d44c4d4062cb726a64aba19c023844ec62ef1077/language/src/regent/vectorize_loops.t#L101
In addition to some severely lacking CPU detection logic:
https://gitlab.com/StanfordLegion/legion/-/blob/d44c4d4062cb726a64aba19c023844ec62ef1077/language/src/regent/vectorize_loops.t#L33-40
This results in us attempting to generate x86 intrinsics for ARM, which obviously fails.
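To make the shape of the fix concrete, here is a rough C sketch of the kind of dispatch that seems to be missing. It is not the Regent code (that lives in Terra); `target_arch`, `detect_arch` and `vector_width_bytes` are hypothetical names, and it detects the compile-time host via predefined compiler macros, whereas the real logic may need to look at the code generation target instead.

```c
/* Hypothetical architecture tag; not taken from vectorize_loops.t. */
typedef enum { ARCH_X86, ARCH_ARM, ARCH_UNKNOWN } target_arch;

/* Classify the architecture this file is being compiled for. */
target_arch detect_arch(void) {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    return ARCH_X86;
#elif defined(__aarch64__) || defined(__arm__) || defined(_M_ARM64)
    return ARCH_ARM;
#else
    return ARCH_UNKNOWN;
#endif
}

/* Vector width in bytes to target, or 0 meaning "do not vectorize".
   has_avx is only consulted on x86. */
int vector_width_bytes(target_arch arch, int has_avx) {
    switch (arch) {
    case ARCH_X86:
        return has_avx ? 32 : 16; /* AVX: 256-bit, SSE: 128-bit */
    case ARCH_ARM:
        return 16;                /* NEON q registers are 128-bit */
    default:
        return 0;                 /* unknown target: skip vectorization */
    }
}
```

Returning 0 for anything unrecognized matches the current stopgap of disabling the vectorizer rather than emitting x86 intrinsics on a target that cannot accept them.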