StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Regent: Implement vectorization on ARM #1676

Open elliottslaughter opened 2 months ago

elliottslaughter commented 2 months ago

We're missing the case for ARM here:

https://gitlab.com/StanfordLegion/legion/-/blob/d44c4d4062cb726a64aba19c023844ec62ef1077/language/src/regent/vectorize_loops.t#L101

In addition to some severely lacking CPU detection logic:

https://gitlab.com/StanfordLegion/legion/-/blob/d44c4d4062cb726a64aba19c023844ec62ef1077/language/src/regent/vectorize_loops.t#L33-40

This results in us attempting to generate x86 intrinsics for ARM, which obviously fails.
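As a rough illustration of the kind of architecture dispatch that is missing, here is a minimal sketch in C. It is not the logic in `vectorize_loops.t`; the names `vector_isa`, `detect_vector_isa`, and `lanes_for` are hypothetical. It relies on compiler-predefined macros, so it reflects the compile target, whereas a JIT-based compiler would presumably need to query the actual target at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum for illustration; not part of Regent or Terra. */
typedef enum { ISA_NONE, ISA_X86_SSE2, ISA_AARCH64_NEON } vector_isa;

/* Pick a vector ISA from compiler-predefined target macros. */
vector_isa detect_vector_isa(void) {
#if defined(__aarch64__) || defined(_M_ARM64)
  return ISA_AARCH64_NEON;   /* 128-bit Advanced SIMD registers */
#elif defined(__x86_64__) || defined(_M_X64) || defined(__SSE2__)
  return ISA_X86_SSE2;       /* 128-bit SSE2 baseline */
#else
  return ISA_NONE;           /* unknown target: do not vectorize */
#endif
}

/* Number of elements of the given size that fit in one vector register.
   Returns 1 (scalar) when no vector ISA was detected. */
size_t lanes_for(vector_isa isa, size_t elem_bytes) {
  assert(elem_bytes > 0 && elem_bytes <= 16);
  switch (isa) {
    case ISA_X86_SSE2:
    case ISA_AARCH64_NEON:
      return 16 / elem_bytes;  /* both use 16-byte registers here */
    default:
      return 1;
  }
}
```

The point of the `ISA_NONE` fallback is that an unrecognized target gets scalar code instead of intrinsics for the wrong architecture.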

elliottslaughter commented 2 months ago

I put in an initial change to just shut off vectorization, for now:

https://gitlab.com/StanfordLegion/legion/-/merge_requests/1205

I can't find good documentation on the ARM instruction set, so I've been looking through random LLVM tests trying to find the names of the intrinsics. I see:

https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/ARM/vpadd.ll#L64

But this is a 64-bit (?!) vector and I see no options for larger vector widths or for 64-bit floating point.

elliottslaughter commented 2 months ago

I merged the change to disable the vectorizer so that at least we're not generating invalid vector intrinsics for ARM now.

lightsighter commented 2 months ago

> I can't find good documentation on the ARM instruction set

This is pretty good: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html

For example, here are the addition ones: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#addition

Since we were just discussing it, here is the 2-wide 64-bit integer add intrinsic: https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddq_s64
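For the 64-bit floating-point case raised earlier, the corresponding ACLE intrinsic is `vaddq_f64`, which operates on `float64x2_t` and is only available on AArch64. Below is a minimal C sketch of how it could be used; the function name `add_f64` and the `HAVE_NEON_F64` macro are made up for this example, and the scalar path is there so the code still builds on non-ARM targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_F64 1
#else
#define HAVE_NEON_F64 0
#endif

/* Element-wise out[i] = a[i] + b[i] for n doubles.
   Uses 2-wide NEON adds on AArch64, plain scalar adds elsewhere. */
void add_f64(const double *a, const double *b, double *out, size_t n) {
  assert(a != NULL && b != NULL && out != NULL);
  size_t i = 0;
#if HAVE_NEON_F64
  /* Process two doubles per iteration in one 128-bit register. */
  for (; i + 2 <= n; i += 2) {
    float64x2_t va = vld1q_f64(a + i);
    float64x2_t vb = vld1q_f64(b + i);
    vst1q_f64(out + i, vaddq_f64(va, vb));
  }
#endif
  /* Scalar loop: handles the tail on AArch64, everything elsewhere. */
  for (; i < n; i++) {
    out[i] = a[i] + b[i];
  }
}
```

Calling `add_f64(a, b, out, 3)` on AArch64 would handle the first two elements with one vector add and the third in the scalar tail.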

elliottslaughter commented 2 months ago

Does anyone know where the LLVM versions of these intrinsics are defined? Grepping through the LLVM 17.0.5 source, I cannot find any evidence that these instructions are exposed as intrinsics. Presumably they must be, but I can't find them.

@pmccormick?

seemamirch commented 2 months ago

They use TableGen to generate .h header files with all the intrinsics. For NEON, for example, the .td file (the input to TableGen) is at https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/arm_neon.td
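A possible reason the grep for add intrinsics comes up empty (offered as an assumption, not checked against LLVM 17.0.5): simple element-wise operations such as vector add can be expressed as ordinary arithmetic on vector types, for example an `fadd` on `<2 x double>` in LLVM IR, and the backend then selects the matching NEON instruction, so no target-specific intrinsic is needed for them. The C sketch below shows the same idea using the GCC/Clang `vector_size` extension; the names `v2f64` and `add_v2f64` are made up for this example.

```c
#include <assert.h>

/* GCC/Clang extension: a 16-byte vector holding two doubles. */
typedef double v2f64 __attribute__((vector_size(16)));

/* Plain '+' on a vector type; the compiler picks the target's
   vector add (NEON on AArch64, SSE2 on x86-64) without any
   architecture-specific intrinsic in the source. */
v2f64 add_v2f64(v2f64 a, v2f64 b) {
  return a + b;
}
```

If that assumption holds, a vectorizer would only need target-specific intrinsics for operations that have no generic vector form, and could emit plain vector arithmetic for the rest.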