Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[AVX] missing intrinsics for integer min/max instructions #9112

Closed Quuxplusone closed 13 years ago

Quuxplusone commented 13 years ago
Bugzilla Link PR10496
Status RESOLVED INVALID
Importance P normal
Reported by Matt Pharr (matt@pharr.org)
Reported on 2011-07-26 05:34:39 -0700
Last modified on 2011-08-04 07:45:38 -0700
Version trunk
Hardware PC All
CC bruno.cardoso@gmail.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

AVX has instructions for {signed,unsigned} x {byte,word,dword} x {min,max} of AVX registers. (VPMINUB, VPMAXSD, etc, etc.) It looks like these aren't yet supported in LLVM.

Quuxplusone commented 13 years ago
Hi Matt,

What do you mean by don't support? There are definitions:

defm VPMINUB  : PDI_binop_rm_int<0xDA, "vpminub", int_x86_sse2_pminu_b, 1, 0>,
                                 VEX_4V;
defm VPMINSW  : PDI_binop_rm_int<0xEA, "vpminsw", int_x86_sse2_pmins_w, 1, 0>,
                                 VEX_4V;
defm VPMAXUB  : PDI_binop_rm_int<0xDE, "vpmaxub", int_x86_sse2_pmaxu_b, 1, 0>,
                                 VEX_4V;
defm VPMAXSW  : PDI_binop_rm_int<0xEE, "vpmaxsw", int_x86_sse2_pmaxs_w, 1, 0>,
                                 VEX_4V;

And also intrinsics available on emmintrin.h (clang), tests in
clang/test/CodeGen/builtins-x86.c and test/CodeGen/X86/avx-intrinsics-x86.ll.
Let me know what's the issue you are running into.
Quuxplusone commented 13 years ago
Hmm, I'm still only seeing the SSE versions of those, not the <8 x i32> AVX
ones--the avx intrinsics test only seems to test the SSE ones.  Also, if I do:

% grep pminu include/llvm/Intrinsic* | grep x86

I only see mmx and sse versions of those.  (The way I look up intrinsics tends
to be the preceeding, which I can't necessarily defend as the best way to do
so!)

(Also, how come the below are defined as x86_sse2 rather than x86_avx?)
Quuxplusone commented 13 years ago
We don't want to duplicate intrinsics, so we just match x86_sse2* to the right
sse/avx instruction later. Both VPMINUB and PMINUB for example, are matched
from int_x86_sse2_pminu_b. If you check other compilers you will see that they
don't define a new builtin for every new AVX encoded 128-bit instruction, we do
the same with the builtins and the IR intrinsics. You can use
int_x86_sse2_pminu_b in your generated IR, and when specified -mattr=+avx, you
get the right instruction. If it happens that some of them won't select the
right instruction, then let me know cause it's a bug! We have tests to check
all of them though, look at test/CodeGen/X86/avx-intrinsics-x86.ll
Quuxplusone commented 13 years ago
Got it; thanks.  Sorry for the confusion. I will do that and let you know if
anything doesn't work as expected.

Thanks,
-matt
Quuxplusone commented 13 years ago
I still seem to be confused (or something is missing).  I'd like to take two <8
x i32> vectors and compute the element-wise minimum of them.  My understanding
from your explanation below is that the following should work, but llc -
mattr=+avx gives me two errors of:

Intrinsic prototype has incorrect number of vector elements!
<8 x i32> (<8 x i32>, <8 x i32>)* @llvm.x86.sse41.pminsd

before crashing.  (Note the 'dword' / i32 variants of the integer min/max stuff
were introduced in sse4.1.)

Help?

declare <8 x i32> @llvm.x86.sse41.pminsd(<8 x i32>, <8 x i32>) nounwind readnone

define <8 x i32> @m(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
  %s = call <8 x i32> @llvm.x86.sse41.pminsd(<8 x i32> %a, <8 x i32> %b)
  ret <8 x i32> %s
}
Quuxplusone commented 13 years ago

Hi,

I think you're still making confusion! :)

There's no pminsd instruction for 256-bit vectors, only re-encoded ones for 128-bit. So, if you want to have min/max for 256-bit, you should issue two @llvm.x86.sse41.pminsd and join the result vectors. This happens because llvm doesn't have native "max" or "min", so if you're using intrinsics, you should use the right ones, because the target independent legalizer won't touch it.

AVX2 probably has a max/min 256-bit instruction, but this is future-future work!

Quuxplusone commented 13 years ago

Yes, or to put it more directly: I'm an idiot! :-)

I somehow had forgotten that AVX1 doesn't have 8-wide integer stuff, even though I have known that quite well for quite some time.

I will partially blame my confusion on the latest Intel Programming Guide, which has the new AVX2 instructions mixed in there (which do have 8-wide integer ops), but doesn't make clear which are AVX and which are AVX2.

But indeed, yes, now I'm definitely straightened out. Sorry for the repeated trouble.