Closed Quuxplusone closed 13 years ago
Hi Matt,
What do you mean by don't support? There are definitions:
defm VPMINUB : PDI_binop_rm_int<0xDA, "vpminub", int_x86_sse2_pminu_b, 1, 0>,
VEX_4V;
defm VPMINSW : PDI_binop_rm_int<0xEA, "vpminsw", int_x86_sse2_pmins_w, 1, 0>,
VEX_4V;
defm VPMAXUB : PDI_binop_rm_int<0xDE, "vpmaxub", int_x86_sse2_pmaxu_b, 1, 0>,
VEX_4V;
defm VPMAXSW : PDI_binop_rm_int<0xEE, "vpmaxsw", int_x86_sse2_pmaxs_w, 1, 0>,
VEX_4V;
And also intrinsics available on emmintrin.h (clang), tests in
clang/test/CodeGen/builtins-x86.c and test/CodeGen/X86/avx-intrinsics-x86.ll.
Let me know what's the issue you are running into.
Hmm, I'm still only seeing the SSE versions of those, not the <8 x i32> AVX
ones--the avx intrinsics test only seems to test the SSE ones. Also, if I do:
% grep pminu include/llvm/Intrinsic* | grep x86
I only see mmx and sse versions of those. (The way I look up intrinsics tends
to be the preceeding, which I can't necessarily defend as the best way to do
so!)
(Also, how come the below are defined as x86_sse2 rather than x86_avx?)
We don't want to duplicate intrinsics, so we just match x86_sse2* to the right
sse/avx instruction later. Both VPMINUB and PMINUB for example, are matched
from int_x86_sse2_pminu_b. If you check other compilers you will see that they
don't define a new builtin for every new AVX encoded 128-bit instruction, we do
the same with the builtins and the IR intrinsics. You can use
int_x86_sse2_pminu_b in your generated IR, and when specified -mattr=+avx, you
get the right instruction. If it happens that some of them won't select the
right instruction, then let me know cause it's a bug! We have tests to check
all of them though, look at test/CodeGen/X86/avx-intrinsics-x86.ll
Got it; thanks. Sorry for the confusion. I will do that and let you know if
anything doesn't work as expected.
Thanks,
-matt
I still seem to be confused (or something is missing). I'd like to take two <8
x i32> vectors and compute the element-wise minimum of them. My understanding
from your explanation below is that the following should work, but llc -
mattr=+avx gives me two errors of:
Intrinsic prototype has incorrect number of vector elements!
<8 x i32> (<8 x i32>, <8 x i32>)* @llvm.x86.sse41.pminsd
before crashing. (Note the 'dword' / i32 variants of the integer min/max stuff
were introduced in sse4.1.)
Help?
declare <8 x i32> @llvm.x86.sse41.pminsd(<8 x i32>, <8 x i32>) nounwind readnone
define <8 x i32> @m(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
%s = call <8 x i32> @llvm.x86.sse41.pminsd(<8 x i32> %a, <8 x i32> %b)
ret <8 x i32> %s
}
Hi,
I think you're still making confusion! :)
There's no pminsd instruction for 256-bit vectors, only re-encoded ones for 128-bit. So, if you want to have min/max for 256-bit, you should issue two @llvm.x86.sse41.pminsd and join the result vectors. This happens because llvm doesn't have native "max" or "min", so if you're using intrinsics, you should use the right ones, because the target independent legalizer won't touch it.
AVX2 probably has a max/min 256-bit instruction, but this is future-future work!
Yes, or to put it more directly: I'm an idiot! :-)
I somehow had forgotten that AVX1 doesn't have 8-wide integer stuff, even though I have known that quite well for quite some time.
I will partially blame my confusion on the latest Intel Programming Guide, which has the new AVX2 instructions mixed in there (which do have 8-wide integer ops), but doesn't make clear which are AVX and which are AVX2.
But indeed, yes, now I'm definitely straightened out. Sorry for the repeated trouble.
AVX has instructions for {signed,unsigned} x {byte,word,dword} x {min,max} of AVX registers. (VPMINUB, VPMAXSD, etc, etc.) It looks like these aren't yet supported in LLVM.