MMX inserts fail - Githubissues

Quuxplusone commented 16 years ago


Bugzilla Link	PR2562
Status	RESOLVED WORKSFORME
Importance	P normal
Reported by	Nicolas Capens (nicolas.capens@gmail.com)
Reported on	2008-07-17 12:46:33 -0700
Last modified on	2009-06-07 12:52:53 -0700
Version	unspecified
Hardware	PC All
CC	efriedma@quicinc.com, evan.cheng@apple.com, isanbard@gmail.com, llvm-bugs@lists.llvm.org, nicolas.capens@gmail.com
Fixed by commit(s)
Attachments	`fibonacci.cpp` (3091 bytes, text/plain) `llvm-dump1.txt` (351 bytes, text/plain) `llvm-dump2.txt` (519 bytes, text/plain)
Blocks
Blocked by
See also

Created attachment 1838
fibonacci.cpp replacement generating the IR

Attempting to insert a 16-bit or 32-bit value into a 64-bit vector fails. Both
currently lower to a shufflevector but I get a "Cannot yet select" assert in
X86GenDAGISel.inc.

Quuxplusone commented 16 years ago

Attached fibonacci.cpp (3091 bytes, text/plain): fibonacci.cpp replacement generating the IR

Quuxplusone commented 16 years ago

Attached llvm-dump1.txt (351 bytes, text/plain): LLVM text for 16-bit insert into 64-bit vector

Quuxplusone commented 16 years ago

Attached llvm-dump2.txt (519 bytes, text/plain): LLVM text for 32-bit insert into 64-bit vector

Quuxplusone commented 16 years ago

MMX is one of Bill's favorite thing in the world.

Quuxplusone commented 16 years ago

I found that a zext works fine for getting a 16-bit or 32-bit value into a 64-bit vector. So that's a good workaround for me for now. Still though, a working insertelement would be nice.

Quuxplusone commented 16 years ago

Fix for 16-bit insert here:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080714/065224.html

Quuxplusone commented 16 years ago

What's the C source for the 32-bit insert case? What does GCC generate for it?

Quuxplusone commented 16 years ago

(In reply to comment #6)
> What's the C source for the 32-bit insert case? What does GCC generate for it?

Hi Bill, a very short sequence for inserting a 32-bit value into the *upper*
half of an MMX register is the following:

movd mm1, eax
punpckldq mm0, mm1   // result in mm0

I don't think an insert into the lower half can be done in two instructions. So
likely the shortest code is this:

movd mm1, eax
punpckldq mm1, mm1
punpckhdq mm1, mm0   // result in mm1

Of course if the source operand is know to be zero then it simply becomes a
single movd.

Note for the 16-bit inserts that pinsrw is part of the SSE instruction set, so
I'm not entirely sure if your partial patch works on older CPUs.

Quuxplusone commented 16 years ago

(In reply to comment #7)
> (In reply to comment #6)
> > What's the C source for the 32-bit insert case? What does GCC generate for
it?
>
> Hi Bill, a very short sequence for inserting a 32-bit value into the *upper*
> half of an MMX register is the following:
>
> movd mm1, eax
> punpckldq mm0, mm1   // result in mm0
>
> I don't think an insert into the lower half can be done in two instructions.
So
> likely the shortest code is this:
>
> movd mm1, eax
> punpckldq mm1, mm1
> punpckhdq mm1, mm0   // result in mm1
>
> Of course if the source operand is know to be zero then it simply becomes a
> single movd.
>
> Note for the 16-bit inserts that pinsrw is part of the SSE instruction set, so
> I'm not entirely sure if your partial patch works on older CPUs.
>
No. It is an MMX instruction according to the documentation. SSE deals with 128
registers but that vector is 64 bits which falls squarely in the MMX realm.

I wanted you to give me source code that would generate your v2i32 insert
because I couldn't find it in the instruction set and am curious to know what
situations it's used in. It sounds to me not like a missing tranformation (like
the v4i16 insert) but an enhancement request.

Quuxplusone commented 16 years ago

(In reply to comment #8)
> No. It is an MMX instruction according to the documentation. SSE deals with
128
> registers but that vector is 64 bits which falls squarely in the MMX realm.

Despite having a variant operating on MMX registers, pinsrw really was
introduced with SSE. The Intel documentation isn't very clear though. But the
AMD documentation reveals the pitfall: pinsrw is part of the AMD extensions to
MMX (indicated by bit 22 of CPUID extended function 8000_0001h). Intel never
really recognised it that way. Anyway, to remove all doubt, I wrote a little
test application and ran it on my old Compaq Armada 1750, equipped with a
Pentium II. It crashed on the pinsrw instruction.

The AMD extensions to MMX, sometimes also referred to as MMX+, were introduced
with the original Athlon. Only the Athlon XP featured SSE. Other MMX+
instructions are: maskmovq, pmaxsw, pminsw, pmaxub, pminub, pmovmskb, mulhuw,
psadbw, pshufw.

Note that Intel does not set bit 22 of CPUID function 80000001. However, all
processors supporting SSE also support MMX+. I'll leave it up to you and the
rest of the team to decide whether to support MMX+ as a separate feature or
classify the instructions as SSE.

I'm afraid this also means that an alternative implementation of <4 x i16>
inserts is necessary for targets without MMX+/SSE.

Quuxplusone commented 16 years ago

(In reply to comment #8)
> I wanted you to give me source code that would generate your v2i32 insert
> because I couldn't find it in the instruction set and am curious to know what
> situations it's used in. It sounds to me not like a missing tranformation
(like
> the v4i16 insert) but an enhancement request.
I'm sorry, could you clarify what you need here? The LLVM IR for 32-bit inserts
into 64-bit vectors has already been attached, and asserts in
X86GenDAGISel.inc, clearly indicating a bug. Let me know if I can be of help
with anything else.

Quuxplusone commented 16 years ago

(In reply to comment #8)
> I wanted you to give me source code that would generate your v2i32 insert
> because I couldn't find it in the instruction set and am curious to know what
> situations it's used in. It sounds to me not like a missing tranformation
(like
> the v4i16 insert) but an enhancement request.

I think I see what you mean now... You'd like to see *C* code (using
intrinsics) that generates a v2i32 insert?

This isn't for a typical C compiler though, but rather for a vectorizing (JIT)
compiler.

Maybe the fibonacci.cpp file I've attached is of use to you? It's currently for
a v4i16 insert but could easily be rewritten for a v2i32 insert.

Thank you.

Quuxplusone commented 16 years ago

(In reply to comment #9)
> Despite having a variant operating on MMX registers, pinsrw really was
> introduced with SSE. The Intel documentation isn't very clear though. But the
> AMD documentation reveals the pitfall: pinsrw is part of the AMD extensions to
> MMX (indicated by bit 22 of CPUID extended function 8000_0001h). Intel never
> really recognised it that way. Anyway, to remove all doubt, I wrote a little
> test application and ran it on my old Compaq Armada 1750, equipped with a
> Pentium II. It crashed on the pinsrw instruction.
>
I see what you mean. I didn't know that it was an AMD extension that Intel
picked up later.

Quuxplusone commented 16 years ago

(In reply to comment #11)
> (In reply to comment #8)
> > I wanted you to give me source code that would generate your v2i32 insert
> > because I couldn't find it in the instruction set and am curious to know
what
> > situations it's used in. It sounds to me not like a missing tranformation
(like
> > the v4i16 insert) but an enhancement request.
>
> I think I see what you mean now... You'd like to see *C* code (using
> intrinsics) that generates a v2i32 insert?
>
> This isn't for a typical C compiler though, but rather for a vectorizing (JIT)
> compiler.
>
> Maybe the fibonacci.cpp file I've attached is of use to you? It's currently
for
> a v4i16 insert but could easily be rewritten for a v2i32 insert.
>
Yeah, the C source code. I just wanted to know how GCC et al treat such a
thing. It could be another language that's generating those types, but I
couldn't find any builtins in the GCC header files that do inserts into such a
vector.

Quuxplusone commented 15 years ago

Is there still an issue here? We don't generate the prettiest code here, but it works as far as I know.

Quuxplusone commented 15 years ago

(In reply to comment #14)
> Is there still an issue here?  We don't generate the prettiest code here, but
> it works as far as I know.

As far as I'm aware it works ok now.

Quuxplusone commented 15 years ago

Okay then, resolving.

Quuxplusone / LLVMBugzillaTest

MMX inserts fail #2793