Closed Quuxplusone closed 15 years ago
Attached fibonacci.cpp
(3091 bytes, text/plain): fibonacci.cpp replacement generating the IR
Attached llvm-dump1.txt
(351 bytes, text/plain): LLVM text for 16-bit insert into 64-bit vector
Attached llvm-dump2.txt
(519 bytes, text/plain): LLVM text for 32-bit insert into 64-bit vector
MMX is one of Bill's favorite thing in the world.
I found that a zext works fine for getting a 16-bit or 32-bit value into a 64-bit vector. So that's a good workaround for me for now. Still though, a working insertelement would be nice.
Fix for 16-bit insert here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20080714/065224.html
What's the C source for the 32-bit insert case? What does GCC generate for it?
(In reply to comment #6)
> What's the C source for the 32-bit insert case? What does GCC generate for it?
Hi Bill, a very short sequence for inserting a 32-bit value into the *upper*
half of an MMX register is the following:
movd mm1, eax
punpckldq mm0, mm1 // result in mm0
I don't think an insert into the lower half can be done in two instructions. So
likely the shortest code is this:
movd mm1, eax
punpckldq mm1, mm1
punpckhdq mm1, mm0 // result in mm1
Of course if the source operand is know to be zero then it simply becomes a
single movd.
Note for the 16-bit inserts that pinsrw is part of the SSE instruction set, so
I'm not entirely sure if your partial patch works on older CPUs.
(In reply to comment #7)
> (In reply to comment #6)
> > What's the C source for the 32-bit insert case? What does GCC generate for
it?
>
> Hi Bill, a very short sequence for inserting a 32-bit value into the *upper*
> half of an MMX register is the following:
>
> movd mm1, eax
> punpckldq mm0, mm1 // result in mm0
>
> I don't think an insert into the lower half can be done in two instructions.
So
> likely the shortest code is this:
>
> movd mm1, eax
> punpckldq mm1, mm1
> punpckhdq mm1, mm0 // result in mm1
>
> Of course if the source operand is know to be zero then it simply becomes a
> single movd.
>
> Note for the 16-bit inserts that pinsrw is part of the SSE instruction set, so
> I'm not entirely sure if your partial patch works on older CPUs.
>
No. It is an MMX instruction according to the documentation. SSE deals with 128
registers but that vector is 64 bits which falls squarely in the MMX realm.
I wanted you to give me source code that would generate your v2i32 insert
because I couldn't find it in the instruction set and am curious to know what
situations it's used in. It sounds to me not like a missing tranformation (like
the v4i16 insert) but an enhancement request.
(In reply to comment #8)
> No. It is an MMX instruction according to the documentation. SSE deals with
128
> registers but that vector is 64 bits which falls squarely in the MMX realm.
Despite having a variant operating on MMX registers, pinsrw really was
introduced with SSE. The Intel documentation isn't very clear though. But the
AMD documentation reveals the pitfall: pinsrw is part of the AMD extensions to
MMX (indicated by bit 22 of CPUID extended function 8000_0001h). Intel never
really recognised it that way. Anyway, to remove all doubt, I wrote a little
test application and ran it on my old Compaq Armada 1750, equipped with a
Pentium II. It crashed on the pinsrw instruction.
The AMD extensions to MMX, sometimes also referred to as MMX+, were introduced
with the original Athlon. Only the Athlon XP featured SSE. Other MMX+
instructions are: maskmovq, pmaxsw, pminsw, pmaxub, pminub, pmovmskb, mulhuw,
psadbw, pshufw.
Note that Intel does not set bit 22 of CPUID function 80000001. However, all
processors supporting SSE also support MMX+. I'll leave it up to you and the
rest of the team to decide whether to support MMX+ as a separate feature or
classify the instructions as SSE.
I'm afraid this also means that an alternative implementation of <4 x i16>
inserts is necessary for targets without MMX+/SSE.
(In reply to comment #8)
> I wanted you to give me source code that would generate your v2i32 insert
> because I couldn't find it in the instruction set and am curious to know what
> situations it's used in. It sounds to me not like a missing tranformation
(like
> the v4i16 insert) but an enhancement request.
I'm sorry, could you clarify what you need here? The LLVM IR for 32-bit inserts
into 64-bit vectors has already been attached, and asserts in
X86GenDAGISel.inc, clearly indicating a bug. Let me know if I can be of help
with anything else.
(In reply to comment #8)
> I wanted you to give me source code that would generate your v2i32 insert
> because I couldn't find it in the instruction set and am curious to know what
> situations it's used in. It sounds to me not like a missing tranformation
(like
> the v4i16 insert) but an enhancement request.
I think I see what you mean now... You'd like to see *C* code (using
intrinsics) that generates a v2i32 insert?
This isn't for a typical C compiler though, but rather for a vectorizing (JIT)
compiler.
Maybe the fibonacci.cpp file I've attached is of use to you? It's currently for
a v4i16 insert but could easily be rewritten for a v2i32 insert.
Thank you.
(In reply to comment #9)
> Despite having a variant operating on MMX registers, pinsrw really was
> introduced with SSE. The Intel documentation isn't very clear though. But the
> AMD documentation reveals the pitfall: pinsrw is part of the AMD extensions to
> MMX (indicated by bit 22 of CPUID extended function 8000_0001h). Intel never
> really recognised it that way. Anyway, to remove all doubt, I wrote a little
> test application and ran it on my old Compaq Armada 1750, equipped with a
> Pentium II. It crashed on the pinsrw instruction.
>
I see what you mean. I didn't know that it was an AMD extension that Intel
picked up later.
(In reply to comment #11)
> (In reply to comment #8)
> > I wanted you to give me source code that would generate your v2i32 insert
> > because I couldn't find it in the instruction set and am curious to know
what
> > situations it's used in. It sounds to me not like a missing tranformation
(like
> > the v4i16 insert) but an enhancement request.
>
> I think I see what you mean now... You'd like to see *C* code (using
> intrinsics) that generates a v2i32 insert?
>
> This isn't for a typical C compiler though, but rather for a vectorizing (JIT)
> compiler.
>
> Maybe the fibonacci.cpp file I've attached is of use to you? It's currently
for
> a v4i16 insert but could easily be rewritten for a v2i32 insert.
>
Yeah, the C source code. I just wanted to know how GCC et al treat such a
thing. It could be another language that's generating those types, but I
couldn't find any builtins in the GCC header files that do inserts into such a
vector.
Is there still an issue here? We don't generate the prettiest code here, but it works as far as I know.
(In reply to comment #14)
> Is there still an issue here? We don't generate the prettiest code here, but
> it works as far as I know.
As far as I'm aware it works ok now.
Okay then, resolving.
fibonacci.cpp
(3091 bytes, text/plain)llvm-dump1.txt
(351 bytes, text/plain)llvm-dump2.txt
(519 bytes, text/plain)