poor x86-64 ABI calling conv passing of struct with 3 or 4 floats

Quuxplusone commented 14 years ago


Bugzilla Link	PR6194
Status	RESOLVED FIXED
Importance	P normal
Reported by	Ralf Karrenberg (karrenberg@cs.uni-saarland.de)
Reported on	2010-02-01 04:51:49 -0800
Last modified on	2010-12-02 01:54:58 -0800
Version	trunk
Hardware	PC Linux
CC	baldrick@free.fr, clattner@nondot.org, daniel@zuster.org, evan.cheng@apple.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments	`test1.cpp` (579 bytes, text/x-c++src) `test2.cpp` (816 bytes, text/x-c++src) `test3.cpp` (1031 bytes, text/x-c++src) `tests.tar.gz` (1803 bytes, application/x-gzip)
Blocks
Blocked by
See also

For the code below, llvm-gcc generates non-optimal code for accessing and
returning structs on x64.

In order to conform to ABI requirements, a struct parameter which holds 3
floats is split into two parameters: a double and a float.
Accessing the first two struct-values results in a series of bitcast-, zext-,
lshr-, and trunc-operations (test1).

It should be possible to reduce this to bitcast(i64)-shift-bitcast(float).

More importantly however, if code that returns a struct is inlined into a
function that defines that same struct, the corresponding series of operations
as above are entirely redundant (float ->
bitcast/zext/shl/zext/lshr/trunc/bitcast -> float) but do not get optimized
away (test3):

define void @test2(float %aX, float %aY, float %aZ, %struct.float3* nocapture
%res) nounwind noinline {
entry:
  %tmp27 = bitcast float %aY to i32               ; <i32> [#uses=1]
  %tmp28 = zext i32 %tmp27 to i96                 ; <i96> [#uses=1]
  %tmp29 = shl i96 %tmp28, 32                     ; <i96> [#uses=1]
  %tmp21 = zext i96 %tmp29 to i128                ; <i128> [#uses=1]
  %0 = getelementptr inbounds %struct.float3* %res, i64 0, i32 0 ; <float*> [#uses=1]
  store float %aX, float* %0, align 4
  %1 = getelementptr inbounds %struct.float3* %res, i64 0, i32 1 ; <float*> [#uses=1]
  %tmp10 = lshr i128 %tmp21, 32                   ; <i128> [#uses=1]
  %tmp11 = trunc i128 %tmp10 to i32               ; <i32> [#uses=1]
  %tmp12 = bitcast i32 %tmp11 to float            ; <float> [#uses=1]
  store float %tmp12, float* %1, align 4
  %2 = getelementptr inbounds %struct.float3* %res, i64 0, i32 2 ; <float*> [#uses=1]
  store float %aZ, float* %2, align 4
  ret void
}

Quuxplusone commented 14 years ago

Attached test2.cpp (816 bytes, text/x-c++src): test case 2 (self-contained c++ code)

Quuxplusone commented 14 years ago

Attached test3.cpp (1031 bytes, text/x-c++src): test case 3 (self-contained c++ code)

Quuxplusone commented 14 years ago

Attached test1.cpp (579 bytes, text/x-c++src): test case 1 (self-contained c++ code)

Quuxplusone commented 14 years ago

Attached tests.tar.gz (1803 bytes, application/x-gzip): .ll-files for the test cases for llvm-gcc 2.5, 2.6, and 2.7svn

Quuxplusone commented 14 years ago

Yes, this is really unfortunate. The root cause of this problem is that the ABI code in both llvm-gcc and clang are passing "two floats" as a double instead of as the low two elements of a 4x float. Both are equivalent in the X86-64 ABI, but <4x float> will always produce much more efficient code.

Quuxplusone commented 14 years ago

This is related to rdar://6778419, which covers the return case.

Quuxplusone commented 14 years ago

That said, it seems to me that instcombine could clean up test case 1 and
test case 3 considerably.

Quuxplusone commented 14 years ago

Instcombine can clean up test2 from the original bug submission and the same thing in comment #3. It can't really do anything useful with comment #1 because the frontend is passing as double.

Quuxplusone commented 14 years ago

test2 from the original report is fine now:

define void @test2(float %aX, float %aY, float %aZ, %struct.float3* nocapture
%res) nounwind noinline ssp {
entry:
  %0 = getelementptr inbounds %struct.float3* %res, i64 0, i32 0 ; <float*> [#uses=1]
  store float %aX, float* %0, align 4
  %1 = getelementptr inbounds %struct.float3* %res, i64 0, i32 1 ; <float*> [#uses=1]
  store float %aY, float* %1, align 4
  %2 = getelementptr inbounds %struct.float3* %res, i64 0, i32 2 ; <float*> [#uses=1]
  store float %aZ, float* %2, align 4
  ret void
}

Test3 is the same ABI passing stuff, so there isn't anything instcombine can do
here left, this is all a frontend issue now.

Quuxplusone commented 13 years ago

Clang generates great code for all of these testcases.

Quuxplusone / LLVMBugzillaTest

poor x86-64 ABI calling conv passing of struct with 3 or 4 floats #6644