Closed Quuxplusone closed 13 years ago
Attached test2.cpp
(816 bytes, text/x-c++src): test case 2 (self-contained c++ code)
Attached test3.cpp
(1031 bytes, text/x-c++src): test case 3 (self-contained c++ code)
Attached test1.cpp
(579 bytes, text/x-c++src): test case 1 (self-contained c++ code)
Attached tests.tar.gz
(1803 bytes, application/x-gzip): .ll-files for the test cases for llvm-gcc 2.5, 2.6, and 2.7svn
Yes, this is really unfortunate. The root cause of this problem is that the ABI code in both llvm-gcc and clang is passing "two floats" as a double instead of as the low two elements of a <4 x float>. Both are equivalent in the X86-64 ABI, but <4 x float> will always produce much more efficient code.
This is related to rdar://6778419, which covers the return case.
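For readers unfamiliar with the "two floats as a double" coercion mentioned above, here is a minimal C++ sketch of the idea. It is not taken from the attached test cases: the `Float2` type and the `pack_as_double` / `unpack_from_double` helpers are hypothetical names made up for illustration, and the code only mimics at source level what the frontend does when it reinterprets the 8 bytes of the aggregate as a single double.

```cpp
#include <cstring>

// Hypothetical two-float aggregate (assumption, not from test1/2/3.cpp).
struct Float2 {
    float x;
    float y;
};

// Two 4-byte floats occupy the same 8 bytes as one double.
static_assert(sizeof(Float2) == sizeof(double),
              "two floats fit in one 8-byte slot");

// Reinterpret the aggregate's bytes as a double, roughly what the
// frontend's ABI lowering does when passing it in an SSE register.
double pack_as_double(Float2 v) {
    double d;
    std::memcpy(&d, &v, sizeof d);
    return d;
}

// Recover the two floats from the double's bit pattern.
Float2 unpack_from_double(double d) {
    Float2 v;
    std::memcpy(&v, &d, sizeof v);
    return v;
}
```

The optimizer then has to see through this bit-level round trip to get back to two plain floats, which is the part a <4 x float> representation would avoid.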
That said, it seems to me that instcombine could clean up test case 1 and
test case 3 considerably.
Instcombine can clean up test2 from the original bug submission, and the equivalent case in comment #3. It can't really do anything useful with comment #1 because the frontend is passing the value as a double.
test2 from the original report is fine now:
define void @test2(float %aX, float %aY, float %aZ, %struct.float3* nocapture %res) nounwind noinline ssp {
entry:
  %0 = getelementptr inbounds %struct.float3* %res, i64 0, i32 0 ; <float*> [#uses=1]
  store float %aX, float* %0, align 4
  %1 = getelementptr inbounds %struct.float3* %res, i64 0, i32 1 ; <float*> [#uses=1]
  store float %aY, float* %1, align 4
  %2 = getelementptr inbounds %struct.float3* %res, i64 0, i32 2 ; <float*> [#uses=1]
  store float %aZ, float* %2, align 4
  ret void
}
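For reference, the IR above corresponds to something like the following C++. This is a hand-written approximation, not the contents of the attached test2.cpp: the field names `x`, `y`, `z` are assumptions, and IR-level attributes such as `noinline` are not reproduced.

```cpp
// Assumed layout of %struct.float3: three consecutive floats.
struct float3 {
    float x;
    float y;
    float z;
};

// Stores each scalar argument into the matching field of *res,
// mirroring the three getelementptr/store pairs in the IR.
void test2(float aX, float aY, float aZ, float3 *res) {
    res->x = aX;
    res->y = aY;
    res->z = aZ;
}
```

Because every argument here is already a scalar float, no ABI coercion of an aggregate is involved, which would explain why this case comes out clean.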
Test3 is the same ABI passing stuff, so there isn't anything left for instcombine
to do here; this is all a frontend issue now.
Clang generates great code for all of these testcases.