Quuxplusone / LLVMBugzillaTest

clang 2.6 -O1 faster than clang 2.7 #7341

Open Quuxplusone opened 14 years ago

Quuxplusone commented 14 years ago
Bugzilla Link: PR6891
Status: NEW
Importance: P enhancement
Reported by: Török Edwin (edwin+bugs@etorok.eu)
Reported on: 2010-04-22 03:22:52 -0700
Last modified on: 2016-04-29 15:11:40 -0700
Version: unspecified
Hardware: PC Linux
CC: david.l.kreitzer@intel.com, hjl.tools@gmail.com, llvm-bugs@lists.llvm.org, rafael@espindo.la, zia.ansari@intel.com
Fixed by commit(s):
Attachments:
    crc32test.c (1289 bytes, text/plain)
    crc32.c (13721 bytes, text/plain)
    crc32.h (30667 bytes, text/x-chdr)
    clang26.ll (64135 bytes, application/octet-stream)
    clang27.ll (77307 bytes, application/octet-stream)
    clang26.s (47147 bytes, text/plain)
    clang27.s (47162 bytes, text/plain)
Blocks:
Blocked by:
See also:

Timing results on the crc32 code in zlib, all code compiled as 64-bit, on an
Intel Core 2 Quad Q9550.
64/32 refers to the element size in the crc32 table.

            64-bit table elements       32-bit table elements
            O1    O2    O3    Os        O1    O2    O3    Os
gcc-4.4     87.2  92.8  93.3  98.9      90.0  92.8  92.8  100.0
gcc-4.5     94.4  92.8  92.8  98.9      92.2  92.8  92.8   99.4
clang-2.6   82.2  88.9  88.9  88.9      85.6  87.2  86.7   86.7
clang-2.7   91.7  90.6  92.8  90.6      86.1  85.6  87.2   83.9
dragonegg   89.4  87.2  86.7  86.7      90.0  86.7  86.7   87.8

clang 2.6 at -O1 with 64-bit elements (82.2) is faster than every other
configuration. LLVM otherwise tends to do best with 32-bit elements and
-O2/-Os, while gcc tends to do best with -O1 and 64-bit elements.
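
To put numbers on that comparison, here is a minimal C sketch. It assumes the table entries are timings where lower is better (the text implies this but states no unit). The names `percent_over`, `fastest_index`, `o1_64_times` and `o1_64_names` are made up for illustration; the values are copied from the -O1 column for 64-bit elements.

```c
#include <assert.h>
#include <stddef.h>

/* -O1 results with 64-bit table elements, copied from the table above. */
const char *const o1_64_names[5] = {
    "gcc-4.4", "gcc-4.5", "clang-2.6", "clang-2.7", "dragonegg"
};
const double o1_64_times[5] = {87.2, 94.4, 82.2, 91.7, 89.4};

/* Percentage by which `other` exceeds `baseline`. */
double percent_over(double baseline, double other) {
    assert(baseline > 0.0);
    return (other / baseline - 1.0) * 100.0;
}

/* Index of the smallest entry, i.e. the fastest build if lower is better. */
size_t fastest_index(const double *times, size_t n) {
    assert(times != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (times[i] < times[best]) {
            best = i;
        }
    }
    return best;
}
```

Calling `fastest_index(o1_64_times, 5)` picks the clang-2.6 entry, and `percent_over(82.2, 91.7)` puts clang 2.7 at -O1/64 roughly 11.6% above clang 2.6.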

It would be interesting to investigate why clang 2.6 -O1 is faster, and to fix
clang 2.7 (or rather trunk/2.8) so that it is just as fast.
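
For readers without the attachments, the following is a rough sketch of what "element size in the crc32 table" means. It is not the attached crc32.c: it is a plain byte-at-a-time CRC-32 that builds its table at run time, whereas the attached crc32.h may ship precomputed tables. The type name `crc_elem_t` and the function names are hypothetical, and using the `CRC32_32bit` macro from the build script to select the type is an assumption about how the real header works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the table element type is chosen by the CRC32_32bit macro
   that the build script passes with -D. Without it, elements are 64-bit. */
#ifdef CRC32_32bit
typedef uint32_t crc_elem_t;
#else
typedef uint64_t crc_elem_t;
#endif

static crc_elem_t crc_table[256];
static int crc_table_ready = 0;

/* Fill the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320. */
static void crc_make_table(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        crc_table[n] = c; /* only the low 32 bits are ever non-zero */
    }
    crc_table_ready = 1;
}

/* Byte-at-a-time CRC-32 over buf[0..len). Each step loads one table
   element, so the element width decides the width of that load. */
uint32_t crc32_sketch(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (!crc_table_ready) {
        crc_make_table();
    }
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        c = (uint32_t)crc_table[(c ^ buf[i]) & 0xFFu] ^ (c >> 8);
    }
    return c ^ 0xFFFFFFFFu;
}
```

The checksum is the same for either element type. Only the table's memory footprint and the width of each table load change, which is presumably the kind of difference the 64/32 columns are measuring.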
Quuxplusone commented 14 years ago
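Build script:
#!/bin/bash
export dragonegg_disable_version_check=1
# Build with every compiler at each optimization level: first with the default
# 64-bit table elements, then with 32-bit elements (-DCRC32_32bit).
for opt in O1 O2 O3 Os ; do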
    gcc-4.4 -$opt crc32.c crc32test.c -march=native -lm -o crc32_64_44_$opt
    gcc-4.5 -$opt crc32.c crc32test.c -march=native -lm -o crc32_64_45_$opt
    ~/llvm2.6/obj/Release/bin/clang -w -$opt crc32.c crc32test.c -march=core2 -lm -o crc32_64_clang26_$opt
    ~/llvm-git/obj/Release/bin/clang -w -$opt crc32.c crc32test.c -march=native -lm -o crc32_64_clang27_$opt
    llvm-gcc -$opt crc32.c crc32test.c -march=native -lm -o crc32_64_dragonegg_$opt
    gcc-4.4 -$opt crc32.c crc32test.c -march=native -lm -o crc32_32_44_$opt -DCRC32_32bit
    gcc-4.5 -$opt crc32.c crc32test.c -march=native -lm -o crc32_32_45_$opt -DCRC32_32bit
    ~/llvm2.6/obj/Release/bin/clang -w -$opt crc32.c crc32test.c -march=core2 -lm -o crc32_32_clang26_$opt -DCRC32_32bit
    ~/llvm-git/obj/Release/bin/clang -w -$opt crc32.c crc32test.c -march=native -lm -o crc32_32_clang27_$opt -DCRC32_32bit
    llvm-gcc -$opt crc32.c crc32test.c -march=native -lm -o crc32_32_dragonegg_$opt -DCRC32_32bit
done

for compiler in 44 45 clang26 clang27 dragonegg; do
    for bit in 64 32 ; do
        for opt in O1 O2 O3 Os ; do
            ./crc32_"$bit"_"$compiler"_"$opt"
        done
    done
    echo
done
Quuxplusone commented 14 years ago

Attached crc32test.c (1289 bytes, text/plain): crc32test.c

Quuxplusone commented 14 years ago

Attached crc32.c (13721 bytes, text/plain): crc32.c

Quuxplusone commented 14 years ago

Attached crc32.h (30667 bytes, text/x-chdr): crc32.h

Quuxplusone commented 14 years ago

Attached clang26.ll (64135 bytes, application/octet-stream): the fastest code: clang 2.6 -O1 (.ll)

Quuxplusone commented 14 years ago

Attached clang27.ll (77307 bytes, application/octet-stream): clang 2.7 -O2 code

Quuxplusone commented 14 years ago

Attached clang26.s (47147 bytes, text/plain): clang 2.6 -O1 -S

Quuxplusone commented 14 years ago

Attached clang27.s (47162 bytes, text/plain): clang 2.7 -O2 -S

Quuxplusone commented 8 years ago
At -O2, GCC 5.3 and 6.1 generate binaries 7% faster than clang 3.7 and 3.9
on Haswell.