Quuxplusone / LLVMBugzillaTest


Much slower compilation for Proto code compared to LLVM 3.2 #18054

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR18055
Status NEW
Importance P normal
Reported by Bart Janssens (bart@bartjanssens.org)
Reported on 2013-11-25 04:01:38 -0800
Last modified on 2013-12-07 20:15:43 -0800
Version trunk
Hardware PC All
CC bart@bartjanssens.org, dgregor@apple.com, llvm-bugs@lists.llvm.org, rafael@espindo.la
Attachments proto.ii.gz (245332 bytes, application/x-gzip)
Created attachment 11607
Test case demonstrating the slowdown

Code that makes extensive use of the Boost Proto library compiles 3 to 6 times
slower with clang 3.3 and 3.4 than with clang 3.2. The attached preprocessed
code demonstrates the problem when compiled with "clang++ -ftemplate-depth=512
proto.ii -stdlib=libstdc++ -o proto":

Clang 3.2 (OS X binary from website):
1.49s user 0.11s system 93% cpu 1.712 total

Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn):
10.16s user 0.09s system 93% cpu 10.996 total

latest from git (reports clang version 3.5):
5.51s user 0.09s system 88% cpu 6.334 total
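
As a quick check of the "factor of 3 to 6" figure, dividing each wall-clock total above by the clang 3.2 total gives:

```math
\frac{10.996\ \text{s}}{1.712\ \text{s}} \approx 6.4 \quad \text{(Apple LLVM 5.0)}, \qquad
\frac{6.334\ \text{s}}{1.712\ \text{s}} \approx 3.7 \quad \text{(git trunk)}
```

Using user CPU time instead (10.16 s and 5.51 s against 1.49 s) gives about 6.8 and 3.7.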
Quuxplusone commented 10 years ago

Attached proto.ii.gz (245332 bytes, application/x-gzip): Test case demonstrating the slowdown

Quuxplusone commented 10 years ago
Most of the slowdown is in IR generation:

"clang -cc1 -fblocks"
3.2:             0m0.893s
trunk (r195679): 0m0.951s

"clang -cc1 -fblocks -emit-llvm-only"
3.2:             0m0.988s
trunk (r195679): 0m3.864s
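
Assuming the plain "-cc1 -fblocks" run covers only parsing and semantic analysis, the difference between the two commands approximates the time spent generating IR. With the timings above:

```math
\begin{aligned}
T_{\text{IR}}^{3.2} &\approx 0.988 - 0.893 = 0.095\ \text{s},\\
T_{\text{IR}}^{\text{trunk}} &\approx 3.864 - 0.951 = 2.913\ \text{s},\\
\frac{T_{\text{IR}}^{\text{trunk}} - T_{\text{IR}}^{3.2}}{3.864 - 0.988} &= \frac{2.818}{2.876} \approx 0.98
\end{aligned}
```

Under that assumption, roughly 98% of the extra time in the "-emit-llvm-only" run is attributable to the IR-generation step.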
Quuxplusone commented 10 years ago
A big part of the time is spent in getMostRecentDecl for namespaces. This is
called from the visibility computation, which is why we only spend time on it
when generating IR.

There are two problems here:
* We no longer cache the visibility computation, since its result can change as
new declarations are found. We could cache it if we delayed the computation
until the end of the file.
* getMostRecentDecl is really slow. We could speed it up if we changed each decl
to point to the next decl in the file instead of the previous one.
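
To illustrate why getMostRecentDecl can become expensive, here is a simplified, hypothetical model of a redeclaration chain. It is not Clang's actual Redeclarable implementation; `Decl`, `Chain` and the `hops` counter are illustrative names. Each declaration stores a pointer to the previous one and only the first declaration tracks the latest, so finding the most recent declaration from a late declaration walks the whole chain back to the start. In this model every reopening of a namespace adds one node, so a heavily reopened namespace yields a long chain.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified model of a redeclaration chain; this is NOT
// Clang's Redeclarable<> template, only an illustration of the cost.
struct Decl {
  Decl *prev = nullptr;    // previous declaration (nullptr for the first one)
  Decl *latest = nullptr;  // maintained on the first declaration only
};

// Owns the declarations. In this model each reopening of a namespace
// (e.g. another `namespace boost { ... }`) appends one node.
struct Chain {
  std::vector<std::unique_ptr<Decl>> decls;

  Decl *add() {
    auto d = std::make_unique<Decl>();
    if (!decls.empty()) d->prev = decls.back().get();
    Decl *raw = d.get();
    decls.push_back(std::move(d));
    decls.front()->latest = raw;  // the first declaration tracks the newest one
    return raw;
  }
};

// Walk back to the first declaration, then follow its `latest` link.
// `hops` reports how many `prev` links were followed, which is linear
// in the distance from `d` to the start of the chain.
Decl *getMostRecentDecl(Decl *d, std::size_t &hops) {
  hops = 0;
  while (d->prev != nullptr) {
    d = d->prev;
    ++hops;
  }
  return d->latest;
}
```

For a chain of 1000 declarations, calling `getMostRecentDecl` on the last one follows 999 `prev` links before returning that same declaration. This per-call linear walk is the cost that the change of link direction proposed in the comment aims to reduce.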
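
The caching trade-off in the first point can be sketched in the same way. This is a hypothetical model (the `VisibilityCache` type, integer declaration ids and the placeholder `compute` rule are invented for illustration) using a generation counter: a cached result is trusted only if no new declaration has been seen since it was computed. While parsing is still adding declarations the cache rarely hits; once the end of the file is reached nothing can invalidate it, so each result is computed once.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical model; not Clang's actual visibility/linkage code.
enum class Visibility { Default, Hidden };

struct VisibilityCache {
  std::size_t generation = 0;    // bumped whenever a new declaration is seen
  bool frozen = false;           // set once the end of the file is reached
  std::size_t computations = 0;  // how often the expensive path ran

  struct Entry {
    std::size_t generation;  // generation at which `value` was computed
    Visibility value;
  };
  std::unordered_map<int, Entry> entries;  // keyed by a hypothetical decl id

  // A new declaration may change any visibility, so invalidate everything
  // (conservatively) unless the file is already finished.
  void noteNewDeclaration() {
    if (!frozen) ++generation;
  }

  void finishFile() { frozen = true; }

  // Placeholder for the expensive computation; the even/odd rule is
  // arbitrary and only exists so the function returns something.
  Visibility compute(int declId) {
    ++computations;
    return declId % 2 == 0 ? Visibility::Default : Visibility::Hidden;
  }

  // Reuse a cached value only if it was computed in the current generation.
  Visibility get(int declId) {
    auto it = entries.find(declId);
    if (it != entries.end() && it->second.generation == generation)
      return it->second.value;
    Visibility v = compute(declId);
    entries[declId] = Entry{generation, v};
    return v;
  }
};
```

Interleaving queries with new declarations forces a recomputation on every query; after `finishFile()`, repeated queries for the same declaration are served from the cache.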
Quuxplusone commented 10 years ago

r195768 helps a bit, but there is still a long way to go.

Quuxplusone commented 10 years ago
r196712 helps a bit more, but there is still more to go:

$ perf stat -r 5 -e instructions ./clang-196711 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only

 Performance counter stats for './clang-196711 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only' (5 runs):

     3,761,083,276 instructions              #    0.00  insns per cycle          ( +-  0.04% )

       1.389783619 seconds time elapsed                                          ( +-  0.71% )

$ perf stat -r 5 -e instructions ./clang-196712 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only

 Performance counter stats for './clang-196712 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only' (5 runs):

     3,389,372,356 instructions              #    0.00  insns per cycle          ( +-  0.06% )

       1.229214031 seconds time elapsed                                          ( +-  0.52% )
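
Working out the relative change between the two runs from the perf numbers above:

```math
\begin{aligned}
\frac{3{,}761{,}083{,}276 - 3{,}389{,}372{,}356}{3{,}761{,}083{,}276} &= \frac{371{,}710{,}920}{3{,}761{,}083{,}276} \approx 9.9\%\ \text{fewer instructions},\\
\frac{1.389783619 - 1.229214031}{1.389783619} &= \frac{0.160569588}{1.389783619} \approx 11.6\%\ \text{less elapsed time}
\end{aligned}
```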