Open Quuxplusone opened 10 years ago
Attached proto.ii.gz
(245332 bytes, application/x-gzip): Test case demonstrating the slowdown
Most of the slowdown is in IR generation:
"clang -cc1 -fblocks"
3.2: 0m0.893s
trunk (r195679): 0m0.951s
"clang -cc1 -fblocks -emit-llvm-only"
3.2: 0m0.988s
trunk (r195679): 0m3.864s
A big part of the time is spent in getMostRecentDecl for namespaces. This is
called from the visibility computation, which is why we only spend time on it
when generating IR.
There are two problems here:
* We don't cache visibility computation anymore since it can change as new
declarations are found. We would cache it if we delayed its computation until
the end of the file.
* getMostRecentDecl is really slow. We could speed it up if we changed decl to
point to the next decl in the file instead of the previous one.
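To make the second point concrete, here is a toy model of a redeclaration chain. It is only a sketch under assumptions: `ToyDecl`, `ToyChain`, `mostRecent`, and `hopsFromNewest` are hypothetical names, not Clang's real classes, and the layout (each redeclaration links only to the previous one, and the first declaration keeps a pointer to the newest) is assumed from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model only: these are NOT Clang's real Decl/Redeclarable classes.
struct ToyDecl {
    ToyDecl *previous = nullptr;  // earlier declaration of the same entity
    ToyDecl *latest = nullptr;    // meaningful on the first declaration only
};

// Owns the nodes. std::deque keeps element addresses stable on emplace_back.
struct ToyChain {
    std::deque<ToyDecl> decls;

    ToyDecl *addRedeclaration() {
        ToyDecl *prev = decls.empty() ? nullptr : &decls.back();
        decls.emplace_back();
        ToyDecl *d = &decls.back();
        d->previous = prev;
        decls.front().latest = d;  // the first declaration tracks the newest one
        return d;
    }
};

// Walk back to the first declaration, then jump to the newest one.
// `hops` reports how many backward links were followed.
ToyDecl *mostRecent(ToyDecl *d, std::size_t &hops) {
    hops = 0;
    while (d->previous) {
        d = d->previous;
        ++hops;
    }
    return d->latest;
}

// Builds a chain of n redeclarations (n >= 1) and queries from the newest one,
// which is the worst case for the backward walk.
std::size_t hopsFromNewest(std::size_t n) {
    ToyChain chain;
    ToyDecl *last = nullptr;
    for (std::size_t i = 0; i < n; ++i)
        last = chain.addRedeclaration();
    std::size_t hops = 0;
    ToyDecl *found = mostRecent(last, hops);
    assert(found == last);
    return hops;
}
```

In this model the cost of a query grows linearly with the number of times a namespace is reopened, so repeating the query for every visibility computation adds up quickly.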
r195768 helps a bit, but there is still a long way to go.
r196712 helps a bit more, but still more to go:
$ perf stat -r 5 -e instructions ./clang-196711 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only
Performance counter stats for './clang-196711 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only' (5 runs):
3,761,083,276 instructions # 0.00 insns per cycle ( +- 0.04% )
1.389783619 seconds time elapsed ( +- 0.71% )
$ perf stat -r 5 -e instructions ./clang-196712 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only
Performance counter stats for './clang-196712 -cc1 -fblocks proto.ii -w -ftemplate-depth 512 -emit-llvm-only' (5 runs):
3,389,372,356 instructions # 0.00 insns per cycle ( +- 0.06% )
1.229214031 seconds time elapsed ( +- 0.52% )
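For reference, the relative improvement between the two runs above can be worked out with a small helper. `relativeReduction` and the constant names are hypothetical; the numbers are copied from the `perf stat` output.

```cpp
// Fraction by which `after` is smaller than `before`.
double relativeReduction(double before, double after) {
    return (before - after) / before;
}

// Values copied from the perf stat output above.
const double kInstructionsBefore = 3761083276.0;  // clang-196711
const double kInstructionsAfter = 3389372356.0;   // clang-196712
const double kSecondsBefore = 1.389783619;        // clang-196711
const double kSecondsAfter = 1.229214031;         // clang-196712

double instructionReduction() {
    return relativeReduction(kInstructionsBefore, kInstructionsAfter);
}

double timeReduction() {
    return relativeReduction(kSecondsBefore, kSecondsAfter);
}
```

That comes to roughly 9.9% fewer instructions and roughly 11.6% less elapsed time for r196712 on this test case.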