Open mattkretz opened 7 years ago
43e534d improves the situation by using fewer optimization passes with GCC, Clang, and ICC (i.e. `-O2` instead of `-O3`). This will, of course, miss failures due to optimizer bugs, so this is not a perfect solution.
Here's another interesting find (using `-O2 -v -ftime-report` with GCC 6.2 compiling `loadstore_avx512_mayalias_int_short_uint_ushort`):
Execution times (seconds)
phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 2579 kB ( 0%) ggc
phase parsing : 1.62 ( 3%) usr 0.60 (11%) sys 2.30 ( 4%) wall 189239 kB ( 7%) ggc
phase lang. deferred : 5.62 (10%) usr 0.56 (10%) sys 6.23 (10%) wall 473587 kB (18%) ggc
phase opt and generate : 47.13 (87%) usr 4.18 (78%) sys 51.70 (86%) wall 1954366 kB (75%) ggc
phase finalize : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
|name lookup : 0.61 ( 1%) usr 0.09 ( 2%) sys 0.66 ( 1%) wall 76215 kB ( 3%) ggc
|overload resolution : 4.02 ( 7%) usr 0.45 ( 8%) sys 4.59 ( 8%) wall 381546 kB (15%) ggc
garbage collection : 2.21 ( 4%) usr 0.00 ( 0%) sys 2.22 ( 4%) wall 0 kB ( 0%) ggc
dump files : 0.30 ( 1%) usr 0.07 ( 1%) sys 0.44 ( 1%) wall 0 kB ( 0%) ggc
callgraph construction : 0.83 ( 2%) usr 0.13 ( 2%) sys 0.99 ( 2%) wall 47852 kB ( 2%) ggc
callgraph optimization : 0.86 ( 2%) usr 0.20 ( 4%) sys 1.05 ( 2%) wall 31896 kB ( 1%) ggc
ipa dead code removal : 0.16 ( 0%) usr 0.02 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
ipa inheritance graph : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 6 kB ( 0%) ggc
ipa cp : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 12078 kB ( 0%) ggc
ipa inlining heuristics : 0.66 ( 1%) usr 0.01 ( 0%) sys 0.69 ( 1%) wall 26183 kB ( 1%) ggc
ipa function splitting : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall 95 kB ( 0%) ggc
ipa comdats : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
ipa reference : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
ipa profile : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
ipa pure const : 0.19 ( 0%) usr 0.04 ( 1%) sys 0.13 ( 0%) wall 44 kB ( 0%) ggc
ipa icf : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 20 kB ( 0%) ggc
ipa SRA : 1.62 ( 3%) usr 0.36 ( 7%) sys 1.73 ( 3%) wall 143128 kB ( 5%) ggc
ipa free lang data : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
ipa free inline summary : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
cfg construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 2783 kB ( 0%) ggc
cfg cleanup : 0.62 ( 1%) usr 0.02 ( 0%) sys 0.59 ( 1%) wall 8870 kB ( 0%) ggc
trivially dead code : 0.25 ( 0%) usr 0.02 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc
df scan insns : 0.22 ( 0%) usr 0.01 ( 0%) sys 0.18 ( 0%) wall 72 kB ( 0%) ggc
df multiple defs : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
df reaching defs : 0.30 ( 1%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 0 kB ( 0%) ggc
df live regs : 1.32 ( 2%) usr 0.02 ( 0%) sys 1.49 ( 2%) wall 0 kB ( 0%) ggc
df live&initialized regs: 0.66 ( 1%) usr 0.00 ( 0%) sys 0.66 ( 1%) wall 0 kB ( 0%) ggc
df must-initialized regs: 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
df use-def / def-use chains: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.55 ( 1%) usr 0.02 ( 0%) sys 0.56 ( 1%) wall 13611 kB ( 1%) ggc
register information : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 0 kB ( 0%) ggc
alias analysis : 0.35 ( 1%) usr 0.01 ( 0%) sys 0.38 ( 1%) wall 38532 kB ( 1%) ggc
alias stmt walking : 1.88 ( 3%) usr 0.12 ( 2%) sys 1.88 ( 3%) wall 6090 kB ( 0%) ggc
register scan : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 220 kB ( 0%) ggc
rebuild jump labels : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 0.14 ( 0%) usr 0.17 ( 3%) sys 0.22 ( 0%) wall 4495 kB ( 0%) ggc
parser (global) : 0.30 ( 1%) usr 0.18 ( 3%) sys 0.72 ( 1%) wall 76173 kB ( 3%) ggc
parser struct body : 0.12 ( 0%) usr 0.01 ( 0%) sys 0.17 ( 0%) wall 16036 kB ( 1%) ggc
parser function body : 0.07 ( 0%) usr 0.02 ( 0%) sys 0.08 ( 0%) wall 4327 kB ( 0%) ggc
parser inl. func. body : 0.50 ( 1%) usr 0.09 ( 2%) sys 0.44 ( 1%) wall 31362 kB ( 1%) ggc
parser inl. meth. body : 0.15 ( 0%) usr 0.03 ( 1%) sys 0.18 ( 0%) wall 9710 kB ( 0%) ggc
template instantiation : 5.60 (10%) usr 0.66 (12%) sys 6.36 (11%) wall 520420 kB (20%) ggc
early inlining heuristics: 0.21 ( 0%) usr 0.05 ( 1%) sys 0.38 ( 1%) wall 32939 kB ( 1%) ggc
inline parameters : 0.48 ( 1%) usr 0.11 ( 2%) sys 0.63 ( 1%) wall 26778 kB ( 1%) ggc
integration : 2.34 ( 4%) usr 0.52 (10%) sys 3.09 ( 5%) wall 402418 kB (15%) ggc
tree gimplify : 0.79 ( 1%) usr 0.15 ( 3%) sys 0.96 ( 2%) wall 101338 kB ( 4%) ggc
tree eh : 0.16 ( 0%) usr 0.03 ( 1%) sys 0.20 ( 0%) wall 25971 kB ( 1%) ggc
tree CFG construction : 0.11 ( 0%) usr 0.03 ( 1%) sys 0.16 ( 0%) wall 46961 kB ( 2%) ggc
tree CFG cleanup : 0.96 ( 2%) usr 0.05 ( 1%) sys 0.96 ( 2%) wall 568 kB ( 0%) ggc
tree tail merge : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 164 kB ( 0%) ggc
tree VRP : 0.87 ( 2%) usr 0.05 ( 1%) sys 0.96 ( 2%) wall 30066 kB ( 1%) ggc
tree copy propagation : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 44 kB ( 0%) ggc
tree PTA : 1.24 ( 2%) usr 0.20 ( 4%) sys 1.51 ( 3%) wall 7385 kB ( 0%) ggc
tree PHI insertion : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.06 ( 0%) wall 7672 kB ( 0%) ggc
tree SSA rewrite : 0.48 ( 1%) usr 0.07 ( 1%) sys 0.50 ( 1%) wall 52607 kB ( 2%) ggc
tree SSA other : 0.14 ( 0%) usr 0.08 ( 1%) sys 0.23 ( 0%) wall 3138 kB ( 0%) ggc
tree SSA incremental : 0.80 ( 1%) usr 0.05 ( 1%) sys 0.76 ( 1%) wall 17553 kB ( 1%) ggc
tree operand scan : 0.94 ( 2%) usr 0.26 ( 5%) sys 1.37 ( 2%) wall 137855 kB ( 5%) ggc
dominator optimization : 0.58 ( 1%) usr 0.03 ( 1%) sys 0.55 ( 1%) wall 36049 kB ( 1%) ggc
tree SRA : 0.34 ( 1%) usr 0.02 ( 0%) sys 0.34 ( 1%) wall 13704 kB ( 1%) ggc
isolate eroneous paths : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree CCP : 1.27 ( 2%) usr 0.19 ( 4%) sys 1.47 ( 2%) wall 24833 kB ( 1%) ggc
tree PHI const/copy prop: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 2 kB ( 0%) ggc
tree split crit edges : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 25053 kB ( 1%) ggc
tree reassociation : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
tree PRE : 1.20 ( 2%) usr 0.08 ( 1%) sys 1.36 ( 2%) wall 22556 kB ( 1%) ggc
tree FRE : 1.48 ( 3%) usr 0.15 ( 3%) sys 1.90 ( 3%) wall 26763 kB ( 1%) ggc
tree code sinking : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 8514 kB ( 0%) ggc
tree linearize phis : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 815 kB ( 0%) ggc
tree backward propagate : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree forward propagate : 0.52 ( 1%) usr 0.01 ( 0%) sys 0.48 ( 1%) wall 7731 kB ( 0%) ggc
tree conservative DCE : 0.12 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc
tree aggressive DCE : 0.42 ( 1%) usr 0.02 ( 0%) sys 0.52 ( 1%) wall 28415 kB ( 1%) ggc
tree DSE : 0.27 ( 0%) usr 0.06 ( 1%) sys 0.38 ( 1%) wall 187 kB ( 0%) ggc
PHI merge : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
tree loop bounds : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%) wall 2358 kB ( 0%) ggc
tree loop invariant motion: 0.09 ( 0%) usr 0.01 ( 0%) sys 0.09 ( 0%) wall 98 kB ( 0%) ggc
tree canonical iv : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 3413 kB ( 0%) ggc
scev constant prop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 931 kB ( 0%) ggc
complete unrolling : 0.27 ( 0%) usr 0.04 ( 1%) sys 0.24 ( 0%) wall 19229 kB ( 1%) ggc
tree iv optimization : 0.39 ( 1%) usr 0.01 ( 0%) sys 0.40 ( 1%) wall 34176 kB ( 1%) ggc
tree copy headers : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 180 kB ( 0%) ggc
tree SSA uncprop : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
tree switch conversion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
tree strlen optimization: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 21 kB ( 0%) ggc
dominance frontiers : 0.07 ( 0%) usr 0.02 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 1.10 ( 2%) usr 0.16 ( 3%) sys 1.20 ( 2%) wall 0 kB ( 0%) ggc
control dependences : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
out of ssa : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 618 kB ( 0%) ggc
expand vars : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 13615 kB ( 1%) ggc
expand : 0.43 ( 1%) usr 0.00 ( 0%) sys 0.39 ( 1%) wall 101778 kB ( 4%) ggc
post expand cleanups : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 7170 kB ( 0%) ggc
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 106 kB ( 0%) ggc
lower subreg : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 4210 kB ( 0%) ggc
jump : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
forward prop : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 1%) wall 8593 kB ( 0%) ggc
CSE : 0.82 ( 2%) usr 0.02 ( 0%) sys 0.79 ( 1%) wall 5235 kB ( 0%) ggc
dead code elimination : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc
dead store elim1 : 0.27 ( 0%) usr 0.02 ( 0%) sys 0.23 ( 0%) wall 8995 kB ( 0%) ggc
dead store elim2 : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 9109 kB ( 0%) ggc
loop init : 0.57 ( 1%) usr 0.02 ( 0%) sys 0.50 ( 1%) wall 42537 kB ( 2%) ggc
loop invariant motion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 764 kB ( 0%) ggc
loop fini : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
CPROP : 0.70 ( 1%) usr 0.02 ( 0%) sys 0.72 ( 1%) wall 20664 kB ( 1%) ggc
PRE : 0.36 ( 1%) usr 0.02 ( 0%) sys 0.39 ( 1%) wall 5731 kB ( 0%) ggc
CSE 2 : 0.34 ( 1%) usr 0.01 ( 0%) sys 0.45 ( 1%) wall 2495 kB ( 0%) ggc
branch prediction : 0.28 ( 1%) usr 0.06 ( 1%) sys 0.27 ( 0%) wall 7710 kB ( 0%) ggc
combiner : 0.71 ( 1%) usr 0.01 ( 0%) sys 0.68 ( 1%) wall 24666 kB ( 1%) ggc
if-conversion : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 1808 kB ( 0%) ggc
integrated RA : 1.61 ( 3%) usr 0.03 ( 1%) sys 1.75 ( 3%) wall 102369 kB ( 4%) ggc
LRA non-specific : 0.68 ( 1%) usr 0.03 ( 1%) sys 0.52 ( 1%) wall 11719 kB ( 0%) ggc
LRA virtuals elimination: 0.13 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 9027 kB ( 0%) ggc
LRA reload inheritance : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 269 kB ( 0%) ggc
LRA create live ranges : 0.56 ( 1%) usr 0.00 ( 0%) sys 0.57 ( 1%) wall 2012 kB ( 0%) ggc
LRA hard reg assignment : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc
LRA coalesce pseudo regs: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
LRA rematerialization : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
reload : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
reload CSE regs : 0.75 ( 1%) usr 0.00 ( 0%) sys 0.81 ( 1%) wall 15907 kB ( 1%) ggc
ree : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 71 kB ( 0%) ggc
thread pro- & epilogue : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 2467 kB ( 0%) ggc
if-conversion 2 : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 3 kB ( 0%) ggc
combine stack adjustments: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
peephole 2 : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 2237 kB ( 0%) ggc
hard reg cprop : 0.12 ( 0%) usr 0.02 ( 0%) sys 0.12 ( 0%) wall 156 kB ( 0%) ggc
scheduling 2 : 2.07 ( 4%) usr 0.06 ( 1%) sys 2.14 ( 4%) wall 3336 kB ( 0%) ggc
reorder blocks : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 10484 kB ( 0%) ggc
shorten branches : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 180 kB ( 0%) ggc
final : 0.28 ( 1%) usr 0.01 ( 0%) sys 0.28 ( 0%) wall 15132 kB ( 1%) ggc
variable output : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 74 kB ( 0%) ggc
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree if-combine : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
uninit var analysis : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
straight-line strength reduction: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 156 kB ( 0%) ggc
address lowering : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 741 kB ( 0%) ggc
early local passes : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
rest of compilation : 0.66 ( 1%) usr 0.06 ( 1%) sys 0.49 ( 1%) wall 8781 kB ( 0%) ggc
remove unused locals : 0.44 ( 1%) usr 0.07 ( 1%) sys 0.46 ( 1%) wall 420 kB ( 0%) ggc
address taken : 0.21 ( 0%) usr 0.02 ( 0%) sys 0.25 ( 0%) wall 0 kB ( 0%) ggc
unaccounted todo : 1.03 ( 2%) usr 0.10 ( 2%) sys 1.16 ( 2%) wall 22158 kB ( 1%) ggc
rebuild frequencies : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 939 kB ( 0%) ggc
repair loop structures : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
TOTAL : 54.37 5.34 60.26 2619783 kB
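A report this long is easier to read once it is sorted. The following is a minimal sketch, not part of the Vc build system, that parses rows in the layout shown above and ranks individual passes by wall time. The names `parse_time_report` and `slowest_passes` are made up for this illustration, and the regular expression assumes the GCC 6 row layout seen here; other GCC versions may print the report differently.

```python
import re

# One row looks like:
#   name : 1.62 ( 3%) usr 0.60 (11%) sys 2.30 ( 4%) wall 189239 kB ( 7%) ggc
# A leading "|" (as on "|name lookup") is stripped from the name.
ROW = re.compile(
    r"^\|?\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<kb>\d+)\s*kB"
)


def parse_time_report(text):
    """Return one dict per timing row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # e.g. the header line or the TOTAL line
        rows.append({
            "name": match.group("name"),
            "usr": float(match.group("usr")),
            "sys": float(match.group("sys")),
            "wall": float(match.group("wall")),
            "ggc_kb": int(match.group("kb")),
        })
    return rows


def slowest_passes(rows, n=5):
    """Rank rows by wall time, leaving out the coarse 'phase ...' summaries."""
    passes = [row for row in rows if not row["name"].startswith("phase ")]
    return sorted(passes, key=lambda row: row["wall"], reverse=True)[:n]


# A few rows copied from the report above.
SAMPLE = """\
phase parsing : 1.62 ( 3%) usr 0.60 (11%) sys 2.30 ( 4%) wall 189239 kB ( 7%) ggc
phase opt and generate : 47.13 (87%) usr 4.18 (78%) sys 51.70 (86%) wall 1954366 kB (75%) ggc
|overload resolution : 4.02 ( 7%) usr 0.45 ( 8%) sys 4.59 ( 8%) wall 381546 kB (15%) ggc
template instantiation : 5.60 (10%) usr 0.66 (12%) sys 6.36 (11%) wall 520420 kB (20%) ggc
integration : 2.34 ( 4%) usr 0.52 (10%) sys 3.09 ( 5%) wall 402418 kB (15%) ggc
scheduling 2 : 2.07 ( 4%) usr 0.06 ( 1%) sys 2.14 ( 4%) wall 3336 kB ( 0%) ggc
TOTAL : 54.37 5.34 60.26 2619783 kB
"""

if __name__ == "__main__":
    for row in slowest_passes(parse_time_report(SAMPLE), n=3):
        print(f'{row["wall"]:6.2f} s  {row["name"]}')
```

Applied to the full report, this kind of ranking makes it visible that the "opt and generate" phase dominates overall while its time is spread across many small passes.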
`native_test_types` does not follow `TEST_TYPES` for the non-full ABIs. This is especially relevant for the `knl` builds, which instantiate more templates than intended and test some functions three times.
Maybe PCH can help. Have to try https://github.com/sakra/cotire/
I tried PCH manually with GCC 6 and saw no reduction in compilation time for the datapar test. Since almost every test requires different compiler flags (the same flags are used only four times: datapar.cpp, datapar_mask.cpp, loadstore.cpp, and where.cpp), there's nothing to be gained here, only added complexity in the build system.
Edit: I precompiled tests/unittest.h, so not much left to be added to the PCH.
Compiling the unit tests is too slow, which in turn makes every modify, build, test cycle too slow.
Also, building and testing all of Vc on Travis goes over the time limit. That's why I already had to implement the `subset` envvar, which reduces the number of targets per matrix item but, on the other hand, increases the overhead.
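To illustrate the trade-off, here is a rough sketch of how a subset selector could split a target list across matrix items. It is not the actual Vc implementation: the `"index/count"` format and the function name `select_subset` are assumptions made for this example, and the target names are simply the test files mentioned earlier in this thread.

```python
def select_subset(targets, spec):
    """Pick the targets for one matrix item.

    `spec` is a hypothetical "index/count" string, e.g. "2/4" for the
    second of four subsets. Targets are dealt out round-robin so that
    every subset gets a similar number of them.
    """
    index, count = (int(part) for part in spec.split("/"))
    if not 1 <= index <= count:
        raise ValueError(f"subset index out of range: {spec!r}")
    ordered = sorted(targets)  # stable order so all matrix items agree
    return ordered[index - 1::count]


if __name__ == "__main__":
    all_targets = ["where", "loadstore", "datapar_mask", "datapar"]
    for spec in ("1/2", "2/2"):
        print(spec, select_subset(all_targets, spec))
```

Each matrix item then builds only its share of the targets, but every item still pays the fixed setup cost (checkout, configure, shared dependencies), which is the overhead mentioned above.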