VcDevel / Vc

SIMD Vector Classes for C++
BSD 3-Clause "New" or "Revised" License

Improve test compile times (how?) #157

Open mattkretz opened 7 years ago

mattkretz commented 7 years ago

Compiling the unit tests is too slow. The build times alone make every modify, build, test cycle too long.

Also, building and testing all of Vc on Travis exceeds the time limit. That's why I already had to implement the subset envvar, which reduces the number of targets per matrix item but, on the other hand, increases the overhead.
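The trade-off can be made concrete with a little arithmetic. The sketch below is not how Vc's CI scripts work. It assumes a hypothetical round-robin split (target i goes to subset i % n) and a fixed per-job overhead that every job pays once; the names `targets_for_subset` and `split_cost` are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical round-robin split: target i is built by subset (i % n_subsets).
// This is an assumed scheme, not the one used by the real subset envvar.
std::vector<std::string> targets_for_subset(const std::vector<std::string>& all,
                                            std::size_t subset,
                                            std::size_t n_subsets) {
    std::vector<std::string> mine;
    if (n_subsets == 0) return mine;  // nothing sensible to return
    for (std::size_t i = subset; i < all.size(); i += n_subsets) {
        mine.push_back(all[i]);
    }
    return mine;
}

struct Cost {
    double per_job;  // wall time of a single CI job
    double total;    // machine time summed over all jobs
};

// Assumed cost model: the build work divides evenly across jobs, while the
// fixed overhead (checkout, configure, ...) is paid once per job.
Cost split_cost(double build_minutes, double overhead_minutes,
                std::size_t n_subsets) {
    if (n_subsets == 0) n_subsets = 1;  // treat "no split" as a single job
    const double per_job = overhead_minutes + build_minutes / n_subsets;
    return {per_job, per_job * n_subsets};
}
```

With made-up numbers of 120 minutes of build work and 10 minutes of overhead, `split_cost(120, 10, 4)` gives 40 minutes per job, which may fit under a time limit, but 160 minutes in total instead of 130 for a single job.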

mattkretz commented 7 years ago

43e534d improves the situation by using fewer optimization passes with GCC, Clang, and ICC (i.e. -O2 instead of -O3). This will, of course, miss failures caused by optimizer bugs that only show up at -O3, so it is not a perfect solution.

Here's another interesting find (using -O2 -v -ftime-report with GCC 6.2 compiling loadstore_avx512_mayalias_int_short_uint_ushort):

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2579 kB ( 0%) ggc
 phase parsing           :   1.62 ( 3%) usr   0.60 (11%) sys   2.30 ( 4%) wall  189239 kB ( 7%) ggc
 phase lang. deferred    :   5.62 (10%) usr   0.56 (10%) sys   6.23 (10%) wall  473587 kB (18%) ggc
 phase opt and generate  :  47.13 (87%) usr   4.18 (78%) sys  51.70 (86%) wall 1954366 kB (75%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 |name lookup            :   0.61 ( 1%) usr   0.09 ( 2%) sys   0.66 ( 1%) wall   76215 kB ( 3%) ggc
 |overload resolution    :   4.02 ( 7%) usr   0.45 ( 8%) sys   4.59 ( 8%) wall  381546 kB (15%) ggc
 garbage collection      :   2.21 ( 4%) usr   0.00 ( 0%) sys   2.22 ( 4%) wall       0 kB ( 0%) ggc
 dump files              :   0.30 ( 1%) usr   0.07 ( 1%) sys   0.44 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.83 ( 2%) usr   0.13 ( 2%) sys   0.99 ( 2%) wall   47852 kB ( 2%) ggc
 callgraph optimization  :   0.86 ( 2%) usr   0.20 ( 4%) sys   1.05 ( 2%) wall   31896 kB ( 1%) ggc
 ipa dead code removal   :   0.16 ( 0%) usr   0.02 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 ipa inheritance graph   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       6 kB ( 0%) ggc
 ipa cp                  :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall   12078 kB ( 0%) ggc
 ipa inlining heuristics :   0.66 ( 1%) usr   0.01 ( 0%) sys   0.69 ( 1%) wall   26183 kB ( 1%) ggc
 ipa function splitting  :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall      95 kB ( 0%) ggc
 ipa comdats             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa reference           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.19 ( 0%) usr   0.04 ( 1%) sys   0.13 ( 0%) wall      44 kB ( 0%) ggc
 ipa icf                 :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall      20 kB ( 0%) ggc
 ipa SRA                 :   1.62 ( 3%) usr   0.36 ( 7%) sys   1.73 ( 3%) wall  143128 kB ( 5%) ggc
 ipa free lang data      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ipa free inline summary :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2783 kB ( 0%) ggc
 cfg cleanup             :   0.62 ( 1%) usr   0.02 ( 0%) sys   0.59 ( 1%) wall    8870 kB ( 0%) ggc
 trivially dead code     :   0.25 ( 0%) usr   0.02 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.22 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall      72 kB ( 0%) ggc
 df multiple defs        :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.30 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :   1.32 ( 2%) usr   0.02 ( 0%) sys   1.49 ( 2%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.66 ( 1%) usr   0.00 ( 0%) sys   0.66 ( 1%) wall       0 kB ( 0%) ggc
 df must-initialized regs:   0.09 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.55 ( 1%) usr   0.02 ( 0%) sys   0.56 ( 1%) wall   13611 kB ( 1%) ggc
 register information    :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.35 ( 1%) usr   0.01 ( 0%) sys   0.38 ( 1%) wall   38532 kB ( 1%) ggc
 alias stmt walking      :   1.88 ( 3%) usr   0.12 ( 2%) sys   1.88 ( 3%) wall    6090 kB ( 0%) ggc
 register scan           :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     220 kB ( 0%) ggc
 rebuild jump labels     :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.14 ( 0%) usr   0.17 ( 3%) sys   0.22 ( 0%) wall    4495 kB ( 0%) ggc
 parser (global)         :   0.30 ( 1%) usr   0.18 ( 3%) sys   0.72 ( 1%) wall   76173 kB ( 3%) ggc
 parser struct body      :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.17 ( 0%) wall   16036 kB ( 1%) ggc
 parser function body    :   0.07 ( 0%) usr   0.02 ( 0%) sys   0.08 ( 0%) wall    4327 kB ( 0%) ggc
 parser inl. func. body  :   0.50 ( 1%) usr   0.09 ( 2%) sys   0.44 ( 1%) wall   31362 kB ( 1%) ggc
 parser inl. meth. body  :   0.15 ( 0%) usr   0.03 ( 1%) sys   0.18 ( 0%) wall    9710 kB ( 0%) ggc
 template instantiation  :   5.60 (10%) usr   0.66 (12%) sys   6.36 (11%) wall  520420 kB (20%) ggc
 early inlining heuristics:   0.21 ( 0%) usr   0.05 ( 1%) sys   0.38 ( 1%) wall   32939 kB ( 1%) ggc
 inline parameters       :   0.48 ( 1%) usr   0.11 ( 2%) sys   0.63 ( 1%) wall   26778 kB ( 1%) ggc
 integration             :   2.34 ( 4%) usr   0.52 (10%) sys   3.09 ( 5%) wall  402418 kB (15%) ggc
 tree gimplify           :   0.79 ( 1%) usr   0.15 ( 3%) sys   0.96 ( 2%) wall  101338 kB ( 4%) ggc
 tree eh                 :   0.16 ( 0%) usr   0.03 ( 1%) sys   0.20 ( 0%) wall   25971 kB ( 1%) ggc
 tree CFG construction   :   0.11 ( 0%) usr   0.03 ( 1%) sys   0.16 ( 0%) wall   46961 kB ( 2%) ggc
 tree CFG cleanup        :   0.96 ( 2%) usr   0.05 ( 1%) sys   0.96 ( 2%) wall     568 kB ( 0%) ggc
 tree tail merge         :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     164 kB ( 0%) ggc
 tree VRP                :   0.87 ( 2%) usr   0.05 ( 1%) sys   0.96 ( 2%) wall   30066 kB ( 1%) ggc
 tree copy propagation   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall      44 kB ( 0%) ggc
 tree PTA                :   1.24 ( 2%) usr   0.20 ( 4%) sys   1.51 ( 3%) wall    7385 kB ( 0%) ggc
 tree PHI insertion      :   0.02 ( 0%) usr   0.02 ( 0%) sys   0.06 ( 0%) wall    7672 kB ( 0%) ggc
 tree SSA rewrite        :   0.48 ( 1%) usr   0.07 ( 1%) sys   0.50 ( 1%) wall   52607 kB ( 2%) ggc
 tree SSA other          :   0.14 ( 0%) usr   0.08 ( 1%) sys   0.23 ( 0%) wall    3138 kB ( 0%) ggc
 tree SSA incremental    :   0.80 ( 1%) usr   0.05 ( 1%) sys   0.76 ( 1%) wall   17553 kB ( 1%) ggc
 tree operand scan       :   0.94 ( 2%) usr   0.26 ( 5%) sys   1.37 ( 2%) wall  137855 kB ( 5%) ggc
 dominator optimization  :   0.58 ( 1%) usr   0.03 ( 1%) sys   0.55 ( 1%) wall   36049 kB ( 1%) ggc
 tree SRA                :   0.34 ( 1%) usr   0.02 ( 0%) sys   0.34 ( 1%) wall   13704 kB ( 1%) ggc
 isolate eroneous paths  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP                :   1.27 ( 2%) usr   0.19 ( 4%) sys   1.47 ( 2%) wall   24833 kB ( 1%) ggc
 tree PHI const/copy prop:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
 tree split crit edges   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall   25053 kB ( 1%) ggc
 tree reassociation      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 tree PRE                :   1.20 ( 2%) usr   0.08 ( 1%) sys   1.36 ( 2%) wall   22556 kB ( 1%) ggc
 tree FRE                :   1.48 ( 3%) usr   0.15 ( 3%) sys   1.90 ( 3%) wall   26763 kB ( 1%) ggc
 tree code sinking       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    8514 kB ( 0%) ggc
 tree linearize phis     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     815 kB ( 0%) ggc
 tree backward propagate :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate  :   0.52 ( 1%) usr   0.01 ( 0%) sys   0.48 ( 1%) wall    7731 kB ( 0%) ggc
 tree conservative DCE   :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE     :   0.42 ( 1%) usr   0.02 ( 0%) sys   0.52 ( 1%) wall   28415 kB ( 1%) ggc
 tree DSE                :   0.27 ( 0%) usr   0.06 ( 1%) sys   0.38 ( 1%) wall     187 kB ( 0%) ggc
 PHI merge               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree loop bounds        :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.06 ( 0%) wall    2358 kB ( 0%) ggc
 tree loop invariant motion:   0.09 ( 0%) usr   0.01 ( 0%) sys   0.09 ( 0%) wall      98 kB ( 0%) ggc
 tree canonical iv       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    3413 kB ( 0%) ggc
 scev constant prop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     931 kB ( 0%) ggc
 complete unrolling      :   0.27 ( 0%) usr   0.04 ( 1%) sys   0.24 ( 0%) wall   19229 kB ( 1%) ggc
 tree iv optimization    :   0.39 ( 1%) usr   0.01 ( 0%) sys   0.40 ( 1%) wall   34176 kB ( 1%) ggc
 tree copy headers       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     180 kB ( 0%) ggc
 tree SSA uncprop        :   0.02 ( 0%) usr   0.02 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree switch conversion  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree strlen optimization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      21 kB ( 0%) ggc
 dominance frontiers     :   0.07 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   1.10 ( 2%) usr   0.16 ( 3%) sys   1.20 ( 2%) wall       0 kB ( 0%) ggc
 control dependences     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     618 kB ( 0%) ggc
 expand vars             :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall   13615 kB ( 1%) ggc
 expand                  :   0.43 ( 1%) usr   0.00 ( 0%) sys   0.39 ( 1%) wall  101778 kB ( 4%) ggc
 post expand cleanups    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    7170 kB ( 0%) ggc
 varconst                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     106 kB ( 0%) ggc
 lower subreg            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    4210 kB ( 0%) ggc
 jump                    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 forward prop            :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall    8593 kB ( 0%) ggc
 CSE                     :   0.82 ( 2%) usr   0.02 ( 0%) sys   0.79 ( 1%) wall    5235 kB ( 0%) ggc
 dead code elimination   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   0.27 ( 0%) usr   0.02 ( 0%) sys   0.23 ( 0%) wall    8995 kB ( 0%) ggc
 dead store elim2        :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall    9109 kB ( 0%) ggc
 loop init               :   0.57 ( 1%) usr   0.02 ( 0%) sys   0.50 ( 1%) wall   42537 kB ( 2%) ggc
 loop invariant motion   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     764 kB ( 0%) ggc
 loop fini               :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 CPROP                   :   0.70 ( 1%) usr   0.02 ( 0%) sys   0.72 ( 1%) wall   20664 kB ( 1%) ggc
 PRE                     :   0.36 ( 1%) usr   0.02 ( 0%) sys   0.39 ( 1%) wall    5731 kB ( 0%) ggc
 CSE 2                   :   0.34 ( 1%) usr   0.01 ( 0%) sys   0.45 ( 1%) wall    2495 kB ( 0%) ggc
 branch prediction       :   0.28 ( 1%) usr   0.06 ( 1%) sys   0.27 ( 0%) wall    7710 kB ( 0%) ggc
 combiner                :   0.71 ( 1%) usr   0.01 ( 0%) sys   0.68 ( 1%) wall   24666 kB ( 1%) ggc
 if-conversion           :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    1808 kB ( 0%) ggc
 integrated RA           :   1.61 ( 3%) usr   0.03 ( 1%) sys   1.75 ( 3%) wall  102369 kB ( 4%) ggc
 LRA non-specific        :   0.68 ( 1%) usr   0.03 ( 1%) sys   0.52 ( 1%) wall   11719 kB ( 0%) ggc
 LRA virtuals elimination:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall    9027 kB ( 0%) ggc
 LRA reload inheritance  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     269 kB ( 0%) ggc
 LRA create live ranges  :   0.56 ( 1%) usr   0.00 ( 0%) sys   0.57 ( 1%) wall    2012 kB ( 0%) ggc
 LRA hard reg assignment :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 LRA coalesce pseudo regs:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 LRA rematerialization   :   0.03 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs         :   0.75 ( 1%) usr   0.00 ( 0%) sys   0.81 ( 1%) wall   15907 kB ( 1%) ggc
 ree                     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      71 kB ( 0%) ggc
 thread pro- & epilogue  :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall    2467 kB ( 0%) ggc
 if-conversion 2         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       3 kB ( 0%) ggc
 combine stack adjustments:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2              :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    2237 kB ( 0%) ggc
 hard reg cprop          :   0.12 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall     156 kB ( 0%) ggc
 scheduling 2            :   2.07 ( 4%) usr   0.06 ( 1%) sys   2.14 ( 4%) wall    3336 kB ( 0%) ggc
 reorder blocks          :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall   10484 kB ( 0%) ggc
 shorten branches        :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 reg stack               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     180 kB ( 0%) ggc
 final                   :   0.28 ( 1%) usr   0.01 ( 0%) sys   0.28 ( 0%) wall   15132 kB ( 1%) ggc
 variable output         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      74 kB ( 0%) ggc
 symout                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree if-combine         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 uninit var analysis     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 straight-line strength reduction:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     156 kB ( 0%) ggc
 address lowering        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     741 kB ( 0%) ggc
 early local passes      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation     :   0.66 ( 1%) usr   0.06 ( 1%) sys   0.49 ( 1%) wall    8781 kB ( 0%) ggc
 remove unused locals    :   0.44 ( 1%) usr   0.07 ( 1%) sys   0.46 ( 1%) wall     420 kB ( 0%) ggc
 address taken           :   0.21 ( 0%) usr   0.02 ( 0%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   1.03 ( 2%) usr   0.10 ( 2%) sys   1.16 ( 2%) wall   22158 kB ( 1%) ggc
 rebuild frequencies     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall     939 kB ( 0%) ggc
 repair loop structures  :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  54.37             5.34            60.26            2619783 kB
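A listing this long is easier to read once it is ranked. The following is a minimal sketch, assuming the GCC 6 row layout shown above (`name : usr (%) usr sys (%) sys wall (%) wall N kB (%) ggc`); other GCC versions may format the report differently. `PassTime`, `parse_row`, and `top_by_wall` are hypothetical helper names, not part of GCC or Vc. The `phase` rows are skipped when ranking because they look like aggregates (their percentages add up to 100%).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One row of the -ftime-report listing (layout assumed from the GCC 6 output above).
struct PassTime {
    std::string name;
    double usr = 0, sys = 0, wall = 0;  // seconds
    long ggc_kb = 0;                    // kB reported in the ggc column
};

// Parses one report row into `out`. Returns false for lines that do not match
// the assumed layout, e.g. the header, the TOTAL line, or blank lines.
bool parse_row(const std::string& line, PassTime& out) {
    const std::size_t colon = line.find(':');
    const std::size_t first = line.find_first_not_of(" |");  // drop indent and '|'
    if (colon == std::string::npos || first == std::string::npos || first >= colon) {
        return false;
    }
    int pct[4];
    const int n = std::sscanf(
        line.c_str() + colon + 1,
        " %lf (%d%%) usr %lf (%d%%) sys %lf (%d%%) wall %ld kB (%d%%) ggc",
        &out.usr, &pct[0], &out.sys, &pct[1], &out.wall, &pct[2],
        &out.ggc_kb, &pct[3]);
    if (n != 8) return false;
    const std::size_t last = line.find_last_not_of(' ', colon - 1);
    out.name = line.substr(first, last - first + 1);
    return true;
}

// Returns the k rows with the largest wall time, skipping the "phase ..." rows.
std::vector<PassTime> top_by_wall(const std::vector<std::string>& lines,
                                  std::size_t k) {
    std::vector<PassTime> rows;
    for (const std::string& line : lines) {
        PassTime p;
        if (parse_row(line, p) && p.name.compare(0, 6, "phase ") != 0) {
            rows.push_back(p);
        }
    }
    std::sort(rows.begin(), rows.end(),
              [](const PassTime& a, const PassTime& b) { return a.wall > b.wall; });
    if (rows.size() > k) rows.resize(k);
    return rows;
}
```

Fed the listing above, this would rank template instantiation (6.36 s wall) first and overload resolution (4.59 s) second, ahead of any single optimizer pass.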

mattkretz commented 7 years ago

native_test_types does not follow TEST_TYPES for the non-full ABIs. This is especially relevant for the knl builds, which instantiate more templates than intended and test some functions three times.

mattkretz commented 7 years ago

Maybe precompiled headers (PCH) can help. I have to try https://github.com/sakra/cotire/

mattkretz commented 7 years ago

I tried PCH manually with GCC 6 and saw no reduction in compilation time for the datapar test. Since almost every test requires different compiler flags (the same flags are used only four times: datapar.cpp, datapar_mask.cpp, loadstore.cpp, and where.cpp), a precompiled header could be shared by at most those four files. There's nothing to be gained here, only added complexity in the build system.

Edit: I precompiled tests/unittest.h, so there is not much left that could be added to the PCH.