YosysHQ / yosys

Yosys Open SYnthesis Suite
https://yosyshq.net/yosys/
ISC License

CXXRTL: >20x compile time regression with clang++-18 #4419

Open Wren6991 opened 1 month ago

Wren6991 commented 1 month ago

Version

Yosys 0.39+4 (git sha1 3231c1cd9, clang++ 16.0.6 -fPIC -Os)

On which OS did this happen?

Linux

Reproduction Steps

This zip file contains the full dut.cpp from write_cxxrtl, which shows a 24x compile time regression: dut-full.zip

Or, this zip file contains a reduced dut.cpp (900k instead of 9M), which shows a 6x compile time regression: dut-reduced.zip

Good:

time clang++-17 -O3 -c -std=c++14 -I $(yosys-config --datdir)/include/backends/cxxrtl/runtime dut.cpp

Bad:

time clang++-18 -O3 -c -std=c++14 -I $(yosys-config --datdir)/include/backends/cxxrtl/runtime dut.cpp

The full dut.cpp can be reproduced from Verilog source by:

git clone git@github.com:Wren6991/Hazard3.git hazard3
cd hazard3
git checkout 5b31e2679
git submodule update --init -- scripts
. sourceme
cd test/sim/tb_cxxrtl
make

The reduced DUT has the hierarchy trimmed to just the hazard3_decode module, and all optional instruction extensions disabled.

This reproduces at -Og and above, but not at -O0.
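
The good/bad comparison and the optimization-level observation above could be scripted roughly as follows. This is a minimal sketch and not part of the original report: the helper names `time_compile` and `sweep` and the `RUNTIME_INC` variable are hypothetical, compilers that are not installed are skipped, and the object file is discarded via `-o /dev/null`.

```shell
# time_compile CXX OPT SRC
# Compile SRC once with compiler CXX at optimization level OPT and print the
# elapsed wall-clock seconds. RUNTIME_INC (hypothetical variable) should point
# at the CXXRTL runtime include directory, for example
# "$(yosys-config --datdir)/include/backends/cxxrtl/runtime".
time_compile() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 $2 skipped (compiler not found)"
        return 0
    fi
    tc_start=$(date +%s)
    "$1" "$2" -c -std=c++14 -I "${RUNTIME_INC:-.}" "$3" -o /dev/null || return 1
    echo "$1 $2 $(( $(date +%s) - tc_start ))s"
}

# sweep SRC: try both compilers at -O0, -Og and -O3.
sweep() {
    for sw_cxx in clang++-17 clang++-18; do
        for sw_opt in -O0 -Og -O3; do
            time_compile "$sw_cxx" "$sw_opt" "$1"
        done
    done
}
```

After exporting `RUNTIME_INC`, running `sweep dut.cpp` prints one line per compiler and level. Going by the timings in this report, the clang++-18 runs at `-Og` and above may take many minutes on the full `dut.cpp`.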

Expected Behavior

Compile takes 45 seconds, as with clang++-16 and clang++-17.

Actual Behavior

Compile takes 18 minutes with clang++-18.

I dithered on whether to report this here, but it seems like there is something about CXXRTL's code generation that hits some pathologically slow case in clang++-18, and maybe this could be improved from the CXXRTL side.

Alternatively, if there is some recommended set of clang/llvm flags to use with CXXRTL to disable problematic passes and stop the compile time from blowing up, documenting that would also be helpful.

clang++-18 is the default clang++ as of Ubuntu 24.04 LTS, so I imagine more people will start hitting this.

whitequark commented 1 month ago

Oh wow, that hurts.

I think there is almost certainly an underlying LLVM/Clang regression. Can you profile it please?

Wren6991 commented 4 weeks ago

Here is the output from -ftime-report. First for clang++-17 (good):

``` clang++-17 -ftime-report -O3 -std=c++14 -I /usr/local/share/yosys/include/backends/cxxrtl/runtime -I build-tb tb.cpp -o tb ===-------------------------------------------------------------------------=== Pass execution timing report ===-------------------------------------------------------------------------=== Total Execution Time: 30.6422 seconds (30.6881 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 16.1551 ( 53.7%) 0.0449 ( 8.0%) 16.2000 ( 52.9%) 16.2014 ( 52.8%) SROAPass 5.1537 ( 17.1%) 0.1296 ( 23.0%) 5.2834 ( 17.2%) 5.2919 ( 17.2%) InstCombinePass 2.0826 ( 6.9%) 0.0125 ( 2.2%) 2.0951 ( 6.8%) 2.0950 ( 6.8%) JumpThreadingPass 1.9233 ( 6.4%) 0.1124 ( 20.0%) 2.0357 ( 6.6%) 2.0483 ( 6.7%) InlinerPass 1.3596 ( 4.5%) 0.0224 ( 4.0%) 1.3821 ( 4.5%) 1.3815 ( 4.5%) CorrelatedValuePropagationPass 0.5307 ( 1.8%) 0.0081 ( 1.4%) 0.5388 ( 1.8%) 0.5402 ( 1.8%) IPSCCPPass 0.5058 ( 1.7%) 0.0223 ( 4.0%) 0.5282 ( 1.7%) 0.5277 ( 1.7%) GVNPass 0.4481 ( 1.5%) 0.0391 ( 6.9%) 0.4872 ( 1.6%) 0.4944 ( 1.6%) EarlyCSEPass 0.3813 ( 1.3%) 0.0320 ( 5.7%) 0.4133 ( 1.3%) 0.4133 ( 1.3%) SimplifyCFGPass 0.2193 ( 0.7%) 0.0000 ( 0.0%) 0.2193 ( 0.7%) 0.2202 ( 0.7%) GlobalOptPass 0.2147 ( 0.7%) 0.0000 ( 0.0%) 0.2147 ( 0.7%) 0.2148 ( 0.7%) CalledValuePropagationPass 0.1690 ( 0.6%) 0.0006 ( 0.1%) 0.1696 ( 0.6%) 0.1773 ( 0.6%) SLPVectorizerPass 0.1158 ( 0.4%) 0.0000 ( 0.0%) 0.1158 ( 0.4%) 0.1168 ( 0.4%) RequireAnalysisPass> 0.0669 ( 0.2%) 0.0219 ( 3.9%) 0.0888 ( 0.3%) 0.0890 ( 0.3%) PostOrderFunctionAttrsPass 0.0670 ( 0.2%) 0.0072 ( 1.3%) 0.0741 ( 0.2%) 0.0739 ( 0.2%) DSEPass 0.0666 ( 0.2%) 0.0021 ( 0.4%) 0.0688 ( 0.2%) 0.0689 ( 0.2%) CallSiteSplittingPass 0.0524 ( 0.2%) 0.0091 ( 1.6%) 0.0615 ( 0.2%) 0.0611 ( 0.2%) ReassociatePass 0.0502 ( 0.2%) 0.0070 ( 1.2%) 0.0573 ( 0.2%) 0.0571 ( 0.2%) SCCPPass 0.0457 ( 0.2%) 0.0098 ( 1.7%) 0.0555 ( 0.2%) 0.0553 ( 0.2%) ADCEPass 0.0416 ( 0.1%) 0.0071 ( 1.3%) 0.0487 ( 0.2%) 0.0486 ( 0.2%) BDCEPass 0.0355 ( 0.1%) 0.0104 
( 1.9%) 0.0459 ( 0.1%) 0.0460 ( 0.2%) LoopSimplifyPass 0.0340 ( 0.1%) 0.0037 ( 0.7%) 0.0376 ( 0.1%) 0.0378 ( 0.1%) LowerExpectIntrinsicPass 0.0289 ( 0.1%) 0.0044 ( 0.8%) 0.0333 ( 0.1%) 0.0332 ( 0.1%) MemCpyOptPass 0.0279 ( 0.1%) 0.0000 ( 0.0%) 0.0279 ( 0.1%) 0.0279 ( 0.1%) AssignmentTrackingPass 0.0221 ( 0.1%) 0.0036 ( 0.6%) 0.0257 ( 0.1%) 0.0258 ( 0.1%) PromotePass 0.0190 ( 0.1%) 0.0035 ( 0.6%) 0.0226 ( 0.1%) 0.0224 ( 0.1%) ConstraintEliminationPass 0.0154 ( 0.1%) 0.0045 ( 0.8%) 0.0198 ( 0.1%) 0.0199 ( 0.1%) LCSSAPass 0.0163 ( 0.1%) 0.0031 ( 0.5%) 0.0194 ( 0.1%) 0.0192 ( 0.1%) TailCallElimPass 0.0155 ( 0.1%) 0.0022 ( 0.4%) 0.0176 ( 0.1%) 0.0176 ( 0.1%) IndVarSimplifyPass 0.0147 ( 0.0%) 0.0023 ( 0.4%) 0.0170 ( 0.1%) 0.0169 ( 0.1%) LICMPass 0.0129 ( 0.0%) 0.0038 ( 0.7%) 0.0167 ( 0.1%) 0.0166 ( 0.1%) LoopRotatePass 0.0156 ( 0.1%) 0.0000 ( 0.0%) 0.0156 ( 0.1%) 0.0156 ( 0.1%) RecomputeGlobalsAAPass 0.0121 ( 0.0%) 0.0029 ( 0.5%) 0.0150 ( 0.0%) 0.0149 ( 0.0%) AggressiveInstCombinePass 0.0105 ( 0.0%) 0.0038 ( 0.7%) 0.0142 ( 0.0%) 0.0143 ( 0.0%) RequireAnalysisPass> 0.0110 ( 0.0%) 0.0022 ( 0.4%) 0.0132 ( 0.0%) 0.0132 ( 0.0%) LoopIdiomRecognizePass 0.0131 ( 0.0%) 0.0000 ( 0.0%) 0.0131 ( 0.0%) 0.0131 ( 0.0%) InstSimplifyPass 0.0088 ( 0.0%) 0.0021 ( 0.4%) 0.0110 ( 0.0%) 0.0109 ( 0.0%) VectorCombinePass 0.0073 ( 0.0%) 0.0033 ( 0.6%) 0.0105 ( 0.0%) 0.0105 ( 0.0%) InvalidateAnalysisPass 0.0084 ( 0.0%) 0.0000 ( 0.0%) 0.0084 ( 0.0%) 0.0084 ( 0.0%) ReversePostOrderFunctionAttrsPass 0.0074 ( 0.0%) 0.0011 ( 0.2%) 0.0084 ( 0.0%) 0.0084 ( 0.0%) LoopFullUnrollPass 0.0067 ( 0.0%) 0.0013 ( 0.2%) 0.0080 ( 0.0%) 0.0080 ( 0.0%) Float2IntPass 0.0078 ( 0.0%) 0.0000 ( 0.0%) 0.0078 ( 0.0%) 0.0078 ( 0.0%) LoopUnrollPass 0.0070 ( 0.0%) 0.0006 ( 0.1%) 0.0077 ( 0.0%) 0.0076 ( 0.0%) LoopDeletionPass 0.0073 ( 0.0%) 0.0000 ( 0.0%) 0.0073 ( 0.0%) 0.0073 ( 0.0%) GlobalDCEPass 0.0000 ( 0.0%) 0.0003 ( 0.1%) 0.0003 ( 0.0%) 0.0069 ( 0.0%) CGProfilePass 0.0051 ( 0.0%) 0.0016 ( 0.3%) 0.0067 ( 0.0%) 0.0066 ( 
0.0%) LibCallsShrinkWrapPass 0.0046 ( 0.0%) 0.0017 ( 0.3%) 0.0063 ( 0.0%) 0.0063 ( 0.0%) CoroSplitPass 0.0045 ( 0.0%) 0.0016 ( 0.3%) 0.0061 ( 0.0%) 0.0062 ( 0.0%) CoroElidePass 0.0046 ( 0.0%) 0.0016 ( 0.3%) 0.0062 ( 0.0%) 0.0061 ( 0.0%) OpenMPOptCGSCCPass 0.0044 ( 0.0%) 0.0016 ( 0.3%) 0.0061 ( 0.0%) 0.0060 ( 0.0%) SpeculativeExecutionPass 0.0044 ( 0.0%) 0.0015 ( 0.3%) 0.0059 ( 0.0%) 0.0059 ( 0.0%) MoveAutoInitPass 0.0041 ( 0.0%) 0.0015 ( 0.3%) 0.0057 ( 0.0%) 0.0057 ( 0.0%) ArgumentPromotionPass 0.0042 ( 0.0%) 0.0014 ( 0.3%) 0.0056 ( 0.0%) 0.0056 ( 0.0%) MergedLoadStoreMotionPass 0.0055 ( 0.0%) 0.0000 ( 0.0%) 0.0055 ( 0.0%) 0.0055 ( 0.0%) DeadArgumentEliminationPass 0.0032 ( 0.0%) 0.0008 ( 0.1%) 0.0040 ( 0.0%) 0.0040 ( 0.0%) SimpleLoopUnswitchPass 0.0032 ( 0.0%) 0.0008 ( 0.1%) 0.0040 ( 0.0%) 0.0040 ( 0.0%) LoopInstSimplifyPass 0.0027 ( 0.0%) 0.0001 ( 0.0%) 0.0027 ( 0.0%) 0.0033 ( 0.0%) LoopVectorizePass 0.0021 ( 0.0%) 0.0003 ( 0.1%) 0.0024 ( 0.0%) 0.0024 ( 0.0%) LoopDistributePass 0.0018 ( 0.0%) 0.0005 ( 0.1%) 0.0023 ( 0.0%) 0.0022 ( 0.0%) LoopSimplifyCFGPass 0.0020 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.0%) 0.0020 ( 0.0%) EliminateAvailableExternallyPass 0.0018 ( 0.0%) 0.0000 ( 0.0%) 0.0018 ( 0.0%) 0.0018 ( 0.0%) LoopLoadEliminationPass 0.0016 ( 0.0%) 0.0001 ( 0.0%) 0.0017 ( 0.0%) 0.0017 ( 0.0%) LowerConstantIntrinsicsPass 0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) ConstantMergePass 0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) InjectTLIMappings 0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) DivRemPairsPass 0.0006 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) InferFunctionAttrsPass 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0004 ( 0.0%) 0.0005 ( 0.0%) ControlHeightReductionPass 0.0000 ( 0.0%) 0.0004 ( 0.1%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) AnnotationRemarksPass 0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) InvalidateAnalysisPass 0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) WarnMissedTransformationsPass 
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) CoroEarlyPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) AlignmentFromAssumptionsPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) LoopSinkPass 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) RelLookupTableConverterPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) RequireAnalysisPass> 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) CoroCleanupPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Annotation2MetadataPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) OpenMPOptPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) ForceFunctionAttrsPass 30.0792 (100.0%) 0.5630 (100.0%) 30.6422 (100.0%) 30.6881 (100.0%) Total ===-------------------------------------------------------------------------=== Analysis execution timing report ===-------------------------------------------------------------------------=== Total Execution Time: 1.0995 seconds (1.0997 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.1626 ( 16.7%) 0.0177 ( 14.2%) 0.1802 ( 16.4%) 0.1796 ( 16.3%) DominatorTreeAnalysis 0.1652 ( 16.9%) 0.0090 ( 7.3%) 0.1742 ( 15.8%) 0.1740 ( 15.8%) MemorySSAAnalysis 0.1618 ( 16.6%) 0.0107 ( 8.6%) 0.1725 ( 15.7%) 0.1723 ( 15.7%) BranchProbabilityAnalysis 0.0984 ( 10.1%) 0.0000 ( 0.0%) 0.0984 ( 8.9%) 0.0984 ( 8.9%) CallGraphAnalysis 0.0806 ( 8.3%) 0.0086 ( 6.9%) 0.0892 ( 8.1%) 0.0896 ( 8.1%) BlockFrequencyAnalysis 0.0792 ( 8.1%) 0.0096 ( 7.7%) 0.0888 ( 8.1%) 0.0882 ( 8.0%) PostDominatorTreeAnalysis 0.0715 ( 7.3%) 0.0099 ( 7.9%) 0.0813 ( 7.4%) 0.0813 ( 7.4%) LoopAnalysis 0.0431 ( 4.4%) 0.0223 ( 18.0%) 0.0654 ( 6.0%) 0.0650 ( 5.9%) AAManager 0.0303 ( 3.1%) 0.0000 ( 0.0%) 0.0303 ( 2.8%) 0.0313 ( 2.8%) GlobalsAA 0.0162 ( 1.7%) 0.0071 ( 5.7%) 0.0233 ( 2.1%) 0.0230 ( 2.1%) BasicAA 0.0090 ( 0.9%) 0.0034 ( 2.7%) 0.0124 ( 1.1%) 0.0126 ( 1.1%) 
FunctionAnalysisManagerCGSCCProxy 0.0078 ( 0.8%) 0.0029 ( 2.3%) 0.0107 ( 1.0%) 0.0108 ( 1.0%) LazyValueAnalysis 0.0057 ( 0.6%) 0.0023 ( 1.8%) 0.0079 ( 0.7%) 0.0085 ( 0.8%) TargetIRAnalysis 0.0049 ( 0.5%) 0.0018 ( 1.4%) 0.0067 ( 0.6%) 0.0066 ( 0.6%) MemoryDependenceAnalysis 0.0042 ( 0.4%) 0.0021 ( 1.7%) 0.0063 ( 0.6%) 0.0063 ( 0.6%) TargetLibraryAnalysis 0.0043 ( 0.4%) 0.0018 ( 1.5%) 0.0061 ( 0.6%) 0.0061 ( 0.6%) AssumptionAnalysis 0.0050 ( 0.5%) 0.0006 ( 0.5%) 0.0056 ( 0.5%) 0.0059 ( 0.5%) ScalarEvolutionAnalysis 0.0044 ( 0.4%) 0.0016 ( 1.3%) 0.0059 ( 0.5%) 0.0058 ( 0.5%) OuterAnalysisManagerProxy 0.0041 ( 0.4%) 0.0015 ( 1.2%) 0.0056 ( 0.5%) 0.0056 ( 0.5%) DemandedBitsAnalysis 0.0030 ( 0.3%) 0.0024 ( 1.9%) 0.0054 ( 0.5%) 0.0053 ( 0.5%) OuterAnalysisManagerProxy 0.0026 ( 0.3%) 0.0022 ( 1.7%) 0.0047 ( 0.4%) 0.0049 ( 0.4%) OptimizationRemarkEmitterAnalysis 0.0026 ( 0.3%) 0.0022 ( 1.8%) 0.0049 ( 0.4%) 0.0048 ( 0.4%) TypeBasedAA 0.0031 ( 0.3%) 0.0012 ( 1.0%) 0.0043 ( 0.4%) 0.0043 ( 0.4%) ShouldNotRunFunctionPassesAnalysis 0.0022 ( 0.2%) 0.0020 ( 1.6%) 0.0042 ( 0.4%) 0.0042 ( 0.4%) ScopedNoAliasAA 0.0024 ( 0.2%) 0.0010 ( 0.8%) 0.0034 ( 0.3%) 0.0035 ( 0.3%) LazyCallGraphAnalysis 0.0007 ( 0.1%) 0.0002 ( 0.1%) 0.0008 ( 0.1%) 0.0008 ( 0.1%) InnerAnalysisManagerProxy 0.0006 ( 0.1%) 0.0002 ( 0.1%) 0.0007 ( 0.1%) 0.0007 ( 0.1%) OuterAnalysisManagerProxy 0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) LoopAccessAnalysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) ShouldRunExtraVectorPasses 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) InnerAnalysisManagerProxy 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) InnerAnalysisManagerProxy 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) ProfileSummaryAnalysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) InlineAdvisorAnalysis 0.9755 (100.0%) 0.1240 (100.0%) 1.0995 (100.0%) 1.0997 (100.0%) Total 
===-------------------------------------------------------------------------=== Miscellaneous Ungrouped Timers ===-------------------------------------------------------------------------=== ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 36.6557 ( 93.8%) 1.3339 ( 92.1%) 37.9896 ( 93.7%) 38.0714 ( 93.7%) Code Generation Time 2.4234 ( 6.2%) 0.1142 ( 7.9%) 2.5376 ( 6.3%) 2.5584 ( 6.3%) LLVM IR Generation Time 39.0791 (100.0%) 1.4481 (100.0%) 40.5272 (100.0%) 40.6298 (100.0%) Total ===-------------------------------------------------------------------------=== Register Allocation ===-------------------------------------------------------------------------=== Total Execution Time: 0.3985 seconds (0.3982 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2610 ( 69.0%) 0.0072 ( 35.5%) 0.2682 ( 67.3%) 0.2680 ( 67.3%) Global Splitting 0.0472 ( 12.5%) 0.0022 ( 10.9%) 0.0494 ( 12.4%) 0.0491 ( 12.3%) Spiller 0.0400 ( 10.6%) 0.0074 ( 36.3%) 0.0474 ( 11.9%) 0.0475 ( 11.9%) Evict 0.0206 ( 5.5%) 0.0034 ( 16.9%) 0.0241 ( 6.0%) 0.0241 ( 6.0%) Local Splitting 0.0094 ( 2.5%) 0.0001 ( 0.5%) 0.0094 ( 2.4%) 0.0095 ( 2.4%) Seed Live Regs 0.3781 (100.0%) 0.0204 (100.0%) 0.3985 (100.0%) 0.3982 (100.0%) Total ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.7124 seconds (0.7202 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.1365 ( 22.7%) 0.0242 ( 22.0%) 0.1607 ( 22.6%) 0.1630 ( 22.6%) DAG Combining 1 0.0873 ( 14.5%) 0.0178 ( 16.2%) 0.1051 ( 14.8%) 0.1056 ( 14.7%) Instruction Selection 0.0874 ( 14.5%) 0.0132 ( 12.1%) 0.1006 ( 14.1%) 0.1000 ( 13.9%) DAG Combining 2 0.0755 ( 12.5%) 0.0155 ( 14.1%) 0.0910 ( 12.8%) 0.0906 ( 12.6%) Instruction Scheduling 0.0627 ( 10.4%) 0.0075 ( 6.9%) 0.0702 ( 9.9%) 0.0699 ( 9.7%) 
DAG Combining after legalize types 0.0509 ( 8.4%) 0.0111 ( 10.1%) 0.0620 ( 8.7%) 0.0613 ( 8.5%) Instruction Creation 0.0344 ( 5.7%) 0.0077 ( 7.0%) 0.0421 ( 5.9%) 0.0424 ( 5.9%) Type Legalization 0.0319 ( 5.3%) 0.0071 ( 6.5%) 0.0390 ( 5.5%) 0.0392 ( 5.4%) DAG Legalization 0.0131 ( 2.2%) 0.0002 ( 0.2%) 0.0133 ( 1.9%) 0.0198 ( 2.8%) DAG Combining after legalize vectors 0.0114 ( 1.9%) 0.0026 ( 2.4%) 0.0140 ( 2.0%) 0.0142 ( 2.0%) Vector Legalization 0.0113 ( 1.9%) 0.0028 ( 2.5%) 0.0141 ( 2.0%) 0.0139 ( 1.9%) Instruction Scheduling Cleanup 0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Type Legalization 2 0.6025 (100.0%) 0.1099 (100.0%) 0.7124 (100.0%) 0.7202 (100.0%) Total ===-------------------------------------------------------------------------=== Pass execution timing report ===-------------------------------------------------------------------------=== Total Execution Time: 6.4543 seconds (6.4786 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 1.3179 ( 22.6%) 0.0006 ( 0.1%) 1.3185 ( 20.4%) 1.3186 ( 20.4%) Register Coalescer 0.9560 ( 16.4%) 0.1799 ( 29.3%) 1.1359 ( 17.6%) 1.1534 ( 17.8%) X86 DAG->DAG Instruction Selection 0.8288 ( 14.2%) 0.0305 ( 5.0%) 0.8593 ( 13.3%) 0.8595 ( 13.3%) Greedy Register Allocator #2 0.2104 ( 3.6%) 0.3266 ( 53.2%) 0.5370 ( 8.3%) 0.5371 ( 8.3%) X86 Assembly Printer 0.4820 ( 8.3%) 0.0285 ( 4.6%) 0.5105 ( 7.9%) 0.5101 ( 7.9%) Induction Variable Users 0.2793 ( 4.8%) 0.0008 ( 0.1%) 0.2802 ( 4.3%) 0.2802 ( 4.3%) Live Interval Analysis 0.1538 ( 2.6%) 0.0013 ( 0.2%) 0.1552 ( 2.4%) 0.1552 ( 2.4%) Live Variable Analysis 0.1443 ( 2.5%) 0.0003 ( 0.0%) 0.1446 ( 2.2%) 0.1447 ( 2.2%) Machine code sinking 0.0855 ( 1.5%) 0.0022 ( 0.4%) 0.0877 ( 1.4%) 0.0879 ( 1.4%) Machine Instruction Scheduler 0.0790 ( 1.4%) 0.0019 ( 0.3%) 0.0809 ( 1.3%) 0.0809 ( 1.2%) ReachingDefAnalysis 0.0595 ( 1.0%) 0.0003 ( 0.1%) 0.0599 ( 0.9%) 0.0598 ( 0.9%) Eliminate PHI nodes for register allocation 0.0535 ( 0.9%) 0.0004 ( 
0.1%) 0.0539 ( 0.8%) 0.0539 ( 0.8%) Virtual Register Rewriter 0.0481 ( 0.8%) 0.0030 ( 0.5%) 0.0511 ( 0.8%) 0.0511 ( 0.8%) CodeGen Prepare 0.0471 ( 0.8%) 0.0006 ( 0.1%) 0.0477 ( 0.7%) 0.0477 ( 0.7%) Prologue/Epilogue Insertion & Frame Finalization 0.0376 ( 0.6%) 0.0004 ( 0.1%) 0.0380 ( 0.6%) 0.0446 ( 0.7%) Two-Address instruction pass 0.0400 ( 0.7%) 0.0005 ( 0.1%) 0.0405 ( 0.6%) 0.0405 ( 0.6%) Control Flow Optimizer 0.0361 ( 0.6%) 0.0003 ( 0.0%) 0.0363 ( 0.6%) 0.0363 ( 0.6%) Merge disjoint stack slots 0.0319 ( 0.5%) 0.0002 ( 0.0%) 0.0322 ( 0.5%) 0.0322 ( 0.5%) Machine InstCombiner 0.0314 ( 0.5%) 0.0003 ( 0.0%) 0.0317 ( 0.5%) 0.0317 ( 0.5%) MachineDominator Tree Construction #9 0.0270 ( 0.5%) 0.0002 ( 0.0%) 0.0271 ( 0.4%) 0.0271 ( 0.4%) Slot index numbering #2 0.0258 ( 0.4%) 0.0003 ( 0.1%) 0.0261 ( 0.4%) 0.0262 ( 0.4%) Branch Probability Analysis #2 0.0253 ( 0.4%) 0.0005 ( 0.1%) 0.0259 ( 0.4%) 0.0258 ( 0.4%) Branch Probability Basic Block Placement 0.0217 ( 0.4%) 0.0007 ( 0.1%) 0.0224 ( 0.3%) 0.0224 ( 0.3%) Machine Common Subexpression Elimination 0.0221 ( 0.4%) 0.0002 ( 0.0%) 0.0223 ( 0.3%) 0.0223 ( 0.3%) Slot index numbering 0.0207 ( 0.4%) 0.0001 ( 0.0%) 0.0208 ( 0.3%) 0.0208 ( 0.3%) Stack Slot Coloring 0.0198 ( 0.3%) 0.0006 ( 0.1%) 0.0204 ( 0.3%) 0.0204 ( 0.3%) Branch Probability Analysis 0.0180 ( 0.3%) 0.0001 ( 0.0%) 0.0181 ( 0.3%) 0.0181 ( 0.3%) MachineDominator Tree Construction #5 0.0173 ( 0.3%) 0.0005 ( 0.1%) 0.0178 ( 0.3%) 0.0178 ( 0.3%) Block Frequency Analysis 0.0159 ( 0.3%) 0.0010 ( 0.2%) 0.0169 ( 0.3%) 0.0169 ( 0.3%) X86 Byte/Word Instruction Fixup 0.0162 ( 0.3%) 0.0002 ( 0.0%) 0.0164 ( 0.3%) 0.0164 ( 0.3%) Machine Dominance Frontier Construction 0.0142 ( 0.2%) 0.0020 ( 0.3%) 0.0162 ( 0.3%) 0.0163 ( 0.3%) Loop Strength Reduction 0.0160 ( 0.3%) 0.0002 ( 0.0%) 0.0163 ( 0.3%) 0.0163 ( 0.3%) MachinePostDominator Tree Construction #2 0.0159 ( 0.3%) 0.0002 ( 0.0%) 0.0161 ( 0.2%) 0.0161 ( 0.2%) MachinePostDominator Tree Construction 0.0136 ( 0.2%) 0.0022 ( 
0.4%) 0.0158 ( 0.2%) 0.0158 ( 0.2%) Natural Loop Information 0.0158 ( 0.3%) 0.0000 ( 0.0%) 0.0158 ( 0.2%) 0.0158 ( 0.2%) Dominator Tree Construction 0.0152 ( 0.3%) 0.0006 ( 0.1%) 0.0158 ( 0.2%) 0.0158 ( 0.2%) Post-Dominator Tree Construction 0.0155 ( 0.3%) 0.0003 ( 0.0%) 0.0158 ( 0.2%) 0.0157 ( 0.2%) MachinePostDominator Tree Construction #3 0.0154 ( 0.3%) 0.0003 ( 0.0%) 0.0156 ( 0.2%) 0.0156 ( 0.2%) MachineDominator Tree Construction #7 0.0148 ( 0.3%) 0.0003 ( 0.0%) 0.0151 ( 0.2%) 0.0151 ( 0.2%) Post-Dominator Tree Construction #2 0.0148 ( 0.3%) 0.0002 ( 0.0%) 0.0150 ( 0.2%) 0.0150 ( 0.2%) MachineDominator Tree Construction 0.0145 ( 0.2%) 0.0001 ( 0.0%) 0.0147 ( 0.2%) 0.0147 ( 0.2%) MachineDominator Tree Construction #6 0.0138 ( 0.2%) 0.0004 ( 0.1%) 0.0142 ( 0.2%) 0.0142 ( 0.2%) MachineDominator Tree Construction #8 0.0141 ( 0.2%) 0.0001 ( 0.0%) 0.0142 ( 0.2%) 0.0142 ( 0.2%) MachineDominator Tree Construction #2 0.0138 ( 0.2%) 0.0003 ( 0.0%) 0.0141 ( 0.2%) 0.0141 ( 0.2%) Dominator Tree Construction #3 0.0139 ( 0.2%) 0.0002 ( 0.0%) 0.0140 ( 0.2%) 0.0140 ( 0.2%) Machine Block Frequency Analysis #4 0.0138 ( 0.2%) 0.0002 ( 0.0%) 0.0140 ( 0.2%) 0.0140 ( 0.2%) Check CFA info and insert CFI instructions if needed 0.0139 ( 0.2%) 0.0000 ( 0.0%) 0.0139 ( 0.2%) 0.0139 ( 0.2%) ObjC ARC contraction 0.0133 ( 0.2%) 0.0002 ( 0.0%) 0.0135 ( 0.2%) 0.0135 ( 0.2%) Machine Block Frequency Analysis #3 0.0132 ( 0.2%) 0.0001 ( 0.0%) 0.0133 ( 0.2%) 0.0133 ( 0.2%) Tile Register Pre-configure 0.0129 ( 0.2%) 0.0003 ( 0.0%) 0.0131 ( 0.2%) 0.0131 ( 0.2%) Natural Loop Information #2 0.0120 ( 0.2%) 0.0007 ( 0.1%) 0.0127 ( 0.2%) 0.0127 ( 0.2%) Dominator Tree Construction #2 0.0124 ( 0.2%) 0.0003 ( 0.0%) 0.0127 ( 0.2%) 0.0127 ( 0.2%) Free MachineFunction 0.0125 ( 0.2%) 0.0001 ( 0.0%) 0.0126 ( 0.2%) 0.0126 ( 0.2%) MachineDominator Tree Construction #3 0.0121 ( 0.2%) 0.0002 ( 0.0%) 0.0123 ( 0.2%) 0.0123 ( 0.2%) Natural Loop Information #4 0.0113 ( 0.2%) 0.0004 ( 0.1%) 0.0118 ( 0.2%) 0.0118 ( 0.2%) 
Lower AMX type for load/store 0.0115 ( 0.2%) 0.0001 ( 0.0%) 0.0116 ( 0.2%) 0.0116 ( 0.2%) X86 EFLAGS copy lowering 0.0112 ( 0.2%) 0.0001 ( 0.0%) 0.0113 ( 0.2%) 0.0113 ( 0.2%) MachineDominator Tree Construction #4 0.0099 ( 0.2%) 0.0009 ( 0.1%) 0.0108 ( 0.2%) 0.0108 ( 0.2%) Machine Block Frequency Analysis #5 0.0096 ( 0.2%) 0.0001 ( 0.0%) 0.0098 ( 0.2%) 0.0098 ( 0.2%) Machine Block Frequency Analysis #2 0.0095 ( 0.2%) 0.0001 ( 0.0%) 0.0096 ( 0.1%) 0.0096 ( 0.1%) Machine Natural Loop Construction #3 0.0094 ( 0.2%) 0.0002 ( 0.0%) 0.0096 ( 0.1%) 0.0096 ( 0.1%) Machine Block Frequency Analysis 0.0093 ( 0.2%) 0.0001 ( 0.0%) 0.0094 ( 0.1%) 0.0094 ( 0.1%) X86 Fixup SetCC 0.0091 ( 0.2%) 0.0001 ( 0.0%) 0.0092 ( 0.1%) 0.0092 ( 0.1%) Machine Natural Loop Construction 0.0083 ( 0.1%) 0.0006 ( 0.1%) 0.0090 ( 0.1%) 0.0089 ( 0.1%) Machine Copy Propagation Pass 0.0088 ( 0.2%) 0.0001 ( 0.0%) 0.0089 ( 0.1%) 0.0089 ( 0.1%) Machine Natural Loop Construction #5 0.0086 ( 0.1%) 0.0001 ( 0.0%) 0.0087 ( 0.1%) 0.0087 ( 0.1%) Machine Natural Loop Construction #4 0.0082 ( 0.1%) 0.0001 ( 0.0%) 0.0083 ( 0.1%) 0.0083 ( 0.1%) Machine Natural Loop Construction #2 0.0062 ( 0.1%) 0.0018 ( 0.3%) 0.0080 ( 0.1%) 0.0081 ( 0.1%) Canonicalize Freeze Instructions in Loops 0.0079 ( 0.1%) 0.0001 ( 0.0%) 0.0080 ( 0.1%) 0.0080 ( 0.1%) Natural Loop Information #6 0.0078 ( 0.1%) 0.0001 ( 0.0%) 0.0079 ( 0.1%) 0.0079 ( 0.1%) Finalize ISel and expand pseudo-instructions 0.0074 ( 0.1%) 0.0005 ( 0.1%) 0.0079 ( 0.1%) 0.0079 ( 0.1%) Expand large div/rem 0.0076 ( 0.1%) 0.0002 ( 0.0%) 0.0078 ( 0.1%) 0.0078 ( 0.1%) Natural Loop Information #3 0.0076 ( 0.1%) 0.0001 ( 0.0%) 0.0077 ( 0.1%) 0.0077 ( 0.1%) Debug Variable Analysis 0.0075 ( 0.1%) 0.0001 ( 0.0%) 0.0076 ( 0.1%) 0.0076 ( 0.1%) X86 Fixup Inst Tuning 0.0065 ( 0.1%) 0.0011 ( 0.2%) 0.0076 ( 0.1%) 0.0076 ( 0.1%) Post-RA pseudo instruction expansion pass 0.0074 ( 0.1%) 0.0001 ( 0.0%) 0.0075 ( 0.1%) 0.0075 ( 0.1%) Natural Loop Information #5 0.0074 ( 0.1%) 0.0001 ( 0.0%) 
0.0075 ( 0.1%) 0.0075 ( 0.1%) Process Implicit Definitions 0.0069 ( 0.1%) 0.0005 ( 0.1%) 0.0074 ( 0.1%) 0.0074 ( 0.1%) Machine Copy Propagation Pass #2 0.0069 ( 0.1%) 0.0001 ( 0.0%) 0.0070 ( 0.1%) 0.0070 ( 0.1%) Remove unreachable machine basic blocks 0.0065 ( 0.1%) 0.0001 ( 0.0%) 0.0066 ( 0.1%) 0.0066 ( 0.1%) X86 Lower Tile Copy 0.0062 ( 0.1%) 0.0002 ( 0.0%) 0.0065 ( 0.1%) 0.0065 ( 0.1%) Lower constant intrinsics 0.0062 ( 0.1%) 0.0001 ( 0.0%) 0.0064 ( 0.1%) 0.0064 ( 0.1%) X86 pseudo instruction expansion pass 0.0062 ( 0.1%) 0.0001 ( 0.0%) 0.0063 ( 0.1%) 0.0063 ( 0.1%) X86 Fixup Vector Constants 0.0061 ( 0.1%) 0.0001 ( 0.0%) 0.0063 ( 0.1%) 0.0063 ( 0.1%) Replace intrinsics with calls to vector library 0.0052 ( 0.1%) 0.0003 ( 0.1%) 0.0055 ( 0.1%) 0.0055 ( 0.1%) Machine Late Instructions Cleanup Pass 0.0050 ( 0.1%) 0.0003 ( 0.1%) 0.0053 ( 0.1%) 0.0053 ( 0.1%) Peephole Optimizations 0.0051 ( 0.1%) 0.0001 ( 0.0%) 0.0052 ( 0.1%) 0.0052 ( 0.1%) Bundle Machine CFG Edges #2 0.0050 ( 0.1%) 0.0002 ( 0.0%) 0.0051 ( 0.1%) 0.0051 ( 0.1%) Interleaved Access Pass 0.0050 ( 0.1%) 0.0001 ( 0.0%) 0.0050 ( 0.1%) 0.0050 ( 0.1%) Machine Cycle Info Analysis 0.0044 ( 0.1%) 0.0004 ( 0.1%) 0.0048 ( 0.1%) 0.0048 ( 0.1%) X86 Execution Dependency Fix 0.0040 ( 0.1%) 0.0008 ( 0.1%) 0.0048 ( 0.1%) 0.0047 ( 0.1%) Scalar Evolution Analysis 0.0033 ( 0.1%) 0.0012 ( 0.2%) 0.0045 ( 0.1%) 0.0045 ( 0.1%) Live Range Shrink 0.0039 ( 0.1%) 0.0005 ( 0.1%) 0.0044 ( 0.1%) 0.0043 ( 0.1%) Constant Hoisting 0.0033 ( 0.1%) 0.0004 ( 0.1%) 0.0037 ( 0.1%) 0.0037 ( 0.1%) Remove dead machine instructions 0.0035 ( 0.1%) 0.0001 ( 0.0%) 0.0036 ( 0.1%) 0.0036 ( 0.1%) Expand large fp convert 0.0031 ( 0.1%) 0.0003 ( 0.0%) 0.0033 ( 0.1%) 0.0033 ( 0.1%) Live DEBUG_VALUE analysis 0.0031 ( 0.1%) 0.0002 ( 0.0%) 0.0033 ( 0.1%) 0.0033 ( 0.1%) Remove unreachable blocks from the CFG 0.0028 ( 0.0%) 0.0001 ( 0.0%) 0.0029 ( 0.0%) 0.0029 ( 0.0%) Expand vector predication intrinsics 0.0026 ( 0.0%) 0.0002 ( 0.0%) 0.0028 ( 0.0%) 0.0028 ( 
0.0%) Canonicalize natural loops 0.0026 ( 0.0%) 0.0002 ( 0.0%) 0.0028 ( 0.0%) 0.0028 ( 0.0%) Remove dead machine instructions #2 0.0026 ( 0.0%) 0.0001 ( 0.0%) 0.0027 ( 0.0%) 0.0027 ( 0.0%) Scalarize Masked Memory Intrinsics 0.0025 ( 0.0%) 0.0002 ( 0.0%) 0.0027 ( 0.0%) 0.0027 ( 0.0%) BreakFalseDeps 0.0024 ( 0.0%) 0.0002 ( 0.0%) 0.0026 ( 0.0%) 0.0026 ( 0.0%) X86 LEA Optimize 0.0023 ( 0.0%) 0.0002 ( 0.0%) 0.0024 ( 0.0%) 0.0024 ( 0.0%) Expand Atomic instructions 0.0018 ( 0.0%) 0.0002 ( 0.0%) 0.0021 ( 0.0%) 0.0021 ( 0.0%) Shrink Wrapping analysis 0.0019 ( 0.0%) 0.0001 ( 0.0%) 0.0020 ( 0.0%) 0.0020 ( 0.0%) Expand reduction intrinsics 0.0017 ( 0.0%) 0.0001 ( 0.0%) 0.0018 ( 0.0%) 0.0018 ( 0.0%) PostRA Machine Sink 0.0014 ( 0.0%) 0.0001 ( 0.0%) 0.0015 ( 0.0%) 0.0017 ( 0.0%) X86 cmov Conversion 0.0016 ( 0.0%) 0.0001 ( 0.0%) 0.0016 ( 0.0%) 0.0016 ( 0.0%) Bundle Machine CFG Edges 0.0015 ( 0.0%) 0.0001 ( 0.0%) 0.0016 ( 0.0%) 0.0016 ( 0.0%) Greedy Register Allocator 0.0013 ( 0.0%) 0.0003 ( 0.0%) 0.0016 ( 0.0%) 0.0016 ( 0.0%) Expand memcmp() to load/stores 0.0014 ( 0.0%) 0.0001 ( 0.0%) 0.0015 ( 0.0%) 0.0015 ( 0.0%) Early Tail Duplication 0.0013 ( 0.0%) 0.0001 ( 0.0%) 0.0014 ( 0.0%) 0.0014 ( 0.0%) Early Machine Loop Invariant Code Motion 0.0012 ( 0.0%) 0.0002 ( 0.0%) 0.0014 ( 0.0%) 0.0014 ( 0.0%) Exception handling preparation 0.0011 ( 0.0%) 0.0001 ( 0.0%) 0.0012 ( 0.0%) 0.0013 ( 0.0%) Tail Duplication 0.0010 ( 0.0%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) Spill Code Placement Analysis 0.0010 ( 0.0%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) X86 LEA Fixup 0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) X86 Optimize Call Frame 0.0008 ( 0.0%) 0.0002 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) Partially inline calls to library functions 0.0008 ( 0.0%) 0.0001 ( 0.0%) 0.0009 ( 0.0%) 0.0009 ( 0.0%) Live Register Matrix 0.0007 ( 0.0%) 0.0001 ( 0.0%) 0.0008 ( 0.0%) 0.0008 ( 0.0%) X86 Partial Reduction 0.0007 ( 0.0%) 0.0001 ( 0.0%) 0.0008 ( 0.0%) 0.0008 ( 0.0%) Optimize 
machine instruction PHIs 0.0007 ( 0.0%) 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0007 ( 0.0%) X86 Avoid Store Forwarding Blocks 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Live Stack Slot Analysis 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Virtual Register Map 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Prepare callbr 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Machine Loop Invariant Code Motion 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) Machine Trace Metrics 0.0003 ( 0.0%) 0.0002 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) Merge contiguous icmps into a memcmp 0.0002 ( 0.0%) 0.0002 ( 0.0%) 0.0004 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results #2 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results #3 0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Post RA top-down list latency scheduler 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Insert stack protectors 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Insert KCFI indirect call checks 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Assignment Tracking Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) #4 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Indirect Branch Tracking 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Branch Probability Analysis 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Argument Stack Rebase 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Expand indirectbr instructions 0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Function Alias Analysis Results 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 
0.0%) Insert fentry calls 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Branch Probability Analysis #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Local Dynamic TLS Access Clean-up 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Unpack machine instruction bundles 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Insert XRay ops 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Atom pad short functions 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Early If-Conversion 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Stack Frame Layout Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Analyze Machine Code For Garbage Collection 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Tile Register Configure 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Local Stack Slot Allocation 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Speculative Execution Side Effect Suppression 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 FP Stackifier 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Implement the 'patchable-function' attribute 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) TLS Variable Hoist 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Domain Reassignment Pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #4 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Remove Redundant DEBUG_VALUE analysis 0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Rename Disconnected Subregister Components 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Detect Dead Lanes 0.0001 ( 0.0%) 0.0001 ( 0.0%) 
0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Register Allocation Pass Scoring 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Load Value Injection (LVI) Ret-Hardening 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Load Value Injection (LVI) Load Hardening 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Fixup Statepoint Caller Saved 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Compressing EVEX instrs to VEX encoding when possible 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 vzeroupper inserter 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 PIC Global Base Reg Initialization 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Contiguously Lay Out Funclets 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #6 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Return Thunks 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 DynAlloca Expander 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #8 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #7 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Indirect Thunks 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 insert wait instruction 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) StackMap Liveness Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 speculative load hardening 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Discriminate Memory Operands 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 
( 0.0%) Machine Optimization Remark Emitter #4 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Block Frequency Analysis #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #5 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Sanitizer Binary Metadata 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Insert Cache Prefetches 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Pseudo Probe Inserter 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lower Garbage Collection Instructions 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Safe Stack instrumentation pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Block Frequency Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #10 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lower AMX intrinsics 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) Shadow Stack GC Lowering 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #9 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #2 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Default Regalloc Eviction Advisor
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Default Regalloc Priority Advisor
5.8399 (100.0%) 0.6144 (100.0%) 6.4543 (100.0%) 6.4786 (100.0%) Total

===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.2285 seconds (0.2346 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0924 (100.0%) 0.1360 (100.0%) 0.2285 (100.0%) 0.2346 (100.0%) DWARF Exception Writer
0.0924 (100.0%) 0.1360 (100.0%) 0.2285 (100.0%) 0.2346 (100.0%) Total

===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 46.4602 seconds (46.5836 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
44.6971 (100.0%) 1.7631 (100.0%) 46.4602 (100.0%) 46.5836 (100.0%) Clang front-end timer
44.6971 (100.0%) 1.7631 (100.0%) 46.4602 (100.0%) 46.5836 (100.0%) Total
```
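
These reports are long, so for anyone comparing them, here is a rough sketch of a script that pulls out the timers with the largest wall time. It is not part of Yosys or LLVM. `top_timers` is a made-up helper name, and it assumes the report has been saved to a text file with one row per line (as clang prints it) in the four-column layout shown above.

```python
import re

# One report row: four "seconds (percent%)" columns followed by the timer name.
# Assumption: the four-column layout (user, system, user+system, wall) seen above.
ROW = re.compile(r"^\s*((?:\d+\.\d+\s+\(\s*\d+\.\d+%\)\s+){4})(\S.*?)\s*$")
SECONDS = re.compile(r"(\d+\.\d+)\s+\(")


def top_timers(report, n=5):
    """Return up to n (name, wall_seconds) pairs, largest wall time first."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        # Skip headers, separators, and the per-section "Total" summary rows.
        if not m or m.group(2) == "Total":
            continue
        # The fourth seconds value in the row is the wall-clock time.
        wall = float(SECONDS.findall(m.group(1))[3])
        rows.append((m.group(2), wall))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]
```

With the report in a file (say a hypothetical `report.txt`), `top_timers(open("report.txt").read())` returns the heaviest timers across all sections. Rows with the same name in different sections are not merged.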

Then for clang++-18 (bad):

```
clang++-18 -ftime-report -O3 -std=c++14 -I /usr/local/share/yosys/include/backends/cxxrtl/runtime -I build-tb tb.cpp -o tb
===-------------------------------------------------------------------------===
Pass execution timing report
===-------------------------------------------------------------------------===
Total Execution Time: 1037.8580 seconds (1038.0082 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
637.4201 ( 61.5%) 0.2667 ( 15.0%) 637.6868 ( 61.4%) 637.7159 ( 61.4%) LoopRotatePass
337.4477 ( 32.6%) 0.9667 ( 54.3%) 338.4143 ( 32.6%) 338.4955 ( 32.6%) SROAPass
12.2729 ( 1.2%) 0.0244 ( 1.4%) 12.2973 ( 1.2%) 12.2991 ( 1.2%) SimplifyCFGPass
9.7841 ( 0.9%) 0.0256 ( 1.4%) 9.8096 ( 0.9%) 9.8100 ( 0.9%) GVNPass
6.7785 ( 0.7%) 0.0069 ( 0.4%) 6.7854 ( 0.7%) 6.7856 ( 0.7%) JumpThreadingPass
6.3789 ( 0.6%) 0.0742 ( 4.2%) 6.4531 ( 0.6%) 6.4627 ( 0.6%) InstCombinePass
6.1736 ( 0.6%) 0.0178 ( 1.0%) 6.1914 ( 0.6%) 6.1887 ( 0.6%) LICMPass
3.7898 ( 0.4%) 0.0025 ( 0.1%) 3.7923 ( 0.4%) 3.7924 ( 0.4%) LoopSimplifyPass
3.6420 ( 0.4%) 0.0674 ( 3.8%) 3.7095 ( 0.4%) 3.7100 ( 0.4%) InlinerPass
2.5808 ( 0.2%) 0.1399 ( 7.9%) 2.7207 ( 0.3%) 2.7213 ( 0.3%) AlwaysInlinerPass
2.1329 ( 0.2%) 0.0062 ( 0.3%) 2.1391 ( 0.2%) 2.1394 ( 0.2%) CorrelatedValuePropagationPass
1.2868 ( 0.1%) 0.0022 ( 0.1%) 1.2890 ( 0.1%) 1.2886 ( 0.1%) IndVarSimplifyPass
1.2774 ( 0.1%) 0.0012 ( 0.1%) 1.2786 ( 0.1%) 1.2785 ( 0.1%) LoopDeletionPass
0.8207 ( 0.1%) 0.0288 ( 1.6%) 0.8495 ( 0.1%) 0.8575 ( 0.1%) EarlyCSEPass
0.6112 ( 0.1%) 0.0650 ( 3.6%) 0.6762 ( 0.1%) 0.6764 ( 0.1%) RequireAnalysisPass>
0.5341 ( 0.1%) 0.0070 ( 0.4%) 0.5411 ( 0.1%) 0.5488 ( 0.1%) IPSCCPPass
0.5094 ( 0.0%) 0.0053 ( 0.3%) 0.5148 ( 0.0%) 0.5145 ( 0.0%) LoopFullUnrollPass
0.2384 ( 0.0%) 0.0063 ( 0.4%) 0.2446 ( 0.0%) 0.2447 ( 0.0%) ConstraintEliminationPass
0.2438 ( 0.0%) 0.0000 ( 0.0%) 0.2438 ( 0.0%) 0.2441 ( 0.0%) GlobalOptPass
0.2274 ( 0.0%) 0.0021 ( 0.1%) 0.2295 ( 0.0%) 0.2294 ( 0.0%)
ReassociatePass 0.2216 ( 0.0%) 0.0000 ( 0.0%) 0.2216 ( 0.0%) 0.2216 ( 0.0%) CalledValuePropagationPass 0.2161 ( 0.0%) 0.0031 ( 0.2%) 0.2192 ( 0.0%) 0.2187 ( 0.0%) SimpleLoopUnswitchPass 0.1904 ( 0.0%) 0.0106 ( 0.6%) 0.2010 ( 0.0%) 0.2012 ( 0.0%) PostOrderFunctionAttrsPass 0.1834 ( 0.0%) 0.0015 ( 0.1%) 0.1849 ( 0.0%) 0.1921 ( 0.0%) SLPVectorizerPass 0.1785 ( 0.0%) 0.0026 ( 0.1%) 0.1811 ( 0.0%) 0.1808 ( 0.0%) LoopInstSimplifyPass 0.1023 ( 0.0%) 0.0020 ( 0.1%) 0.1044 ( 0.0%) 0.1042 ( 0.0%) LoopIdiomRecognizePass 0.1024 ( 0.0%) 0.0015 ( 0.1%) 0.1039 ( 0.0%) 0.1039 ( 0.0%) TailCallElimPass 0.0928 ( 0.0%) 0.0016 ( 0.1%) 0.0944 ( 0.0%) 0.0937 ( 0.0%) LoopSimplifyCFGPass 0.0811 ( 0.0%) 0.0010 ( 0.1%) 0.0821 ( 0.0%) 0.0821 ( 0.0%) AggressiveInstCombinePass 0.0717 ( 0.0%) 0.0019 ( 0.1%) 0.0737 ( 0.0%) 0.0737 ( 0.0%) LCSSAPass 0.0654 ( 0.0%) 0.0021 ( 0.1%) 0.0675 ( 0.0%) 0.0676 ( 0.0%) CallSiteSplittingPass 0.0640 ( 0.0%) 0.0025 ( 0.1%) 0.0666 ( 0.0%) 0.0666 ( 0.0%) SCCPPass 0.0511 ( 0.0%) 0.0041 ( 0.2%) 0.0552 ( 0.0%) 0.0552 ( 0.0%) DSEPass 0.0327 ( 0.0%) 0.0051 ( 0.3%) 0.0378 ( 0.0%) 0.0379 ( 0.0%) ADCEPass 0.0338 ( 0.0%) 0.0035 ( 0.2%) 0.0373 ( 0.0%) 0.0374 ( 0.0%) LowerExpectIntrinsicPass 0.0314 ( 0.0%) 0.0026 ( 0.1%) 0.0340 ( 0.0%) 0.0340 ( 0.0%) BDCEPass 0.0265 ( 0.0%) 0.0000 ( 0.0%) 0.0265 ( 0.0%) 0.0265 ( 0.0%) AssignmentTrackingPass 0.0226 ( 0.0%) 0.0033 ( 0.2%) 0.0259 ( 0.0%) 0.0260 ( 0.0%) PromotePass 0.0204 ( 0.0%) 0.0024 ( 0.1%) 0.0228 ( 0.0%) 0.0227 ( 0.0%) MemCpyOptPass 0.0181 ( 0.0%) 0.0000 ( 0.0%) 0.0181 ( 0.0%) 0.0181 ( 0.0%) RecomputeGlobalsAAPass 0.0157 ( 0.0%) 0.0006 ( 0.0%) 0.0164 ( 0.0%) 0.0164 ( 0.0%) LibCallsShrinkWrapPass 0.0148 ( 0.0%) 0.0010 ( 0.1%) 0.0158 ( 0.0%) 0.0157 ( 0.0%) VectorCombinePass 0.0148 ( 0.0%) 0.0002 ( 0.0%) 0.0151 ( 0.0%) 0.0151 ( 0.0%) InstSimplifyPass 0.0096 ( 0.0%) 0.0000 ( 0.0%) 0.0096 ( 0.0%) 0.0096 ( 0.0%) ReversePostOrderFunctionAttrsPass 0.0087 ( 0.0%) 0.0000 ( 0.0%) 0.0087 ( 0.0%) 0.0087 ( 0.0%) GlobalDCEPass 0.0076 ( 
0.0%) 0.0004 ( 0.0%) 0.0080 ( 0.0%) 0.0080 ( 0.0%) Float2IntPass 0.0067 ( 0.0%) 0.0004 ( 0.0%) 0.0072 ( 0.0%) 0.0072 ( 0.0%) LoopUnrollPass 0.0000 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 0.0070 ( 0.0%) CGProfilePass 0.0062 ( 0.0%) 0.0003 ( 0.0%) 0.0065 ( 0.0%) 0.0065 ( 0.0%) InferAlignmentPass 0.0058 ( 0.0%) 0.0000 ( 0.0%) 0.0058 ( 0.0%) 0.0058 ( 0.0%) DeadArgumentEliminationPass 0.0039 ( 0.0%) 0.0017 ( 0.1%) 0.0055 ( 0.0%) 0.0056 ( 0.0%) RequireAnalysisPass> 0.0029 ( 0.0%) 0.0001 ( 0.0%) 0.0030 ( 0.0%) 0.0030 ( 0.0%) LowerConstantIntrinsicsPass 0.0019 ( 0.0%) 0.0001 ( 0.0%) 0.0020 ( 0.0%) 0.0028 ( 0.0%) LoopVectorizePass 0.0021 ( 0.0%) 0.0006 ( 0.0%) 0.0027 ( 0.0%) 0.0028 ( 0.0%) MergedLoadStoreMotionPass 0.0018 ( 0.0%) 0.0008 ( 0.0%) 0.0026 ( 0.0%) 0.0026 ( 0.0%) OpenMPOptCGSCCPass 0.0018 ( 0.0%) 0.0007 ( 0.0%) 0.0025 ( 0.0%) 0.0025 ( 0.0%) LoopDistributePass 0.0016 ( 0.0%) 0.0007 ( 0.0%) 0.0023 ( 0.0%) 0.0023 ( 0.0%) CoroSplitPass 0.0016 ( 0.0%) 0.0007 ( 0.0%) 0.0023 ( 0.0%) 0.0023 ( 0.0%) CoroElidePass 0.0015 ( 0.0%) 0.0006 ( 0.0%) 0.0021 ( 0.0%) 0.0021 ( 0.0%) SpeculativeExecutionPass 0.0015 ( 0.0%) 0.0006 ( 0.0%) 0.0021 ( 0.0%) 0.0021 ( 0.0%) ArgumentPromotionPass 0.0015 ( 0.0%) 0.0006 ( 0.0%) 0.0021 ( 0.0%) 0.0021 ( 0.0%) MoveAutoInitPass 0.0019 ( 0.0%) 0.0000 ( 0.0%) 0.0019 ( 0.0%) 0.0019 ( 0.0%) EliminateAvailableExternallyPass 0.0001 ( 0.0%) 0.0018 ( 0.1%) 0.0018 ( 0.0%) 0.0018 ( 0.0%) InvalidateAnalysisPass 0.0011 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) 0.0012 ( 0.0%) DivRemPairsPass 0.0010 ( 0.0%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) LoopLoadEliminationPass 0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) ConstantMergePass 0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0010 ( 0.0%) 0.0010 ( 0.0%) InjectTLIMappings 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0008 ( 0.0%) 0.0009 ( 0.0%) InferFunctionAttrsPass 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) AnnotationRemarksPass 0.0004 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) 
ControlHeightReductionPass 0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) InvalidateAnalysisPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) CoroEarlyPass 0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) WarnMissedTransformationsPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) AlignmentFromAssumptionsPass 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) LoopSinkPass 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) RelLookupTableConverterPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) CoroCleanupPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) OpenMPOptPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Annotation2MetadataPass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) RequireAnalysisPass> 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) ForceFunctionAttrsPass 1036.0762 (100.0%) 1.7818 (100.0%) 1037.8580 (100.0%) 1038.0082 (100.0%) Total ===-------------------------------------------------------------------------=== Analysis execution timing report ===-------------------------------------------------------------------------=== Total Execution Time: 3.0295 seconds (3.0389 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.4495 ( 15.6%) 0.0162 ( 10.5%) 0.4657 ( 15.4%) 0.4655 ( 15.3%) DominatorTreeAnalysis 0.3992 ( 13.9%) 0.0650 ( 41.9%) 0.4642 ( 15.3%) 0.4642 ( 15.3%) CallGraphAnalysis 0.4461 ( 15.5%) 0.0046 ( 3.0%) 0.4507 ( 14.9%) 0.4508 ( 14.8%) BranchProbabilityAnalysis 0.4174 ( 14.5%) 0.0041 ( 2.6%) 0.4215 ( 13.9%) 0.4223 ( 13.9%) MemorySSAAnalysis 0.3291 ( 11.5%) 0.0035 ( 2.3%) 0.3327 ( 11.0%) 0.3333 ( 11.0%) BlockFrequencyAnalysis 0.2375 ( 8.3%) 0.0046 ( 3.0%) 0.2421 ( 8.0%) 0.2486 ( 8.2%) LoopAnalysis 0.2246 ( 7.8%) 0.0000 ( 0.0%) 0.2246 ( 7.4%) 0.2247 ( 7.4%) GlobalsAA 0.2003 ( 7.0%) 0.0054 ( 3.5%) 0.2057 ( 6.8%) 0.2053 ( 6.8%) PostDominatorTreeAnalysis 0.0426 ( 1.5%) 
0.0166 ( 10.7%) 0.0592 ( 2.0%) 0.0592 ( 1.9%) AAManager 0.0503 ( 1.8%) 0.0008 ( 0.5%) 0.0511 ( 1.7%) 0.0514 ( 1.7%) OuterAnalysisManagerProxy 0.0121 ( 0.4%) 0.0043 ( 2.8%) 0.0164 ( 0.5%) 0.0161 ( 0.5%) BasicAA 0.0088 ( 0.3%) 0.0039 ( 2.5%) 0.0127 ( 0.4%) 0.0140 ( 0.5%) TargetIRAnalysis 0.0076 ( 0.3%) 0.0032 ( 2.0%) 0.0107 ( 0.4%) 0.0108 ( 0.4%) AssumptionAnalysis 0.0060 ( 0.2%) 0.0036 ( 2.3%) 0.0096 ( 0.3%) 0.0095 ( 0.3%) TargetLibraryAnalysis 0.0056 ( 0.2%) 0.0026 ( 1.7%) 0.0082 ( 0.3%) 0.0081 ( 0.3%) TypeBasedAA 0.0056 ( 0.2%) 0.0026 ( 1.7%) 0.0082 ( 0.3%) 0.0080 ( 0.3%) OuterAnalysisManagerProxy 0.0058 ( 0.2%) 0.0020 ( 1.3%) 0.0078 ( 0.3%) 0.0078 ( 0.3%) ScalarEvolutionAnalysis 0.0051 ( 0.2%) 0.0026 ( 1.7%) 0.0076 ( 0.3%) 0.0078 ( 0.3%) OptimizationRemarkEmitterAnalysis 0.0048 ( 0.2%) 0.0022 ( 1.4%) 0.0070 ( 0.2%) 0.0070 ( 0.2%) ScopedNoAliasAA 0.0041 ( 0.1%) 0.0019 ( 1.2%) 0.0060 ( 0.2%) 0.0060 ( 0.2%) FunctionAnalysisManagerCGSCCProxy 0.0031 ( 0.1%) 0.0013 ( 0.9%) 0.0044 ( 0.1%) 0.0044 ( 0.1%) LazyValueAnalysis 0.0024 ( 0.1%) 0.0013 ( 0.8%) 0.0037 ( 0.1%) 0.0039 ( 0.1%) LazyCallGraphAnalysis 0.0018 ( 0.1%) 0.0008 ( 0.5%) 0.0025 ( 0.1%) 0.0026 ( 0.1%) MemoryDependenceAnalysis 0.0018 ( 0.1%) 0.0008 ( 0.5%) 0.0026 ( 0.1%) 0.0026 ( 0.1%) OuterAnalysisManagerProxy 0.0016 ( 0.1%) 0.0006 ( 0.4%) 0.0022 ( 0.1%) 0.0022 ( 0.1%) DemandedBitsAnalysis 0.0012 ( 0.0%) 0.0005 ( 0.3%) 0.0016 ( 0.1%) 0.0017 ( 0.1%) ShouldNotRunFunctionPassesAnalysis 0.0004 ( 0.0%) 0.0001 ( 0.1%) 0.0005 ( 0.0%) 0.0006 ( 0.0%) InnerAnalysisManagerProxy 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) LoopAccessAnalysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) InnerAnalysisManagerProxy 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) ShouldRunExtraVectorPasses 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) ProfileSummaryAnalysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) InnerAnalysisManagerProxy 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) 0.0000 ( 0.0%) InlineAdvisorAnalysis 2.8743 (100.0%) 0.1552 (100.0%) 3.0295 (100.0%) 3.0389 (100.0%) Total ===-------------------------------------------------------------------------=== Miscellaneous Ungrouped Timers ===-------------------------------------------------------------------------=== ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 1042.9915 ( 99.8%) 2.5190 ( 96.5%) 1045.5105 ( 99.8%) 1045.6924 ( 99.8%) Code Generation Time 2.4966 ( 0.2%) 0.0913 ( 3.5%) 2.5879 ( 0.2%) 2.6090 ( 0.2%) LLVM IR Generation Time 1045.4881 (100.0%) 2.6103 (100.0%) 1048.0984 (100.0%) 1048.3014 (100.0%) Total ===-------------------------------------------------------------------------=== Register Allocation ===-------------------------------------------------------------------------=== Total Execution Time: 0.4214 seconds (0.4211 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2884 ( 71.4%) 0.0066 ( 37.6%) 0.2950 ( 70.0%) 0.2948 ( 70.0%) Global Splitting 0.0420 ( 10.4%) 0.0028 ( 16.1%) 0.0449 ( 10.6%) 0.0446 ( 10.6%) Spiller 0.0350 ( 8.7%) 0.0053 ( 30.2%) 0.0403 ( 9.6%) 0.0405 ( 9.6%) Evict 0.0286 ( 7.1%) 0.0028 ( 15.7%) 0.0314 ( 7.4%) 0.0313 ( 7.4%) Local Splitting 0.0097 ( 2.4%) 0.0001 ( 0.4%) 0.0098 ( 2.3%) 0.0098 ( 2.3%) Seed Live Regs 0.4037 (100.0%) 0.0177 (100.0%) 0.4214 (100.0%) 0.4211 (100.0%) Total ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.7501 seconds (0.7515 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.1427 ( 22.6%) 0.0244 ( 20.4%) 0.1671 ( 22.3%) 0.1691 ( 22.5%) DAG Combining 1 0.0943 ( 14.9%) 0.0204 ( 17.1%) 0.1147 ( 15.3%) 0.1147 ( 15.3%) Instruction Selection 0.0908 ( 14.4%) 0.0185 ( 15.5%) 0.1093 ( 14.6%) 0.1089 ( 14.5%) Instruction Scheduling 0.0843 ( 
13.4%) 0.0146 ( 12.2%) 0.0989 ( 13.2%) 0.0985 ( 13.1%) DAG Combining 2 0.0603 ( 9.6%) 0.0077 ( 6.5%) 0.0680 ( 9.1%) 0.0678 ( 9.0%) DAG Combining after legalize types 0.0545 ( 8.6%) 0.0117 ( 9.8%) 0.0662 ( 8.8%) 0.0655 ( 8.7%) Instruction Creation 0.0375 ( 6.0%) 0.0082 ( 6.9%) 0.0458 ( 6.1%) 0.0459 ( 6.1%) Type Legalization 0.0346 ( 5.5%) 0.0074 ( 6.2%) 0.0420 ( 5.6%) 0.0421 ( 5.6%) DAG Legalization 0.0124 ( 2.0%) 0.0030 ( 2.5%) 0.0154 ( 2.1%) 0.0154 ( 2.1%) Instruction Scheduling Cleanup 0.0110 ( 1.8%) 0.0031 ( 2.6%) 0.0142 ( 1.9%) 0.0144 ( 1.9%) Vector Legalization 0.0081 ( 1.3%) 0.0004 ( 0.3%) 0.0085 ( 1.1%) 0.0089 ( 1.2%) DAG Combining after legalize vectors 0.0001 ( 0.0%) 0.0001 ( 0.1%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Type Legalization 2 0.6306 (100.0%) 0.1195 (100.0%) 0.7501 (100.0%) 0.7515 (100.0%) Total ===-------------------------------------------------------------------------=== Pass execution timing report ===-------------------------------------------------------------------------=== Total Execution Time: 6.2252 seconds (6.2315 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 1.0293 ( 18.4%) 0.1977 ( 31.7%) 1.2270 ( 19.7%) 1.2315 ( 19.8%) X86 DAG->DAG Instruction Selection 0.8958 ( 16.0%) 0.0267 ( 4.3%) 0.9225 ( 14.8%) 0.9226 ( 14.8%) Greedy Register Allocator #2 0.6793 ( 12.1%) 0.0006 ( 0.1%) 0.6798 ( 10.9%) 0.6799 ( 10.9%) Register Coalescer 0.2166 ( 3.9%) 0.3337 ( 53.5%) 0.5503 ( 8.8%) 0.5503 ( 8.8%) X86 Assembly Printer 0.5188 ( 9.3%) 0.0215 ( 3.4%) 0.5403 ( 8.7%) 0.5399 ( 8.7%) Induction Variable Users 0.3012 ( 5.4%) 0.0007 ( 0.1%) 0.3019 ( 4.9%) 0.3020 ( 4.8%) Live Interval Analysis 0.1806 ( 3.2%) 0.0004 ( 0.1%) 0.1810 ( 2.9%) 0.1810 ( 2.9%) Machine code sinking 0.1666 ( 3.0%) 0.0012 ( 0.2%) 0.1678 ( 2.7%) 0.1677 ( 2.7%) Live Variable Analysis 0.1096 ( 2.0%) 0.0008 ( 0.1%) 0.1104 ( 1.8%) 0.1104 ( 1.8%) ReachingDefAnalysis 0.0854 ( 1.5%) 0.0021 ( 0.3%) 0.0875 ( 1.4%) 0.0876 ( 1.4%) Machine Instruction Scheduler 
0.0645 ( 1.2%) 0.0003 ( 0.0%) 0.0648 ( 1.0%) 0.0648 ( 1.0%) Eliminate PHI nodes for register allocation 0.0615 ( 1.1%) 0.0004 ( 0.1%) 0.0619 ( 1.0%) 0.0619 ( 1.0%) Virtual Register Rewriter 0.0547 ( 1.0%) 0.0029 ( 0.5%) 0.0576 ( 0.9%) 0.0576 ( 0.9%) CodeGen Prepare 0.0428 ( 0.8%) 0.0003 ( 0.1%) 0.0431 ( 0.7%) 0.0440 ( 0.7%) Two-Address instruction pass 0.0428 ( 0.8%) 0.0005 ( 0.1%) 0.0433 ( 0.7%) 0.0433 ( 0.7%) Prologue/Epilogue Insertion & Frame Finalization 0.0397 ( 0.7%) 0.0003 ( 0.0%) 0.0399 ( 0.6%) 0.0399 ( 0.6%) Merge disjoint stack slots 0.0363 ( 0.6%) 0.0004 ( 0.1%) 0.0367 ( 0.6%) 0.0367 ( 0.6%) Control Flow Optimizer 0.0328 ( 0.6%) 0.0002 ( 0.0%) 0.0330 ( 0.5%) 0.0330 ( 0.5%) Machine InstCombiner 0.0323 ( 0.6%) 0.0002 ( 0.0%) 0.0324 ( 0.5%) 0.0324 ( 0.5%) Slot index numbering #2 0.0320 ( 0.6%) 0.0003 ( 0.1%) 0.0323 ( 0.5%) 0.0323 ( 0.5%) Branch Probability Analysis #2 0.0277 ( 0.5%) 0.0002 ( 0.0%) 0.0279 ( 0.4%) 0.0279 ( 0.4%) MachineDominator Tree Construction #9 0.0272 ( 0.5%) 0.0004 ( 0.1%) 0.0276 ( 0.4%) 0.0276 ( 0.4%) Branch Probability Basic Block Placement 0.0246 ( 0.4%) 0.0002 ( 0.0%) 0.0247 ( 0.4%) 0.0247 ( 0.4%) Slot index numbering 0.0235 ( 0.4%) 0.0007 ( 0.1%) 0.0242 ( 0.4%) 0.0242 ( 0.4%) Branch Probability Analysis 0.0232 ( 0.4%) 0.0007 ( 0.1%) 0.0239 ( 0.4%) 0.0239 ( 0.4%) Machine Common Subexpression Elimination 0.0211 ( 0.4%) 0.0008 ( 0.1%) 0.0219 ( 0.4%) 0.0219 ( 0.4%) Block Frequency Analysis 0.0208 ( 0.4%) 0.0001 ( 0.0%) 0.0208 ( 0.3%) 0.0208 ( 0.3%) Stack Slot Coloring 0.0195 ( 0.3%) 0.0002 ( 0.0%) 0.0196 ( 0.3%) 0.0196 ( 0.3%) Machine Dominance Frontier Construction 0.0186 ( 0.3%) 0.0001 ( 0.0%) 0.0187 ( 0.3%) 0.0187 ( 0.3%) MachineDominator Tree Construction #5 0.0184 ( 0.3%) 0.0002 ( 0.0%) 0.0186 ( 0.3%) 0.0186 ( 0.3%) MachinePostDominator Tree Construction #2 0.0175 ( 0.3%) 0.0010 ( 0.2%) 0.0186 ( 0.3%) 0.0185 ( 0.3%) MachineDominator Tree Construction 0.0158 ( 0.3%) 0.0017 ( 0.3%) 0.0175 ( 0.3%) 0.0175 ( 0.3%) Natural Loop 
Information 0.0172 ( 0.3%) 0.0002 ( 0.0%) 0.0174 ( 0.3%) 0.0174 ( 0.3%) Machine Block Frequency Analysis #3 0.0169 ( 0.3%) 0.0001 ( 0.0%) 0.0170 ( 0.3%) 0.0170 ( 0.3%) MachineDominator Tree Construction #2 0.0168 ( 0.3%) 0.0002 ( 0.0%) 0.0170 ( 0.3%) 0.0170 ( 0.3%) MachinePostDominator Tree Construction #3 0.0163 ( 0.3%) 0.0007 ( 0.1%) 0.0169 ( 0.3%) 0.0169 ( 0.3%) Post-Dominator Tree Construction 0.0164 ( 0.3%) 0.0004 ( 0.1%) 0.0168 ( 0.3%) 0.0168 ( 0.3%) Post-Dominator Tree Construction #2 0.0164 ( 0.3%) 0.0002 ( 0.0%) 0.0167 ( 0.3%) 0.0166 ( 0.3%) MachinePostDominator Tree Construction 0.0165 ( 0.3%) 0.0001 ( 0.0%) 0.0166 ( 0.3%) 0.0166 ( 0.3%) MachineDominator Tree Construction #8 0.0161 ( 0.3%) 0.0001 ( 0.0%) 0.0163 ( 0.3%) 0.0162 ( 0.3%) Machine Block Frequency Analysis #4 0.0160 ( 0.3%) 0.0002 ( 0.0%) 0.0161 ( 0.3%) 0.0161 ( 0.3%) MachineDominator Tree Construction #7 0.0154 ( 0.3%) 0.0002 ( 0.0%) 0.0156 ( 0.2%) 0.0156 ( 0.2%) Natural Loop Information #2 0.0153 ( 0.3%) 0.0001 ( 0.0%) 0.0154 ( 0.2%) 0.0154 ( 0.2%) MachineDominator Tree Construction #6 0.0148 ( 0.3%) 0.0004 ( 0.1%) 0.0152 ( 0.2%) 0.0152 ( 0.2%) Lower AMX type for load/store 0.0142 ( 0.3%) 0.0010 ( 0.2%) 0.0151 ( 0.2%) 0.0152 ( 0.2%) Free MachineFunction 0.0148 ( 0.3%) 0.0004 ( 0.1%) 0.0151 ( 0.2%) 0.0152 ( 0.2%) Dominator Tree Construction 0.0148 ( 0.3%) 0.0001 ( 0.0%) 0.0149 ( 0.2%) 0.0149 ( 0.2%) MachineDominator Tree Construction #3 0.0147 ( 0.3%) 0.0001 ( 0.0%) 0.0148 ( 0.2%) 0.0148 ( 0.2%) Natural Loop Information #4 0.0145 ( 0.3%) 0.0002 ( 0.0%) 0.0147 ( 0.2%) 0.0147 ( 0.2%) Dominator Tree Construction #3 0.0140 ( 0.2%) 0.0001 ( 0.0%) 0.0141 ( 0.2%) 0.0141 ( 0.2%) Tile Register Pre-configure 0.0135 ( 0.2%) 0.0003 ( 0.0%) 0.0138 ( 0.2%) 0.0138 ( 0.2%) ObjC ARC contraction 0.0123 ( 0.2%) 0.0007 ( 0.1%) 0.0130 ( 0.2%) 0.0130 ( 0.2%) Dominator Tree Construction #2 0.0129 ( 0.2%) 0.0002 ( 0.0%) 0.0130 ( 0.2%) 0.0130 ( 0.2%) Check CFA info and insert CFI instructions if needed 0.0129 ( 0.2%) 
0.0001 ( 0.0%) 0.0130 ( 0.2%) 0.0130 ( 0.2%) Machine Block Frequency Analysis #5 0.0129 ( 0.2%) 0.0001 ( 0.0%) 0.0130 ( 0.2%) 0.0130 ( 0.2%) MachineDominator Tree Construction #4 0.0118 ( 0.2%) 0.0001 ( 0.0%) 0.0119 ( 0.2%) 0.0119 ( 0.2%) X86 EFLAGS copy lowering 0.0098 ( 0.2%) 0.0018 ( 0.3%) 0.0116 ( 0.2%) 0.0118 ( 0.2%) Loop Strength Reduction 0.0109 ( 0.2%) 0.0001 ( 0.0%) 0.0110 ( 0.2%) 0.0110 ( 0.2%) Machine Natural Loop Construction 0.0104 ( 0.2%) 0.0001 ( 0.0%) 0.0105 ( 0.2%) 0.0106 ( 0.2%) Machine Block Frequency Analysis #2 0.0100 ( 0.2%) 0.0002 ( 0.0%) 0.0102 ( 0.2%) 0.0102 ( 0.2%) Machine Block Frequency Analysis 0.0101 ( 0.2%) 0.0001 ( 0.0%) 0.0102 ( 0.2%) 0.0102 ( 0.2%) Machine Natural Loop Construction #4 0.0101 ( 0.2%) 0.0001 ( 0.0%) 0.0102 ( 0.2%) 0.0102 ( 0.2%) Machine Natural Loop Construction #3 0.0098 ( 0.2%) 0.0001 ( 0.0%) 0.0099 ( 0.2%) 0.0098 ( 0.2%) Machine Natural Loop Construction #2 0.0097 ( 0.2%) 0.0001 ( 0.0%) 0.0098 ( 0.2%) 0.0098 ( 0.2%) Natural Loop Information #5 0.0095 ( 0.2%) 0.0001 ( 0.0%) 0.0096 ( 0.2%) 0.0096 ( 0.2%) X86 Fixup SetCC 0.0092 ( 0.2%) 0.0001 ( 0.0%) 0.0093 ( 0.1%) 0.0093 ( 0.1%) Machine Natural Loop Construction #5 0.0091 ( 0.2%) 0.0002 ( 0.0%) 0.0093 ( 0.1%) 0.0093 ( 0.1%) Natural Loop Information #3 0.0091 ( 0.2%) 0.0001 ( 0.0%) 0.0093 ( 0.1%) 0.0092 ( 0.1%) Natural Loop Information #6 0.0085 ( 0.2%) 0.0003 ( 0.1%) 0.0088 ( 0.1%) 0.0089 ( 0.1%) Machine Copy Propagation Pass 0.0085 ( 0.2%) 0.0003 ( 0.0%) 0.0088 ( 0.1%) 0.0088 ( 0.1%) Lower constant intrinsics 0.0082 ( 0.1%) 0.0001 ( 0.0%) 0.0083 ( 0.1%) 0.0083 ( 0.1%) Finalize ISel and expand pseudo-instructions 0.0069 ( 0.1%) 0.0012 ( 0.2%) 0.0082 ( 0.1%) 0.0082 ( 0.1%) Expand large div/rem 0.0076 ( 0.1%) 0.0001 ( 0.0%) 0.0077 ( 0.1%) 0.0077 ( 0.1%) Debug Variable Analysis 0.0074 ( 0.1%) 0.0001 ( 0.0%) 0.0075 ( 0.1%) 0.0075 ( 0.1%) Process Implicit Definitions 0.0074 ( 0.1%) 0.0001 ( 0.0%) 0.0075 ( 0.1%) 0.0075 ( 0.1%) X86 Fixup Inst Tuning 0.0073 ( 0.1%) 0.0001 ( 
0.0%) 0.0074 ( 0.1%) 0.0074 ( 0.1%) Post-RA pseudo instruction expansion pass 0.0071 ( 0.1%) 0.0002 ( 0.0%) 0.0073 ( 0.1%) 0.0073 ( 0.1%) Machine Late Instructions Cleanup Pass 0.0056 ( 0.1%) 0.0014 ( 0.2%) 0.0071 ( 0.1%) 0.0073 ( 0.1%) Canonicalize Freeze Instructions in Loops 0.0070 ( 0.1%) 0.0001 ( 0.0%) 0.0071 ( 0.1%) 0.0071 ( 0.1%) Remove unreachable machine basic blocks 0.0069 ( 0.1%) 0.0001 ( 0.0%) 0.0070 ( 0.1%) 0.0070 ( 0.1%) Bundle Machine CFG Edges #2 0.0069 ( 0.1%) 0.0001 ( 0.0%) 0.0069 ( 0.1%) 0.0069 ( 0.1%) X86 Lower Tile Copy 0.0067 ( 0.1%) 0.0002 ( 0.0%) 0.0069 ( 0.1%) 0.0069 ( 0.1%) X86 Execution Dependency Fix 0.0066 ( 0.1%) 0.0003 ( 0.0%) 0.0069 ( 0.1%) 0.0069 ( 0.1%) X86 Byte/Word Instruction Fixup 0.0063 ( 0.1%) 0.0003 ( 0.0%) 0.0066 ( 0.1%) 0.0066 ( 0.1%) Machine Copy Propagation Pass #2 0.0061 ( 0.1%) 0.0004 ( 0.1%) 0.0064 ( 0.1%) 0.0064 ( 0.1%) Peephole Optimizations 0.0060 ( 0.1%) 0.0001 ( 0.0%) 0.0061 ( 0.1%) 0.0061 ( 0.1%) X86 pseudo instruction expansion pass 0.0060 ( 0.1%) 0.0001 ( 0.0%) 0.0061 ( 0.1%) 0.0061 ( 0.1%) Replace intrinsics with calls to vector library 0.0059 ( 0.1%) 0.0001 ( 0.0%) 0.0061 ( 0.1%) 0.0061 ( 0.1%) Interleaved Access Pass 0.0053 ( 0.1%) 0.0007 ( 0.1%) 0.0060 ( 0.1%) 0.0060 ( 0.1%) Scalar Evolution Analysis 0.0059 ( 0.1%) 0.0001 ( 0.0%) 0.0059 ( 0.1%) 0.0059 ( 0.1%) X86 Fixup Vector Constants 0.0053 ( 0.1%) 0.0004 ( 0.1%) 0.0057 ( 0.1%) 0.0057 ( 0.1%) Remove dead machine instructions 0.0051 ( 0.1%) 0.0002 ( 0.0%) 0.0053 ( 0.1%) 0.0053 ( 0.1%) Live Range Shrink 0.0051 ( 0.1%) 0.0001 ( 0.0%) 0.0052 ( 0.1%) 0.0052 ( 0.1%) Machine Cycle Info Analysis 0.0043 ( 0.1%) 0.0005 ( 0.1%) 0.0048 ( 0.1%) 0.0048 ( 0.1%) Constant Hoisting 0.0042 ( 0.1%) 0.0001 ( 0.0%) 0.0043 ( 0.1%) 0.0043 ( 0.1%) Expand large fp convert 0.0040 ( 0.1%) 0.0002 ( 0.0%) 0.0042 ( 0.1%) 0.0042 ( 0.1%) Remove unreachable blocks from the CFG 0.0037 ( 0.1%) 0.0002 ( 0.0%) 0.0038 ( 0.1%) 0.0038 ( 0.1%) Expand Atomic instructions 0.0033 ( 0.1%) 0.0002 ( 
0.0%) 0.0035 ( 0.1%) 0.0035 ( 0.1%) Live DEBUG_VALUE analysis 0.0033 ( 0.1%) 0.0002 ( 0.0%) 0.0035 ( 0.1%) 0.0035 ( 0.1%) Canonicalize natural loops 0.0033 ( 0.1%) 0.0002 ( 0.0%) 0.0034 ( 0.1%) 0.0034 ( 0.1%) BreakFalseDeps 0.0032 ( 0.1%) 0.0001 ( 0.0%) 0.0033 ( 0.1%) 0.0033 ( 0.1%) Expand vector predication intrinsics 0.0031 ( 0.1%) 0.0002 ( 0.0%) 0.0033 ( 0.1%) 0.0033 ( 0.1%) X86 LEA Optimize 0.0031 ( 0.1%) 0.0001 ( 0.0%) 0.0032 ( 0.1%) 0.0032 ( 0.1%) Scalarize Masked Memory Intrinsics 0.0030 ( 0.1%) 0.0002 ( 0.0%) 0.0032 ( 0.1%) 0.0032 ( 0.1%) Remove dead machine instructions #2 0.0028 ( 0.0%) 0.0001 ( 0.0%) 0.0029 ( 0.0%) 0.0029 ( 0.0%) PostRA Machine Sink 0.0026 ( 0.0%) 0.0001 ( 0.0%) 0.0027 ( 0.0%) 0.0026 ( 0.0%) Expand reduction intrinsics 0.0023 ( 0.0%) 0.0002 ( 0.0%) 0.0025 ( 0.0%) 0.0025 ( 0.0%) Shrink Wrapping analysis 0.0021 ( 0.0%) 0.0001 ( 0.0%) 0.0022 ( 0.0%) 0.0022 ( 0.0%) Bundle Machine CFG Edges 0.0020 ( 0.0%) 0.0001 ( 0.0%) 0.0021 ( 0.0%) 0.0021 ( 0.0%) Early Tail Duplication 0.0017 ( 0.0%) 0.0001 ( 0.0%) 0.0019 ( 0.0%) 0.0020 ( 0.0%) X86 cmov Conversion 0.0017 ( 0.0%) 0.0002 ( 0.0%) 0.0019 ( 0.0%) 0.0018 ( 0.0%) Exception handling preparation 0.0015 ( 0.0%) 0.0003 ( 0.0%) 0.0017 ( 0.0%) 0.0017 ( 0.0%) Early Machine Loop Invariant Code Motion 0.0016 ( 0.0%) 0.0001 ( 0.0%) 0.0017 ( 0.0%) 0.0017 ( 0.0%) Greedy Register Allocator 0.0014 ( 0.0%) 0.0002 ( 0.0%) 0.0016 ( 0.0%) 0.0016 ( 0.0%) Expand memcmp() to load/stores 0.0013 ( 0.0%) 0.0001 ( 0.0%) 0.0014 ( 0.0%) 0.0014 ( 0.0%) Tail Duplication 0.0012 ( 0.0%) 0.0001 ( 0.0%) 0.0013 ( 0.0%) 0.0013 ( 0.0%) X86 LEA Fixup 0.0011 ( 0.0%) 0.0001 ( 0.0%) 0.0012 ( 0.0%) 0.0012 ( 0.0%) Spill Code Placement Analysis 0.0012 ( 0.0%) 0.0001 ( 0.0%) 0.0012 ( 0.0%) 0.0012 ( 0.0%) Live Register Matrix 0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) Partially inline calls to library functions 0.0010 ( 0.0%) 0.0001 ( 0.0%) 0.0011 ( 0.0%) 0.0011 ( 0.0%) X86 Optimize Call Frame 0.0008 ( 0.0%) 0.0001 ( 0.0%) 
0.0009 ( 0.0%) 0.0009 ( 0.0%) X86 Partial Reduction 0.0008 ( 0.0%) 0.0001 ( 0.0%) 0.0009 ( 0.0%) 0.0009 ( 0.0%) Live Stack Slot Analysis 0.0007 ( 0.0%) 0.0001 ( 0.0%) 0.0008 ( 0.0%) 0.0008 ( 0.0%) X86 Avoid Store Forwarding Blocks 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0007 ( 0.0%) Optimize machine instruction PHIs 0.0006 ( 0.0%) 0.0001 ( 0.0%) 0.0007 ( 0.0%) 0.0007 ( 0.0%) Prepare callbr 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Virtual Register Map 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Machine Loop Invariant Code Motion 0.0005 ( 0.0%) 0.0001 ( 0.0%) 0.0006 ( 0.0%) 0.0006 ( 0.0%) Machine Trace Metrics 0.0004 ( 0.0%) 0.0002 ( 0.0%) 0.0005 ( 0.0%) 0.0005 ( 0.0%) Merge contiguous icmps into a memcmp 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0004 ( 0.0%) 0.0004 ( 0.0%) Function Alias Analysis Results #2 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Post RA top-down list latency scheduler 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results #3 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Insert KCFI indirect call checks 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) 0.0003 ( 0.0%) Assignment Tracking Analysis 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) Basic Alias Analysis (stateless AA impl) #2 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Indirect Branch Tracking 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Branch Probability Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) #3 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Insert stack protectors 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) #4 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Local Dynamic TLS Access Clean-up 
0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Argument Stack Rebase 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Branch Probability Analysis #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Unpack machine instruction bundles 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Expand indirectbr instructions 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Function Alias Analysis Results 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Insert XRay ops 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Insert fentry calls 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Shadow Stack GC Lowering 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Rename Disconnected Subregister Components 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) TLS Variable Hoist 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Early If-Conversion 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Implement the 'patchable-function' attribute 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Tile Register Configure 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Domain Reassignment Pass 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Local Stack Slot Allocation 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Analyze Machine Code For Garbage Collection 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 PIC Global Base Reg Initialization 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter #2 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Stack Frame Layout Analysis 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Atom pad short functions 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter 0.0002 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Register Allocation Pass Scoring 0.0001 
( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #6
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 FP Stackifier
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Discriminate Memory Operands
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Speculative Execution Side Effect Suppression
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) StackMap Liveness Analysis
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Contiguously Lay Out Funclets
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter #4
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Indirect Thunks
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Block Frequency Analysis #2
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Remove Redundant DEBUG_VALUE analysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Load Value Injection (LVI) Load Hardening
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Detect Dead Lanes
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Load Value Injection (LVI) Ret-Hardening
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #3
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Fixup Statepoint Caller Saved
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #4
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 speculative load hardening
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 vzeroupper inserter
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Optimization Remark Emitter #3
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 insert wait instruction
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lower AMX intrinsics
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Block Frequency Analysis
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #5
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #7
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Insert Cache Prefetches
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 DynAlloca Expander
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Compressing EVEX instrs when possible
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Pseudo Probe Inserter
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #8
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #9
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Safe Stack instrumentation pass
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #10
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lower Garbage Collection Instructions
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Lazy Machine Block Frequency Analysis #2
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) X86 Return Thunks
0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Machine Sanitizer Binary Metadata
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Default Regalloc Eviction Advisor
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Default Regalloc Priority Advisor
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis
5.6017 (100.0%) 0.6236 (100.0%) 6.2252 (100.0%) 6.2315 (100.0%) Total

===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.2326 seconds (0.2392 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0943 (100.0%) 0.1382 (100.0%) 0.2326 (100.0%) 0.2392 (100.0%) DWARF Exception Writer
0.0943 (100.0%) 0.1382 (100.0%) 0.2326 (100.0%) 0.2392 (100.0%) Total

===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 1054.1768 seconds (1054.4013 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1051.2682 (100.0%) 2.9086 (100.0%) 1054.1768 (100.0%) 1054.4013 (100.0%) Clang front-end timer
1051.2682 (100.0%) 2.9086 (100.0%) 1054.1768 (100.0%) 1054.4013 (100.0%) Total
```

So almost all of the time is spent in LoopRotatePass (about two thirds) and SROAPass (about one third). LoopRotatePass seems like it may not be that useful for the straight-line code generated by CXXRTL?
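
For anyone wanting to pull the same ranking out of their own `-ftime-report` dump, here is a minimal sketch in Python. It assumes the four-column row layout shown in the report above (user, system, user+system, wall, then the pass name) and should be fed one table at a time. The function name `rank_passes` is made up for illustration and is not part of any LLVM or Yosys tooling.

```python
import re

# One row of a -ftime-report table, assuming the four-column layout:
#   user (pct)  system (pct)  user+system (pct)  wall (pct)  pass name
ROW = re.compile(
    r"^\s*(\d+\.\d+) \(\s*[\d.]+%\)"   # user time
    r"\s+(\d+\.\d+) \(\s*[\d.]+%\)"    # system time
    r"\s+(\d+\.\d+) \(\s*[\d.]+%\)"    # user+system time
    r"\s+(\d+\.\d+) \(\s*[\d.]+%\)"    # wall time
    r"\s+(.+?)\s*$"                    # pass name
)


def rank_passes(report_text, top=5):
    """Return [(name, wall_seconds, share_of_wall)] for the slowest passes.

    Hypothetical helper: header, separator and "Total" lines are skipped,
    and the share is computed from the sum of the parsed rows.
    """
    rows = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m and m.group(5) != "Total":
            rows.append((float(m.group(4)), m.group(5)))
    total = sum(wall for wall, _ in rows)
    rows.sort(reverse=True)  # slowest pass first
    return [
        (name, wall, wall / total if total else 0.0)
        for wall, name in rows[:top]
    ]
```

Calling `rank_passes(open("report.txt").read())` on a saved table (the file name is only an example) returns the heaviest passes with their share of wall time, which makes it quick to compare a clang++-17 run against a clang++-18 run.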

whitequark commented 4 weeks ago

Yep! This seems like a straightforward bug. I should be able to fix it whenever I get the chance, but I'm not sure when exactly that will happen.