Open Quuxplusone opened 6 years ago
Attached sfinae-perf-bug.cpp.tz
(167140 bytes, application/gzip): demonstrate severe compile-time degradation of SFINAE-in-tparam-list
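The attachment itself is not reproduced in this thread. As a rough illustration only, the pattern its description names (SFINAE in the template parameter list) might look like the first function below, with the same constraint moved into the return type for comparison. The names `twice_tparam` and `twice_ret` are hypothetical, nothing here is taken from the attachment, and a snippet this small is not expected to reproduce the slowdown.

```cpp
#include <type_traits>

// Style 1 (hypothetical): SFINAE in the template parameter list.
// The constraint lives in an extra, defaulted template parameter.
template <class T,
          class = typename std::enable_if<std::is_integral<T>::value>::type>
constexpr T twice_tparam(T x) { return x + x; }

// Style 2 (hypothetical): the same constraint expressed in the return type,
// so no extra template parameter is needed.
template <class T>
constexpr typename std::enable_if<std::is_integral<T>::value, T>::type
twice_ret(T x) { return x + x; }

// Both styles accept integral arguments and produce the same result.
static_assert(twice_tparam(21) == 42, "tparam-list SFINAE accepts int");
static_assert(twice_ret(21) == 42, "return-type SFINAE accepts int");
```

In both styles a call with a non-integral argument, such as `twice_tparam(2.0)`, is removed from overload resolution rather than producing a hard error inside the function.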
Some slowdown from adding an extra template parameter would be expected, but 10x or even 60% seems unreasonable. Do you have any profiling data (or even just a few backtraces from the slow run) to help identify what's going slowly?
I don't. I could get one from the attached example, but I'm away from my computer right now.
SLOW TIME REPORT (-ftime-report):
=================================
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0034 ( 24.9%) 0.0046 ( 50.8%) 0.0080 ( 35.3%) 0.0301 ( 53.6%) Code Generation Time
0.0102 ( 75.1%) 0.0045 ( 49.2%) 0.0146 ( 64.7%) 0.0261 ( 46.4%) LLVM IR Generation Time
0.0136 (100.0%) 0.0091 (100.0%) 0.0226 (100.0%) 0.0561 (100.0%) Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0001 seconds (0.0004 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0000 ( 51.4%) 0.0001 ( 71.4%) 0.0001 ( 65.2%) 0.0003 ( 70.1%) Debug Info Emission
0.0000 ( 11.4%) 0.0000 ( 24.7%) 0.0000 ( 20.5%) 0.0001 ( 26.3%) DWARF Debug Writer
0.0000 ( 37.1%) 0.0000 ( 3.9%) 0.0000 ( 14.3%) 0.0000 ( 3.6%) DWARF Exception Writer
0.0000 (100.0%) 0.0001 (100.0%) 0.0001 (100.0%) 0.0004 (100.0%) Total
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0042 seconds (0.0143 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0007 ( 34.5%) 0.0007 ( 29.2%) 0.0013 ( 31.6%) 0.0043 ( 29.7%) Expand Atomic instructions
0.0002 ( 10.6%) 0.0006 ( 25.6%) 0.0008 ( 18.8%) 0.0036 ( 24.9%) X86 DAG->DAG Instruction Selection
0.0001 ( 6.4%) 0.0002 ( 9.4%) 0.0003 ( 8.1%) 0.0015 ( 10.5%) X86 Assembly Printer
0.0001 ( 3.1%) 0.0001 ( 5.7%) 0.0002 ( 4.5%) 0.0008 ( 5.4%) Prologue/Epilogue Insertion & Frame Finalization
0.0005 ( 24.0%) 0.0001 ( 3.3%) 0.0005 ( 12.8%) 0.0005 ( 3.8%) X86 Retpoline Thunks
0.0000 ( 2.0%) 0.0001 ( 2.9%) 0.0001 ( 2.5%) 0.0004 ( 3.0%) Inliner for always_inline functions
0.0000 ( 1.0%) 0.0001 ( 2.7%) 0.0001 ( 1.9%) 0.0004 ( 2.8%) MachineDominator Tree Construction
0.0000 ( 2.1%) 0.0001 ( 2.4%) 0.0001 ( 2.3%) 0.0003 ( 2.3%) Fast Register Allocator
0.0000 ( 0.3%) 0.0000 ( 1.7%) 0.0000 ( 1.0%) 0.0003 ( 2.0%) Scalarize Masked Memory Intrinsics
0.0000 ( 1.7%) 0.0000 ( 2.0%) 0.0001 ( 1.9%) 0.0003 ( 2.0%) Insert stack protectors
0.0000 ( 0.6%) 0.0001 ( 2.2%) 0.0001 ( 1.5%) 0.0003 ( 1.9%) Post-RA pseudo instruction expansion pass
0.0000 ( 0.8%) 0.0000 ( 1.8%) 0.0001 ( 1.3%) 0.0003 ( 1.9%) Dominator Tree Construction
0.0000 ( 1.2%) 0.0000 ( 1.7%) 0.0001 ( 1.5%) 0.0003 ( 1.9%) Machine Natural Loop Construction
0.0000 ( 1.3%) 0.0000 ( 1.3%) 0.0001 ( 1.3%) 0.0002 ( 1.5%) CallGraph Construction
0.0000 ( 1.1%) 0.0000 ( 1.4%) 0.0001 ( 1.3%) 0.0002 ( 1.2%) Two-Address instruction pass
0.0000 ( 0.7%) 0.0000 ( 1.0%) 0.0000 ( 0.9%) 0.0002 ( 1.1%) Free MachineFunction
0.0000 ( 0.3%) 0.0000 ( 0.8%) 0.0000 ( 0.6%) 0.0001 ( 1.0%) Eliminate PHI nodes for register allocation
0.0000 ( 0.3%) 0.0000 ( 0.9%) 0.0000 ( 0.6%) 0.0001 ( 0.9%) Expand reduction intrinsics
0.0000 ( 0.3%) 0.0000 ( 1.0%) 0.0000 ( 0.6%) 0.0001 ( 0.7%) Local Stack Slot Allocation
0.0000 ( 0.6%) 0.0000 ( 0.0%) 0.0000 ( 0.3%) 0.0000 ( 0.1%) MachineDominator Tree Construction
0.0000 ( 0.4%) 0.0000 ( 0.1%) 0.0000 ( 0.3%) 0.0000 ( 0.1%) Machine Natural Loop Construction
0.0000 ( 0.4%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) Dominator Tree Construction
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.2%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) 0.0000 ( 0.1%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) X86 WinAlloca Expander
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) Exception handling preparation
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Expand indirectbr instructions
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Machine Optimization Remark Emitter
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Contiguously Lay Out Funclets
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Remove unreachable blocks from the CFG
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Bundle Machine CFG Edges
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Live DEBUG_VALUE analysis
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Insert fentry calls
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) X86 pseudo instruction expansion pass
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Machine Optimization Remark Emitter
0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Safe Stack instrumentation pass
0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Shadow Stack GC Lowering
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 FP Stackifier
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Instrument function entry/exit with calls to e.g. mcount() (post inlining)
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) X86 vzeroupper inserter
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) X86 Indirect Branch Tracking
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Implement the 'patchable-function' attribute
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Expand ISel Pseudo-instructions
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Lower Garbage Collection Instructions
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) StackMap Liveness Analysis
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Insert XRay ops
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0019 (100.0%) 0.0023 (100.0%) 0.0042 (100.0%) 0.0143 (100.0%) Total
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 5.6984 seconds (5.7773 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
5.5208 (100.0%) 0.1776 (100.0%) 5.6984 (100.0%) 5.7773 (100.0%) Clang front-end timer
5.5208 (100.0%) 0.1776 (100.0%) 5.6984 (100.0%) 5.7773 (100.0%) Total
FAST TIME REPORT (-ftime-report):
=================================
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0088 ( 76.3%) 0.0016 ( 65.0%) 0.0105 ( 74.3%) 0.0108 ( 74.8%) LLVM IR Generation Time
0.0027 ( 23.7%) 0.0009 ( 35.0%) 0.0036 ( 25.7%) 0.0036 ( 25.2%) Code Generation Time
0.0116 (100.0%) 0.0025 (100.0%) 0.0141 (100.0%) 0.0144 (100.0%) Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0000 seconds (0.0000 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0000 ( 57.1%) 0.0000 ( 57.1%) 0.0000 ( 57.1%) 0.0000 ( 54.3%) Debug Info Emission
0.0000 ( 32.1%) 0.0000 ( 28.6%) 0.0000 ( 31.0%) 0.0000 ( 33.7%) DWARF Exception Writer
0.0000 ( 10.7%) 0.0000 ( 14.3%) 0.0000 ( 11.9%) 0.0000 ( 12.0%) DWARF Debug Writer
0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) Total
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0022 seconds (0.0022 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0006 ( 35.2%) 0.0002 ( 27.6%) 0.0007 ( 33.3%) 0.0007 ( 33.5%) Expand Atomic instructions
0.0005 ( 27.1%) 0.0001 ( 14.8%) 0.0005 ( 23.9%) 0.0005 ( 24.0%) X86 Retpoline Thunks
0.0002 ( 9.3%) 0.0001 ( 19.7%) 0.0003 ( 11.9%) 0.0003 ( 12.0%) X86 DAG->DAG Instruction Selection
0.0001 ( 5.8%) 0.0000 ( 7.6%) 0.0001 ( 6.3%) 0.0001 ( 6.4%) X86 Assembly Printer
0.0000 ( 2.9%) 0.0000 ( 3.2%) 0.0001 ( 3.0%) 0.0001 ( 3.1%) Prologue/Epilogue Insertion & Frame Finalization
0.0000 ( 1.6%) 0.0000 ( 1.1%) 0.0000 ( 1.4%) 0.0000 ( 1.4%) Fast Register Allocator
0.0000 ( 1.3%) 0.0000 ( 1.6%) 0.0000 ( 1.3%) 0.0000 ( 1.3%) Insert stack protectors
0.0000 ( 1.1%) 0.0000 ( 2.1%) 0.0000 ( 1.4%) 0.0000 ( 1.2%) Two-Address instruction pass
0.0000 ( 1.0%) 0.0000 ( 1.4%) 0.0000 ( 1.1%) 0.0000 ( 1.2%) MachineDominator Tree Construction
0.0000 ( 0.8%) 0.0000 ( 1.8%) 0.0000 ( 1.1%) 0.0000 ( 1.1%) Inliner for always_inline functions
0.0000 ( 0.8%) 0.0000 ( 1.4%) 0.0000 ( 1.0%) 0.0000 ( 0.9%) Dominator Tree Construction
0.0000 ( 0.6%) 0.0000 ( 0.9%) 0.0000 ( 0.7%) 0.0000 ( 0.8%) Machine Natural Loop Construction
0.0000 ( 0.7%) 0.0000 ( 1.1%) 0.0000 ( 0.8%) 0.0000 ( 0.8%) Post-RA pseudo instruction expansion pass
0.0000 ( 0.7%) 0.0000 ( 0.9%) 0.0000 ( 0.7%) 0.0000 ( 0.7%) Free MachineFunction
0.0000 ( 0.5%) 0.0000 ( 0.7%) 0.0000 ( 0.5%) 0.0000 ( 0.5%) CallGraph Construction
0.0000 ( 0.5%) 0.0000 ( 0.5%) 0.0000 ( 0.5%) 0.0000 ( 0.5%) MachineDominator Tree Construction
0.0000 ( 0.4%) 0.0000 ( 0.7%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) Expand reduction intrinsics
0.0000 ( 0.4%) 0.0000 ( 0.7%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Eliminate PHI nodes for register allocation
0.0000 ( 0.3%) 0.0000 ( 0.9%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Local Stack Slot Allocation
0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Exception handling preparation
0.0000 ( 0.3%) 0.0000 ( 0.7%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Scalarize Masked Memory Intrinsics
0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Machine Natural Loop Construction
0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Dominator Tree Construction
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) Insert XRay ops
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Expand indirectbr instructions
0.0000 ( 0.3%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) X86 WinAlloca Expander
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) StackMap Liveness Analysis
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Instrument function entry/exit with calls to e.g. mcount() (post inlining)
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Insert fentry calls
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Bundle Machine CFG Edges
0.0000 ( 0.3%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) Shadow Stack GC Lowering
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Machine Optimization Remark Emitter
0.0000 ( 0.2%) 0.0000 ( 0.5%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.4%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) Contiguously Lay Out Funclets
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Safe Stack instrumentation pass
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) X86 Indirect Branch Tracking
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.3%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Machine Optimization Remark Emitter
0.0000 ( 0.2%) 0.0000 ( 0.5%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Implement the 'patchable-function' attribute
0.0000 ( 0.2%) 0.0000 ( 0.7%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) X86 FP Stackifier
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) Live DEBUG_VALUE analysis
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) X86 vzeroupper inserter
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Expand ISel Pseudo-instructions
0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.4%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) Remove unreachable blocks from the CFG
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) Rewrite Symbols
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) Lower Garbage Collection Instructions
0.0000 ( 0.3%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) X86 pseudo instruction expansion pass
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Assumption Cache Tracker
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0017 (100.0%) 0.0006 (100.0%) 0.0022 (100.0%) 0.0022 (100.0%) Total
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 0.5674 seconds (0.5681 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.5402 (100.0%) 0.0272 (100.0%) 0.5674 (100.0%) 0.5681 (100.0%) Clang front-end timer
0.5402 (100.0%) 0.0272 (100.0%) 0.5674 (100.0%) 0.5681 (100.0%) Total
Aside: why is -ftime-report on clang so useless? None of the +5s running time is accounted for by any of the timers.
(In reply to Eric Niebler from comment #4)
> Aside: why is -ftime-report on clang so useless? None of the +5s running
> time is accounted for by any of the timers.
It's intended for investigating optimization problems, in part because it's a
bit easier to separate different optimization passes and measure them. The time
spent in parsing, Sema, etc. is harder to assess, since parsing goes through
all the layers more or less simultaneously, so there are fewer discrete actions
that can be timed in isolation.
A generic profiler (gprof, etc.) would probably be more useful for this sort of
issue.