briansmith closed this issue 7 years ago.
More ideas:
Delete even more code and make the remaining code simpler still. This is actually the primary way I intend to contribute to resolving this issue.
Rewrite the tests in bn_test.cc in Rust instead of C++. bn_test.cc is the last C++ test suite. Presumably the C++ compiler is much slower than the C compiler, so getting rid of the C++ code should make the build faster. Note also that bn_test.cc is what keeps most (all?) of the C/C++ code in crypto/test around, and that code spans several files. On the other hand, the added Rust code may itself make the build slower.
Many of the "unit" tests in ring are probably better recast as "integration" tests. This means moving them out of inline submodules in src/ and into their own files in tests/. Presumably, when ring is built as a dependency of another crate, only the sources in src/ are compiled, not the tests; the tests should only be built if and when the user runs "cargo test -p ring". (If that is not the case, it's probably a performance bug in cargo that should be fixed.) This should noticeably reduce the amount of Rust code the compiler sees during a typical build. We want to do this anyway, because it helps ensure that the public API is properly exported.
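Here is a minimal, self-contained sketch of the unit/integration distinction. The hex module and its functions are hypothetical and are not taken from ring. An inline #[cfg(test)] submodule is compiled only for test builds and can reach private helpers. An integration test in tests/ is compiled as a separate crate, so it can only call what the library exports.

```rust
// Hypothetical module standing in for a file under src/; not ring code.
mod hex {
    /// Public API: the only kind of item a test in tests/ could call
    /// (assuming `hex` were a `pub mod` of a library crate).
    pub fn encode(bytes: &[u8]) -> String {
        bytes
            .iter()
            .flat_map(|&b| [nibble(b >> 4), nibble(b & 0x0f)])
            .collect()
    }

    /// Private helper: visible only inside this module and its child modules.
    fn nibble(n: u8) -> char {
        // from_digit yields lowercase letters for values 10..=15.
        char::from_digit(u32::from(n), 16).expect("n is always < 16")
    }

    // Inline unit tests: compiled only when cfg(test) is set (e.g. by
    // `cargo test`), and able to reach the private `nibble`.
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn nibble_is_lowercase_hex() {
            assert_eq!(nibble(0xa), 'a');
        }
    }
}

// Integration-test style: a file under tests/ is a separate crate, so it
// could only use the exported path, as these calls do.
fn main() {
    assert_eq!(hex::encode(&[0xde, 0xad]), "dead");
    println!("{}", hex::encode(b"ring"));
}
```

The design point is visibility. A test that compiles from tests/ demonstrates that everything it touches is actually reachable through the public API.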
Here are some profiles from -Z time-passes -Z time-llvm-passes.
Fresh debug build takes 15.8s:
$ cargo rustc -- -Z time-passes -Z time-llvm-passes Compiling libc v0.2.21 Compiling gcc v0.3.45 Compiling lazy_static v0.2.6 Compiling untrusted v0.3.2 Compiling rand v0.3.15 Compiling num_cpus v1.3.0 Compiling deque v0.3.1 Compiling rayon v0.6.0 Compiling ring v0.7.3 (file:///home/cmr/proj/ring) time: 0.046; rss: 56MB parsing time: 0.000; rss: 56MB recursion limit time: 0.000; rss: 56MB crate injection time: 0.000; rss: 56MB plugin loading time: 0.000; rss: 56MB plugin registration time: 0.078; rss: 95MB expansion time: 0.000; rss: 95MB maybe building test harness time: 0.000; rss: 95MB maybe creating a macro crate time: 0.000; rss: 95MB checking for inline asm in case the target doesn't support it time: 0.003; rss: 95MB early lint checks time: 0.001; rss: 95MB AST validation time: 0.012; rss: 98MB name resolution time: 0.008; rss: 98MB complete gated feature checking time: 0.010; rss: 104MB lowering ast -> hir time: 0.003; rss: 104MB indexing hir time: 0.001; rss: 104MB attribute checking time: 0.004; rss: 107MB language item collection time: 0.002; rss: 107MB lifetime resolution time: 0.000; rss: 107MB looking for entry point time: 0.000; rss: 107MB looking for plugin registrar time: 0.003; rss: 107MB region resolution time: 0.001; rss: 107MB loop checking time: 0.000; rss: 107MB static item recursion checking time: 0.031; rss: 108MB compute_incremental_hashes_map time: 0.000; rss: 108MB load_dep_graph time: 0.001; rss: 108MB stability index time: 0.003; rss: 108MB stability checking time: 0.461; rss: 124MB type collecting time: 0.000; rss: 124MB variance inference time: 0.000; rss: 124MB impl wf inference time: 0.014; rss: 127MB coherence checking time: 0.021; rss: 127MB wf checking time: 0.045; rss: 127MB item-types checking time: 0.324; rss: 134MB item-bodies checking time: 0.022; rss: 136MB const checking time: 0.005; rss: 136MB privacy checking time: 0.002; rss: 136MB intrinsic checking time: 0.001; rss: 136MB effect checking time: 0.005; rss: 136MB match 
checking time: 0.003; rss: 136MB liveness checking time: 0.016; rss: 136MB rvalue checking time: 0.037; rss: 151MB MIR dump time: 0.005; rss: 151MB SimplifyCfg time: 0.008; rss: 151MB QualifyAndPromoteConstants time: 0.012; rss: 151MB TypeckMir time: 0.000; rss: 151MB SimplifyBranches time: 0.002; rss: 151MB SimplifyCfg time: 0.028; rss: 151MB MIR cleanup and validation time: 0.044; rss: 151MB borrow checking time: 0.000; rss: 151MB reachability checking time: 0.003; rss: 151MB death checking time: 0.000; rss: 151MB unused lib feature checking time: 0.041; rss: 151MB lint checking time: 0.000; rss: 151MB resolving dependency formats time: 0.000; rss: 151MB NoLandingPads time: 0.002; rss: 151MB SimplifyCfg time: 0.004; rss: 151MB EraseRegions time: 0.001; rss: 151MB AddCallGuards time: 0.015; rss: 154MB ElaborateDrops time: 0.000; rss: 154MB NoLandingPads time: 0.003; rss: 154MB SimplifyCfg time: 0.000; rss: 154MB Inline time: 0.003; rss: 154MB InstCombine time: 0.001; rss: 154MB Deaggregator time: 0.000; rss: 154MB CopyPropagation time: 0.003; rss: 154MB SimplifyLocals time: 0.001; rss: 154MB AddCallGuards time: 0.000; rss: 154MB PreTrans time: 0.033; rss: 154MB MIR optimisations time: 0.009; rss: 154MB write metadata time: 0.066; rss: 156MB translation item collection time: 0.013; rss: 156MB codegen unit partitioning time: 0.007; rss: 176MB internalize symbols time: 0.532; rss: 176MB translation time: 0.000; rss: 176MB assert dep graph time: 0.000; rss: 176MB serialize dep graph time: 0.046; rss: 142MB llvm function passes [0] time: 0.035; rss: 144MB llvm module passes [0] time: 0.941; rss: 150MB codegen passes [0] time: 0.000; rss: 149MB codegen passes [0] ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.1400 seconds (0.1241 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall 
Time--- --- Name --- 0.0333 ( 37.0%) 0.0167 ( 33.3%) 0.0500 ( 35.7%) 0.0349 ( 28.2%) Instruction Selection 0.0067 ( 7.4%) 0.0100 ( 20.0%) 0.0167 ( 11.9%) 0.0214 ( 17.2%) Instruction Scheduling 0.0133 ( 14.8%) 0.0033 ( 6.7%) 0.0167 ( 11.9%) 0.0185 ( 14.9%) DAG Combining 1 0.0200 ( 22.2%) 0.0067 ( 13.3%) 0.0267 ( 19.0%) 0.0122 ( 9.8%) DAG Combining 2 0.0100 ( 11.1%) 0.0067 ( 13.3%) 0.0167 ( 11.9%) 0.0117 ( 9.4%) Instruction Creation 0.0033 ( 3.7%) 0.0000 ( 0.0%) 0.0033 ( 2.4%) 0.0100 ( 8.0%) DAG Legalization 0.0033 ( 3.7%) 0.0067 ( 13.3%) 0.0100 ( 7.1%) 0.0081 ( 6.5%) Type Legalization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0032 ( 2.6%) DAG Combining after legalize types 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0021 ( 1.7%) Instruction Scheduling Cleanup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 1.6%) Vector Legalization 0.0900 (100.0%) 0.0500 (100.0%) 0.1400 (100.0%) 0.1241 (100.0%) Total ===-------------------------------------------------------------------------=== DWARF Emission ===-------------------------------------------------------------------------=== Total Execution Time: 0.1133 seconds (0.1251 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0733 ( 73.3%) 0.0067 ( 50.0%) 0.0800 ( 70.6%) 0.0846 ( 67.7%) Debug Info Emission 0.0233 ( 23.3%) 0.0067 ( 50.0%) 0.0300 ( 26.5%) 0.0395 ( 31.6%) DWARF Exception Writer 0.0033 ( 3.3%) 0.0000 ( 0.0%) 0.0033 ( 2.9%) 0.0009 ( 0.8%) DWARF Debug Writer 0.1000 (100.0%) 0.0133 (100.0%) 0.1133 (100.0%) 0.1251 (100.0%) Total ===-------------------------------------------------------------------------=== ... Pass execution timing report ... 
===-------------------------------------------------------------------------=== Total Execution Time: 0.8700 seconds (0.8690 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2100 ( 29.7%) 0.0833 ( 51.0%) 0.2933 ( 33.7%) 0.2814 ( 32.4%) X86 DAG->DAG Instruction Selection 0.2267 ( 32.1%) 0.0233 ( 14.3%) 0.2500 ( 28.7%) 0.2652 ( 30.5%) X86 Assembly / Object Emitter 0.0533 ( 7.5%) 0.0067 ( 4.1%) 0.0600 ( 6.9%) 0.0519 ( 6.0%) Module Verifier 0.0367 ( 5.2%) 0.0033 ( 2.0%) 0.0400 ( 4.6%) 0.0425 ( 4.9%) Module Verifier 0.0367 ( 5.2%) 0.0067 ( 4.1%) 0.0433 ( 5.0%) 0.0420 ( 4.8%) Module Verifier 0.0233 ( 3.3%) 0.0033 ( 2.0%) 0.0267 ( 3.1%) 0.0281 ( 3.2%) Inliner for always_inline functions 0.0167 ( 2.4%) 0.0067 ( 4.1%) 0.0233 ( 2.7%) 0.0259 ( 3.0%) Prologue/Epilogue Insertion & Frame Finalization 0.0133 ( 1.9%) 0.0100 ( 6.1%) 0.0233 ( 2.7%) 0.0223 ( 2.6%) Fast Register Allocator 0.0100 ( 1.4%) 0.0033 ( 2.0%) 0.0133 ( 1.5%) 0.0125 ( 1.4%) Live DEBUG_VALUE analysis 0.0100 ( 1.4%) 0.0033 ( 2.0%) 0.0133 ( 1.5%) 0.0121 ( 1.4%) Machine Function Analysis 0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0104 ( 1.2%) Insert stack protectors 0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0084 ( 1.0%) Two-Address instruction pass 0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0067 ( 0.8%) Dominator Tree Construction 0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0054 ( 0.6%) CallGraph Construction 0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0051 ( 0.6%) Dominator Tree Construction 0.0100 ( 1.4%) 0.0000 ( 0.0%) 0.0100 ( 1.1%) 0.0047 ( 0.5%) Natural Loop Information 0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0045 ( 0.5%) Dominator Tree Construction 0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0040 ( 0.5%) Scalar Evolution Analysis 0.0067 ( 0.9%) 0.0033 ( 2.0%) 0.0100 ( 1.1%) 0.0039 ( 0.5%) Dominator Tree Construction 0.0033 ( 0.5%) 0.0033 ( 2.0%) 0.0067 ( 0.8%) 0.0039 ( 0.4%) Function Alias Analysis Results 0.0033 ( 0.5%) 0.0000 ( 0.0%) 
0.0033 ( 0.4%) 0.0038 ( 0.4%) Expand Atomic instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0027 ( 0.3%) Exception handling preparation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0024 ( 0.3%) Post-RA pseudo instruction expansion pass 0.0033 ( 0.5%) 0.0033 ( 2.0%) 0.0067 ( 0.8%) 0.0019 ( 0.2%) X86 pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.2%) Remove unreachable blocks from the CFG 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.2%) Bundle Machine CFG Edges 0.0000 ( 0.0%) 0.0033 ( 2.0%) 0.0033 ( 0.4%) 0.0014 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.2%) Eliminate PHI nodes for register allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.1%) Expand ISel Pseudo-instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.1%) Insert XRay ops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.1%) Implement the 'patchable-function' attribute 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.1%) StackMap Liveness Analysis 0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0008 ( 0.1%) Local Stack Slot Allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 FP Stackifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) Contiguously Lay Out Funclets 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 PIC Global Base Reg Initialization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 WinAlloca Expander 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) Safe Stack instrumentation pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) Analyze Machine Code For Garbage Collection 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 vzeroupper inserter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.1%) Shadow Stack GC Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.1%) Lower Garbage Collection Instructions 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 0.7067 (100.0%) 0.1633 (100.0%) 0.8700 (100.0%) 0.8690 (100.0%) Total time: 1.078; rss: 149MB LLVM passes time: 0.000; rss: 149MB serialize work products time: 0.031; rss: 131MB linking Finished dev [unoptimized + debuginfo] target(s) in 15.8 secs
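Profiles like this one are easier to scan when the rustc passes are ranked by time. In the profile above, for example, translation (0.532s) and type collecting (0.461s) stand out among the non-LLVM passes, and the LLVM codegen passes take 0.941s. The sketch below parses the record format seen in this output (time: <seconds>; rss: <N>MB <pass name>) and prints the slowest passes. That format is an assumption taken from this older rustc's output. The -Z flags are nightly-only, and newer toolchains print a different layout.

```rust
/// One record in the older `-Z time-passes` format shown above:
/// `time: <seconds>; rss: <N>MB <pass name>`.
#[derive(Debug)]
struct Pass {
    secs: f64,
    rss_mb: u32,
    name: String,
}

/// Splits the text on "time: " and parses each record. Text that is not a
/// record (e.g. the interleaved LLVM tables) ends up in the preceding name.
fn parse_time_passes(text: &str) -> Vec<Pass> {
    text.split("time: ")
        .filter_map(|chunk| {
            // A chunk looks like "0.461; rss: 124MB type collecting ".
            let (secs, rest) = chunk.split_once("; rss: ")?;
            let (rss, name) = rest.split_once("MB")?;
            Some(Pass {
                secs: secs.trim().parse().ok()?,
                rss_mb: rss.trim().parse().ok()?,
                name: name.trim().to_string(),
            })
        })
        .collect()
}

fn main() {
    // A few records copied from the debug profile above.
    let sample = "time: 0.461; rss: 124MB type collecting \
                  time: 0.324; rss: 134MB item-bodies checking \
                  time: 0.532; rss: 176MB translation \
                  time: 0.941; rss: 150MB codegen passes [0]";
    let mut passes = parse_time_passes(sample);
    // Slowest first; total_cmp gives a total order on f64.
    passes.sort_by(|a, b| b.secs.total_cmp(&a.secs));
    for p in passes.iter().take(3) {
        println!("{:>6.3}s {:>4}MB {}", p.secs, p.rss_mb, p.name);
    }
}
```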
It takes 3.3s to build just libring.rlib (without the C code) and 9.3s to build both the rlib and the C code.
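These timings can be used to estimate how much of a fresh ring build is spent outside the compilation of libring.rlib itself. The sketch below covers the debug numbers here and the release-mode numbers given for this thread (15.6s for ring by itself, 4.64s for libring.rlib alone). It assumes the build script, which compiles the C/asm code, finishes before rustc starts on the crate, so the wall-clock times can simply be subtracted. The function name non_rlib_share is made up for this example.

```rust
/// Rough split of a fresh ring build into "compiling libring.rlib" and
/// everything else (the build script and the C/asm code it compiles).
/// Assumes the build script finishes before rustc starts on the crate,
/// so the two wall-clock times can simply be subtracted.
fn non_rlib_share(total_secs: f64, rlib_only_secs: f64) -> (f64, f64) {
    let other = total_secs - rlib_only_secs;
    (other, other / total_secs)
}

fn main() {
    // (profile, full build of ring, libring.rlib alone), in seconds,
    // as quoted in this thread.
    let builds = [("debug", 9.3, 3.3), ("release", 15.6, 4.64)];
    for (profile, total, rlib) in builds {
        let (other, share) = non_rlib_share(total, rlib);
        println!("{profile:>7}: {other:.2}s of {total:.2}s outside the rlib ({:.0}%)", share * 100.0);
    }
}
```

The results are roughly 6.0s of the 9.3s debug build (about 65%) and 10.96s of the 15.6s release build (about 70%). Both figures are consistent with the observation that building the C code takes most of the time.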
Once the dependencies (including ring) are already built, it takes 0.41s to build a trivial crate that computes the SHA-512 digest of a line read from stdin:
Compiling f v0.1.0 (file:///home/cmr/proj/ring/t/f) time: 0.000; rss: 48MB parsing time: 0.000; rss: 48MB recursion limit time: 0.000; rss: 48MB crate injection time: 0.000; rss: 48MB plugin loading time: 0.000; rss: 48MB plugin registration time: 0.022; rss: 84MB expansion time: 0.000; rss: 84MB maybe building test harness time: 0.000; rss: 84MB maybe creating a macro crate time: 0.000; rss: 84MB checking for inline asm in case the target doesn't support it time: 0.000; rss: 84MB early lint checks time: 0.000; rss: 84MB AST validation time: 0.005; rss: 84MB name resolution time: 0.000; rss: 84MB complete gated feature checking time: 0.000; rss: 84MB lowering ast -> hir time: 0.000; rss: 84MB indexing hir time: 0.000; rss: 84MB attribute checking time: 0.000; rss: 84MB language item collection time: 0.000; rss: 84MB lifetime resolution time: 0.000; rss: 84MB looking for entry point time: 0.000; rss: 84MB looking for plugin registrar time: 0.000; rss: 84MB region resolution time: 0.000; rss: 84MB loop checking time: 0.000; rss: 84MB static item recursion checking time: 0.000; rss: 87MB compute_incremental_hashes_map time: 0.000; rss: 87MB load_dep_graph time: 0.000; rss: 87MB stability index time: 0.000; rss: 87MB stability checking time: 0.000; rss: 87MB type collecting time: 0.000; rss: 87MB variance inference time: 0.000; rss: 87MB impl wf inference time: 0.000; rss: 87MB coherence checking time: 0.000; rss: 87MB wf checking time: 0.001; rss: 87MB item-types checking time: 0.011; rss: 101MB item-bodies checking time: 0.002; rss: 101MB const checking time: 0.000; rss: 101MB privacy checking time: 0.000; rss: 101MB intrinsic checking time: 0.000; rss: 101MB effect checking time: 0.000; rss: 101MB match checking time: 0.000; rss: 101MB liveness checking time: 0.000; rss: 101MB rvalue checking time: 0.000; rss: 101MB MIR dump time: 0.000; rss: 101MB SimplifyCfg time: 0.000; rss: 101MB QualifyAndPromoteConstants time: 0.000; rss: 101MB TypeckMir time: 0.000; rss: 
101MB SimplifyBranches time: 0.000; rss: 101MB SimplifyCfg time: 0.001; rss: 101MB MIR cleanup and validation time: 0.000; rss: 101MB borrow checking time: 0.000; rss: 101MB reachability checking time: 0.000; rss: 101MB death checking time: 0.000; rss: 101MB unused lib feature checking warning: unused result which must be used --> src/main.rs:7:5 | 7 | stdin().read_line(&mut s); | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: #[warn(unused_must_use)] on by default time: 0.000; rss: 101MB lint checking time: 0.002; rss: 101MB resolving dependency formats time: 0.000; rss: 101MB NoLandingPads time: 0.000; rss: 101MB SimplifyCfg time: 0.000; rss: 101MB EraseRegions time: 0.000; rss: 101MB AddCallGuards time: 0.000; rss: 101MB ElaborateDrops time: 0.000; rss: 101MB NoLandingPads time: 0.000; rss: 101MB SimplifyCfg time: 0.000; rss: 101MB Inline time: 0.000; rss: 101MB InstCombine time: 0.000; rss: 101MB Deaggregator time: 0.000; rss: 101MB CopyPropagation time: 0.000; rss: 101MB SimplifyLocals time: 0.000; rss: 101MB AddCallGuards time: 0.000; rss: 101MB PreTrans time: 0.000; rss: 101MB MIR optimisations time: 0.000; rss: 101MB write metadata time: 0.004; rss: 101MB translation item collection time: 0.001; rss: 101MB codegen unit partitioning time: 0.001; rss: 114MB internalize symbols time: 0.095; rss: 114MB translation time: 0.000; rss: 114MB assert dep graph time: 0.000; rss: 114MB serialize dep graph time: 0.002; rss: 114MB llvm function passes [0] time: 0.001; rss: 114MB llvm module passes [0] time: 0.037; rss: 118MB codegen passes [0] time: 0.000; rss: 118MB codegen passes [0] ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.0100 seconds (0.0045 wall clock) ---User Time--- --User+System-- ---Wall Time--- --- Name --- 0.0033 ( 33.3%) 0.0033 ( 33.3%) 0.0012 ( 26.8%) Instruction Selection 0.0033 ( 33.3%) 
0.0033 ( 33.3%) 0.0008 ( 18.6%) Instruction Scheduling 0.0033 ( 33.3%) 0.0033 ( 33.3%) 0.0008 ( 17.4%) DAG Combining 1 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 10.3%) DAG Combining 2 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 10.1%) Instruction Creation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 8.3%) DAG Legalization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 5.2%) Type Legalization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 1.8%) Instruction Scheduling Cleanup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 1.0%) Vector Legalization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.4%) DAG Combining after legalize types 0.0100 (100.0%) 0.0100 (100.0%) 0.0045 (100.0%) Total ===-------------------------------------------------------------------------=== DWARF Emission ===-------------------------------------------------------------------------=== Total Execution Time: 0.0000 seconds (0.0047 wall clock) ---Wall Time--- --- Name --- 0.0032 ( 68.8%) Debug Info Emission 0.0012 ( 26.7%) DWARF Exception Writer 0.0002 ( 4.5%) DWARF Debug Writer 0.0047 (100.0%) Total ===-------------------------------------------------------------------------=== ... Pass execution timing report ... 
===-------------------------------------------------------------------------=== Total Execution Time: 0.0300 seconds (0.0305 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0100 ( 37.5%) 0.0000 ( 0.0%) 0.0100 ( 33.3%) 0.0102 ( 33.5%) X86 DAG->DAG Instruction Selection 0.0067 ( 25.0%) 0.0000 ( 0.0%) 0.0067 ( 22.2%) 0.0085 ( 27.8%) X86 Assembly / Object Emitter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 4.8%) Module Verifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 4.6%) Module Verifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 3.7%) Module Verifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 3.3%) Prologue/Epilogue Insertion & Frame Finalization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 2.6%) Fast Register Allocator 0.0033 ( 12.5%) 0.0033 (100.0%) 0.0067 ( 22.2%) 0.0007 ( 2.3%) Machine Function Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 2.0%) Inliner for always_inline functions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 1.8%) Live DEBUG_VALUE analysis 0.0033 ( 12.5%) 0.0000 ( 0.0%) 0.0033 ( 11.1%) 0.0004 ( 1.4%) Insert stack protectors 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 1.0%) Two-Address instruction pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 1.0%) Profile summary info 0.0033 ( 12.5%) 0.0000 ( 0.0%) 0.0033 ( 11.1%) 0.0003 ( 0.9%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.8%) Natural Loop Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) CallGraph Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.6%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) 0.0002 ( 0.5%) Exception handling preparation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.5%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.4%) Remove unreachable blocks from the CFG 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.4%) Post-RA pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.3%) Bundle Machine CFG Edges 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.3%) X86 pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.3%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) Eliminate PHI nodes for register allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) Expand ISel Pseudo-instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) Insert XRay ops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) X86 FP Stackifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) StackMap Liveness Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) Shadow Stack GC Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) Implement the 'patchable-function' attribute 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Analyze Machine Code For Garbage Collection 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Local Stack Slot Allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Safe Stack instrumentation pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) X86 WinAlloca Expander 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Contiguously Lay Out Funclets 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) X86 PIC Global Base Reg Initialization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) X86 vzeroupper inserter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Lower Garbage Collection Instructions 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata 0.0267 (100.0%) 0.0033 (100.0%) 0.0300 (100.0%) 0.0305 (100.0%) Total time: 0.041; rss: 118MB LLVM passes time: 0.000; rss: 118MB serialize work products time: 0.196; rss: 121MB running linker time: 0.198; rss: 121MB linking Finished dev [unoptimized + debuginfo] target(s) in 0.41 secs
Here are the sizes of the artifacts inside libring.rlib:
60K add.o
8.0K aes-x86_64-elf.o
40K aes.o
8.0K aesni-gcm-x86_64-elf.o
8.0K aesni-x86_64-elf.o
68K bn.o
76K bn_test_convert.o
52K bn_test_new.o
8.0K bsaes-x86_64-elf.o
12K chacha-x86_64-elf.o
56K cmp.o
76K constant_time_test.o
60K convert.o
48K cpu-intel.o
44K crypto.o
160K curve25519.o
64K div.o
60K e_aes.o
48K ecp_nistz.o
4.0K ecp_nistz256-x86_64-elf.o
216K ecp_nistz256.o
60K exponentiation.o
52K gcd.o
68K gcm.o
60K generic.o
48K gfp_p256.o
84K gfp_p384.o
12K ghash-x86_64-elf.o
64K limbs.o
40K mem.o
60K montgomery.o
56K montgomery_inv.o
52K mul.o
12K p256-x86_64-asm-elf.o
12K poly1305-x86_64-elf.o
44K random.o
1.7M ring-15bf6c46f8e53abc.0.o
24K sha256-x86_64-elf.o
24K sha512-x86_64-elf.o
56K shift.o
96K sysrand.o
8.0K vpaes-x86_64-elf.o
16K x25519-asm-x86_64.o
48K x25519-x86_64.o
8.0K x86_64-mont-elf.o
12K x86_64-mont5-elf.o
716K ring-15bf6c46f8e53abc.0.bytecode.deflate
944K rust.metadata.bin
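To compare the Rust side of the rlib with the C/assembly side, the listing can be summed by category. This is a rough sketch. It assumes the listing has the "SIZE NAME" form above, as produced by something like du -h over the rlib's extracted members. The grouping rules are guesses based only on these file names: ring-<hash>.N.o is treated as the compiled Rust code, and every other .o is treated as C/asm.

```rust
use std::io::{self, Read};

/// Sizes in KiB, grouped by file name. The grouping rules are assumptions
/// read off the listing above, not something rustc or cargo defines.
#[derive(Debug, Default)]
struct Totals {
    c_and_asm: f64,     // every other *.o (C, asm, and the C test helpers)
    rust_object: f64,   // ring-<hash>.N.o, the compiled Rust code
    rust_bytecode: f64, // *.bytecode.deflate
    rust_metadata: f64, // rust.metadata.bin
}

/// Converts a `du -h` size such as "8.0K" or "1.7M" to KiB.
/// Returns None for sizes without a K/M/G suffix.
fn size_to_kib(s: &str) -> Option<f64> {
    let unit = s.chars().last()?;
    let mult = match unit {
        'K' => 1.0,
        'M' => 1024.0,
        'G' => 1024.0 * 1024.0,
        _ => return None,
    };
    let num: f64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    Some(num * mult)
}

/// Sums whitespace-separated "SIZE NAME" pairs by category.
fn summarize(listing: &str) -> Totals {
    let mut t = Totals::default();
    let tokens: Vec<&str> = listing.split_whitespace().collect();
    for pair in tokens.chunks(2) {
        if let [size, name] = pair {
            if let Some(kib) = size_to_kib(size) {
                match *name {
                    "rust.metadata.bin" => t.rust_metadata += kib,
                    n if n.ends_with(".bytecode.deflate") => t.rust_bytecode += kib,
                    n if n.starts_with("ring-") && n.ends_with(".o") => t.rust_object += kib,
                    n if n.ends_with(".o") => t.c_and_asm += kib,
                    _ => {} // anything else is ignored
                }
            }
        }
    }
    t
}

fn main() -> io::Result<()> {
    // Reads the listing from stdin.
    let mut input = String::new();
    io::stdin().read_to_string(&mut input)?;
    let t = summarize(&input);
    let total = t.c_and_asm + t.rust_object + t.rust_bytecode + t.rust_metadata;
    println!("C/asm objects: {:7.0} KiB", t.c_and_asm);
    println!("Rust object:   {:7.0} KiB", t.rust_object);
    println!("Rust bytecode: {:7.0} KiB", t.rust_bytecode);
    println!("Rust metadata: {:7.0} KiB", t.rust_metadata);
    println!("total:         {:7.0} KiB (approximate)", total);
    Ok(())
}
```

du -h reports rounded disk usage, so the totals are only approximate. Summed this way, the 45 C/asm objects in the listing above come to about 2192K (a little over 2.1M), compared with 1.7M for the single Rust object.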
In release mode, things aren't much worse. Building ring by itself takes 15.6s, or 21.78s including the build deps. Rebuilding just libring.rlib takes only 4.64s; building the C code takes most of the time. Here's the profile:
Compiling ring v0.7.3 (file:///home/cmr/proj/ring) time: 0.045; rss: 56MB parsing time: 0.000; rss: 56MB recursion limit time: 0.000; rss: 56MB crate injection time: 0.000; rss: 56MB plugin loading time: 0.000; rss: 56MB plugin registration time: 0.077; rss: 94MB expansion time: 0.000; rss: 94MB maybe building test harness time: 0.001; rss: 94MB maybe creating a macro crate time: 0.000; rss: 94MB checking for inline asm in case the target doesn't support it time: 0.003; rss: 94MB early lint checks time: 0.001; rss: 94MB AST validation time: 0.011; rss: 99MB name resolution time: 0.008; rss: 99MB complete gated feature checking time: 0.009; rss: 101MB lowering ast -> hir time: 0.003; rss: 105MB indexing hir time: 0.001; rss: 105MB attribute checking time: 0.004; rss: 105MB language item collection time: 0.002; rss: 105MB lifetime resolution time: 0.000; rss: 105MB looking for entry point time: 0.000; rss: 105MB looking for plugin registrar time: 0.003; rss: 107MB region resolution time: 0.001; rss: 107MB loop checking time: 0.000; rss: 107MB static item recursion checking time: 0.014; rss: 107MB compute_incremental_hashes_map time: 0.000; rss: 107MB load_dep_graph time: 0.001; rss: 107MB stability index time: 0.003; rss: 107MB stability checking time: 0.464; rss: 123MB type collecting time: 0.000; rss: 123MB variance inference time: 0.000; rss: 123MB impl wf inference time: 0.014; rss: 126MB coherence checking time: 0.020; rss: 126MB wf checking time: 0.045; rss: 126MB item-types checking time: 0.322; rss: 133MB item-bodies checking time: 0.022; rss: 133MB const checking time: 0.005; rss: 133MB privacy checking time: 0.004; rss: 133MB intrinsic checking time: 0.001; rss: 133MB effect checking time: 0.006; rss: 133MB match checking time: 0.003; rss: 133MB liveness checking time: 0.016; rss: 133MB rvalue checking time: 0.036; rss: 150MB MIR dump time: 0.004; rss: 150MB SimplifyCfg time: 0.009; rss: 150MB QualifyAndPromoteConstants time: 0.013; rss: 150MB TypeckMir 
time: 0.000; rss: 150MB SimplifyBranches time: 0.002; rss: 150MB SimplifyCfg time: 0.029; rss: 150MB MIR cleanup and validation time: 0.045; rss: 152MB borrow checking time: 0.000; rss: 152MB reachability checking time: 0.003; rss: 152MB death checking time: 0.000; rss: 152MB unused lib feature checking time: 0.041; rss: 152MB lint checking time: 0.000; rss: 152MB resolving dependency formats time: 0.000; rss: 152MB NoLandingPads time: 0.002; rss: 152MB SimplifyCfg time: 0.005; rss: 152MB EraseRegions time: 0.001; rss: 152MB AddCallGuards time: 0.014; rss: 152MB ElaborateDrops time: 0.000; rss: 152MB NoLandingPads time: 0.002; rss: 152MB SimplifyCfg time: 0.000; rss: 152MB Inline time: 0.002; rss: 152MB InstCombine time: 0.001; rss: 152MB Deaggregator time: 0.000; rss: 152MB CopyPropagation time: 0.003; rss: 152MB SimplifyLocals time: 0.001; rss: 152MB AddCallGuards time: 0.000; rss: 152MB PreTrans time: 0.031; rss: 152MB MIR optimisations time: 0.009; rss: 154MB write metadata time: 0.065; rss: 156MB translation item collection time: 0.013; rss: 156MB codegen unit partitioning time: 0.007; rss: 170MB internalize symbols time: 0.391; rss: 170MB translation time: 0.000; rss: 170MB assert dep graph time: 0.000; rss: 170MB serialize dep graph time: 0.164; rss: 136MB llvm function passes [0] time: 2.155; rss: 140MB llvm module passes [0] time: 0.562; rss: 143MB codegen passes [0] time: 0.001; rss: 143MB codegen passes [0] ===-------------------------------------------------------------------------=== Register Allocation ===-------------------------------------------------------------------------=== Total Execution Time: 0.0167 seconds (0.0174 wall clock) ---User Time--- --User+System-- ---Wall Time--- --- Name --- 0.0100 ( 60.0%) 0.0100 ( 60.0%) 0.0118 ( 67.5%) Global Splitting 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0023 ( 13.1%) Evict 0.0033 ( 20.0%) 0.0033 ( 20.0%) 0.0017 ( 9.7%) Spiller 0.0033 ( 20.0%) 0.0033 ( 20.0%) 0.0013 ( 7.4%) Local Splitting 0.0000 ( 0.0%) 0.0000 ( 
0.0%) 0.0004 ( 2.3%) Seed Live Regs 0.0167 (100.0%) 0.0167 (100.0%) 0.0174 (100.0%) Total ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.1567 seconds (0.1546 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0367 ( 25.0%) 0.0033 ( 33.3%) 0.0400 ( 25.5%) 0.0419 ( 27.1%) Instruction Selection 0.0167 ( 11.4%) 0.0000 ( 0.0%) 0.0167 ( 10.6%) 0.0219 ( 14.2%) Instruction Scheduling 0.0200 ( 13.6%) 0.0000 ( 0.0%) 0.0200 ( 12.8%) 0.0215 ( 13.9%) DAG Combining 1 0.0167 ( 11.4%) 0.0033 ( 33.3%) 0.0200 ( 12.8%) 0.0186 ( 12.0%) DAG Combining 2 0.0167 ( 11.4%) 0.0000 ( 0.0%) 0.0167 ( 10.6%) 0.0138 ( 8.9%) DAG Legalization 0.0200 ( 13.6%) 0.0033 ( 33.3%) 0.0233 ( 14.9%) 0.0125 ( 8.1%) Instruction Creation 0.0100 ( 6.8%) 0.0000 ( 0.0%) 0.0100 ( 6.4%) 0.0103 ( 6.7%) Type Legalization 0.0067 ( 4.5%) 0.0000 ( 0.0%) 0.0067 ( 4.3%) 0.0080 ( 5.2%) DAG Combining after legalize types 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0041 ( 2.6%) Vector Legalization 0.0033 ( 2.3%) 0.0000 ( 0.0%) 0.0033 ( 2.1%) 0.0018 ( 1.2%) Instruction Scheduling Cleanup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) DAG Combining after legalize vectors 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type Legalization 2 0.1467 (100.0%) 0.0100 (100.0%) 0.1567 (100.0%) 0.1546 (100.0%) Total ===-------------------------------------------------------------------------=== DWARF Emission ===-------------------------------------------------------------------------=== Total Execution Time: 0.0033 seconds (0.0011 wall clock) --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0033 (100.0%) 0.0033 (100.0%) 0.0007 ( 63.9%) DWARF Exception Writer 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 35.4%) Debug Info Emission 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.6%) DWARF Debug Writer 
0.0033 (100.0%) 0.0033 (100.0%) 0.0011 (100.0%) Total ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 2.5233 seconds (2.5454 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2267 ( 9.6%) 0.0167 ( 10.9%) 0.2433 ( 9.6%) 0.2300 ( 9.0%) Dominator Tree Construction 0.2000 ( 8.4%) 0.0200 ( 13.0%) 0.2200 ( 8.7%) 0.2087 ( 8.2%) Function Integration/Inlining 0.1400 ( 5.9%) 0.0100 ( 6.5%) 0.1500 ( 5.9%) 0.1373 ( 5.4%) Combine redundant instructions 0.1267 ( 5.3%) 0.0067 ( 4.3%) 0.1333 ( 5.3%) 0.1243 ( 4.9%) Global Value Numbering 0.0933 ( 3.9%) 0.0067 ( 4.3%) 0.1000 ( 4.0%) 0.0995 ( 3.9%) Induction Variable Simplification 0.0833 ( 3.5%) 0.0067 ( 4.3%) 0.0900 ( 3.6%) 0.0955 ( 3.8%) Combine redundant instructions 0.1067 ( 4.5%) 0.0000 ( 0.0%) 0.1067 ( 4.2%) 0.0951 ( 3.7%) Combine redundant instructions 0.1067 ( 4.5%) 0.0033 ( 2.2%) 0.1100 ( 4.4%) 0.0945 ( 3.7%) Global Value Numbering 0.0867 ( 3.7%) 0.0000 ( 0.0%) 0.0867 ( 3.4%) 0.0875 ( 3.4%) Combine redundant instructions 0.0767 ( 3.2%) 0.0000 ( 0.0%) 0.0767 ( 3.0%) 0.0860 ( 3.4%) Combine redundant instructions 0.0600 ( 2.5%) 0.0000 ( 0.0%) 0.0600 ( 2.4%) 0.0730 ( 2.9%) SROA 0.0367 ( 1.5%) 0.0033 ( 2.2%) 0.0400 ( 1.6%) 0.0613 ( 2.4%) SROA 0.0500 ( 2.1%) 0.0033 ( 2.2%) 0.0533 ( 2.1%) 0.0475 ( 1.9%) Dead Store Elimination 0.0500 ( 2.1%) 0.0000 ( 0.0%) 0.0500 ( 2.0%) 0.0459 ( 1.8%) Value Propagation 0.0367 ( 1.5%) 0.0000 ( 0.0%) 0.0367 ( 1.5%) 0.0457 ( 1.8%) Value Propagation 0.0500 ( 2.1%) 0.0033 ( 2.2%) 0.0533 ( 2.1%) 0.0439 ( 1.7%) Module Verifier 0.0500 ( 2.1%) 0.0000 ( 0.0%) 0.0500 ( 2.0%) 0.0432 ( 1.7%) Early CSE 0.0267 ( 1.1%) 0.0033 ( 2.2%) 0.0300 ( 1.2%) 0.0381 ( 1.5%) MemCpy Optimization 0.0300 ( 1.3%) 0.0000 ( 0.0%) 0.0300 ( 1.2%) 0.0354 ( 1.4%) Jump Threading 0.0200 ( 0.8%) 0.0067 ( 4.3%) 0.0267 ( 1.1%) 
0.0340 ( 1.3%) Combine redundant instructions 0.0333 ( 1.4%) 0.0100 ( 6.5%) 0.0433 ( 1.7%) 0.0323 ( 1.3%) Combine redundant instructions 0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0318 ( 1.3%) Combine redundant instructions 0.0267 ( 1.1%) 0.0000 ( 0.0%) 0.0267 ( 1.1%) 0.0292 ( 1.1%) Greedy Register Allocator 0.0367 ( 1.5%) 0.0000 ( 0.0%) 0.0367 ( 1.5%) 0.0291 ( 1.1%) Jump Threading 0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0264 ( 1.0%) Loop Strength Reduction 0.0200 ( 0.8%) 0.0067 ( 4.3%) 0.0267 ( 1.1%) 0.0259 ( 1.0%) Machine Instruction Scheduler 0.0233 ( 1.0%) 0.0000 ( 0.0%) 0.0233 ( 0.9%) 0.0247 ( 1.0%) Loop Invariant Code Motion 0.0167 ( 0.7%) 0.0000 ( 0.0%) 0.0167 ( 0.7%) 0.0240 ( 0.9%) Early CSE 0.0167 ( 0.7%) 0.0033 ( 2.2%) 0.0200 ( 0.8%) 0.0200 ( 0.8%) Reassociate expressions 0.0300 ( 1.3%) 0.0000 ( 0.0%) 0.0300 ( 1.2%) 0.0162 ( 0.6%) Simplify the CFG 0.0100 ( 0.4%) 0.0033 ( 2.2%) 0.0133 ( 0.5%) 0.0146 ( 0.6%) Promote 'by reference' arguments to scalars 0.0100 ( 0.4%) 0.0033 ( 2.2%) 0.0133 ( 0.5%) 0.0141 ( 0.6%) Deduce function attributes 0.0067 ( 0.3%) 0.0033 ( 2.2%) 0.0100 ( 0.4%) 0.0132 ( 0.5%) SLP Vectorizer 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0130 ( 0.5%) Simplify the CFG 0.0067 ( 0.3%) 0.0067 ( 4.3%) 0.0133 ( 0.5%) 0.0128 ( 0.5%) Simplify the CFG 0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0125 ( 0.5%) Loop Invariant Code Motion 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0125 ( 0.5%) Simplify the CFG 0.0167 ( 0.7%) 0.0000 ( 0.0%) 0.0167 ( 0.7%) 0.0119 ( 0.5%) Live Variable Analysis 0.0133 ( 0.6%) 0.0000 ( 0.0%) 0.0133 ( 0.5%) 0.0118 ( 0.5%) Globals Alias Analysis 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0116 ( 0.5%) CodeGen Prepare 0.0133 ( 0.6%) 0.0000 ( 0.0%) 0.0133 ( 0.5%) 0.0110 ( 0.4%) Simplify the CFG 0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0109 ( 0.4%) Simplify the CFG 0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0108 ( 0.4%) Module Verifier 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0097 ( 0.4%) 
Natural Loop Information 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0096 ( 0.4%) Sparse Conditional Constant Propagation 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0096 ( 0.4%) Interprocedural Sparse Conditional Constant Propagation 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0083 ( 0.3%) Bit-Tracking Dead Code Elimination 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0083 ( 0.3%) Scoped NoAlias Alias Analysis 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0069 ( 0.3%) Natural Loop Information 0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0069 ( 0.3%) Tail Call Elimination 0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0068 ( 0.3%) Dominator Tree Construction 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0068 ( 0.3%) Unroll loops 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0066 ( 0.3%) Dominator Tree Construction 0.0133 ( 0.6%) 0.0000 ( 0.0%) 0.0133 ( 0.5%) 0.0066 ( 0.3%) Loop Invariant Code Motion 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0064 ( 0.3%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0063 ( 0.2%) Unroll loops 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0062 ( 0.2%) Machine Common Subexpression Elimination 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0061 ( 0.2%) Dominator Tree Construction 0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0060 ( 0.2%) Unswitch loops 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0058 ( 0.2%) Remove unused exception handling info 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0057 ( 0.2%) Remove redundant instructions 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0053 ( 0.2%) Dominator Tree Construction 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0051 ( 0.2%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0051 ( 0.2%) Loop-Closed SSA Form Pass 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0050 ( 0.2%) Natural Loop Information 0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0050 ( 0.2%) Prologue/Epilogue Insertion & Frame Finalization 
0.0067 ( 0.3%) 0.0033 ( 2.2%) 0.0100 ( 0.4%) 0.0049 ( 0.2%) Aggressive Dead Code Elimination 0.0033 ( 0.1%) 0.0033 ( 2.2%) 0.0067 ( 0.3%) 0.0049 ( 0.2%) Machine Function Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0048 ( 0.2%) Control Flow Optimizer 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0046 ( 0.2%) Insert stack protectors 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0046 ( 0.2%) Dead Argument Elimination 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0044 ( 0.2%) Dominator Tree Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0043 ( 0.2%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0042 ( 0.2%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0041 ( 0.2%) Dominator Tree Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0040 ( 0.2%) Simplify the CFG 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0039 ( 0.2%) Scalar Evolution Analysis 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0039 ( 0.2%) Loop Vectorization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0039 ( 0.2%) Rotate Loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0038 ( 0.2%) Demanded bits analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0038 ( 0.1%) Scalar Evolution Analysis 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0037 ( 0.1%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.1%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.1%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0033 ( 2.2%) 0.0033 ( 0.1%) 0.0034 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0032 ( 0.1%) Canonicalize natural loops 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0032 ( 0.1%) X86 Byte/Word Instruction Fixup 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0031 ( 0.1%) Function Alias Analysis Results 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0031 ( 0.1%) Demanded bits analysis 0.0033 ( 0.1%) 0.0000 
( 0.0%) 0.0033 ( 0.1%) 0.0030 ( 0.1%) Lazy Value Information Analysis 0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0030 ( 0.1%) Two-Address instruction pass 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0030 ( 0.1%) Dominator Tree Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0028 ( 0.1%) Canonicalize natural loops 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0027 ( 0.1%) Dominator Tree Construction 0.0067 ( 0.3%) 0.0033 ( 2.2%) 0.0100 ( 0.4%) 0.0027 ( 0.1%) Lazy Value Information Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0026 ( 0.1%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.1%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.1%) Function Alias Analysis Results 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0024 ( 0.1%) Natural Loop Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0024 ( 0.1%) Virtual Register Rewriter 0.0000 ( 0.0%) 0.0033 ( 2.2%) 0.0033 ( 0.1%) 0.0024 ( 0.1%) Natural Loop Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0024 ( 0.1%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0023 ( 0.1%) Recognize loop idioms 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0023 ( 0.1%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0022 ( 0.1%) Machine Loop Invariant Code Motion 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0022 ( 0.1%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0021 ( 0.1%) Function Alias Analysis Results 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0021 ( 0.1%) Function Alias Analysis Results 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0021 ( 0.1%) MergedLoadStoreMotion 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.1%) PGOIndirectCallPromotion 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 
( 0.1%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0067 ( 4.3%) 0.0067 ( 0.3%) 0.0020 ( 0.1%) Function Alias Analysis Results 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0019 ( 0.1%) Loop Load Elimination 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0019 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0018 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0018 ( 0.1%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0018 ( 0.1%) Execution dependency fix 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.1%) Speculatively execute instructions if target has divergent branches 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0017 ( 0.1%) Global Variable Optimizer 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0017 ( 0.1%) CallGraph Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0016 ( 0.1%) Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.1%) Remove dead machine instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Loop-Closed SSA Form Pass 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0015 ( 0.1%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Machine InstCombiner 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Demanded bits analysis 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0015 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.1%) Loop-Closed SSA Form Pass 0.0033 
( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0014 ( 0.1%) Alignment from assumptions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.1%) Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.1%) Dead Global Elimination 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Eliminate PHI nodes for register allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Slot index numbering 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0013 ( 0.1%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Memory Dependence Analysis 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0013 ( 0.1%) Promote Memory to Register 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0013 ( 0.1%) CallGraph Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) Memory Dependence Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) Memory Dependence Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) MachineDominator Tree Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) X86 LEA Optimize 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) Dominator Tree Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) MachinePostDominator Tree Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) MachinePostDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Constant Hoisting 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0011 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Slot index numbering 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0011 ( 0.0%) 
MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Memory Dependence Analysis 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0011 ( 0.0%) Delete dead loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Machine Natural Loop Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Lower 'expect' Intrinsics 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) Loop Access Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) Float to int 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) Expand Atomic instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) Machine Natural Loop Construction 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0007 ( 0.0%) Post-RA pseudo instruction expansion pass 0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0007 ( 0.0%) Profile summary info 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) Canonicalize natural loops 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0007 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Machine Loop Invariant Code Motion 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0006 ( 0.0%) Partially inline calls to library functions 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0006 ( 0.0%) Stack Slot Coloring 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Machine Natural Loop Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Tail Duplication 0.0000 ( 0.0%) 0.0000 ( 
0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Canonicalize natural loops 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0005 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Remove unreachable machine basic blocks 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) X86 Optimize Call Frame 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0033 ( 2.2%) 0.0033 ( 0.1%) 0.0004 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Tail Duplication 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Rotate Loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) X86 pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Scalar Evolution Analysis 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0004 ( 0.0%) Debug Variable Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Remove unreachable blocks from the CFG 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Machine Trace Metrics 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Live Register Matrix 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Spill Code Placement Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Shrink Wrapping analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Bundle Machine CFG Edges 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Expand ISel Pseudo-instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Globals Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) X86 LEA Fixup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Live Stack Slot Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) X86 Fixup SetCC 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0002 ( 0.0%) Post RA top-down list latency scheduler 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Loop Access Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Loop Distribition 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Virtual Register Map 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Optimize machine instruction PHIs 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Insert XRay ops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Local Dynamic TLS Access Clean-up 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Early If-Conversion 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Live DEBUG_VALUE analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 
StackMap Liveness Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Loop Access Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Optimization Remark Emitter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Rename Disconnected Subregister Components 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Contiguously Lay Out Funclets 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 Atom pad short functions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Lazy Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Analyze Machine Code For Garbage Collection 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Implement the 'patchable-function' attribute 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Lower Garbage Collection Instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Local Stack Slot Allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 PIC Global Base Reg Initialization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 WinAlloca Expander 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 FP Stackifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 vzeroupper inserter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Shadow Stack GC Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Deduce function attributes in RPO 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Infer set function attributes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Merge Duplicate Global Constants 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Strip Unused Function Prototypes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Eliminate Available Externally Globals 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) A No-Op Barrier Pass 2.3700 (100.0%) 0.1533 (100.0%) 2.5233 (100.0%) 2.5454 (100.0%) Total time: 2.897; rss: 143MB LLVM passes time: 0.000; rss: 143MB serialize work products time: 0.011; rss: 134MB linking Finished release [optimized] target(s) in 4.64 secs
Note that moving out #[cfg(test)] stuff will have approximately 0 impact on build times. It would make expansion and parsing slightly faster, but that's already less than 130ms of the build.
I believe it will make builds faster when editing the tests, because the lib won't be recompiled at all. To me this is a major benefit, since "cargo test" is the primary way I build ring.
I've done a bunch of work to remove the need for C++ at all, and to greatly reduce the amount of C code present. Unfortunately it's kind of a big project that's 75% done, hard to commit incrementally, and kind of stalled right now. But if the C/C++ stuff is what's making things slow, then this will help a lot when I can get time to finish it.
That's a good benefit that I didn't think of.
I instrumented the build script to print how much time executing subcommands takes (not accounting for any parallelism or any work the build script itself does). Overall, it takes 4.462682 seconds. The Perl step takes 1.891556 seconds, the asm takes 0.3488179 seconds, and the C itself takes 2.168519 seconds, in debug mode.
In release mode, the asm takes 0.3802813 seconds, and the C takes 4.583867 seconds.
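The instrumentation itself isn't shown in the thread. A minimal sketch of the same idea, assuming the build script routes each subcommand through one helper: wall-clock time is accumulated per category ("perl", "asm", "c") and summed sequentially, so, like the figures above, it ignores parallelism. SubcommandTimings and its methods are hypothetical names, not part of ring's build.rs.

```rust
use std::collections::BTreeMap;
use std::io;
use std::process::{Command, Output};
use std::time::{Duration, Instant};

/// Hypothetical helper: per-category wall-clock totals for build-script
/// subcommands (e.g. "perl", "asm", "c"). Times are summed sequentially,
/// so any parallelism is ignored.
#[derive(Default)]
struct SubcommandTimings {
    totals: BTreeMap<&'static str, Duration>,
}

impl SubcommandTimings {
    /// Adds `elapsed` to the running total for `category`.
    fn record(&mut self, category: &'static str, elapsed: Duration) {
        *self.totals.entry(category).or_insert(Duration::ZERO) += elapsed;
    }

    /// Runs `f`, charges its wall-clock time to `category`, returns its result.
    fn time<T>(&mut self, category: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let result = f();
        self.record(category, start.elapsed());
        result
    }

    /// Runs an external command (compiler, assembler, perl, ...) and times it.
    fn run(&mut self, category: &'static str, cmd: &mut Command) -> io::Result<Output> {
        self.time(category, || cmd.output())
    }

    /// Sum over all categories.
    fn total(&self) -> Duration {
        self.totals.values().sum()
    }

    /// Sum over all categories except `skip` (e.g. "perl" when asm is pregenerated).
    fn total_excluding(&self, skip: &str) -> Duration {
        self.totals
            .iter()
            .filter(|&(&k, _)| k != skip)
            .map(|(_, d)| *d)
            .sum()
    }

    /// Prints a report to stderr.
    fn report(&self) {
        for (category, d) in &self.totals {
            eprintln!("{}: {:.3}s", category, d.as_secs_f64());
        }
        eprintln!("overall: {:.3}s", self.total().as_secs_f64());
        eprintln!("excluding perl: {:.3}s", self.total_excluding("perl").as_secs_f64());
    }
}

fn main() {
    let mut timings = SubcommandTimings::default();
    // Stand-in for a real compiler invocation; any command would do.
    let mut cmd = Command::new("rustc");
    cmd.arg("--version");
    match timings.run("rustc", &mut cmd) {
        Ok(out) => eprintln!("rustc exited with {}", out.status),
        Err(e) => eprintln!("could not run rustc: {}", e),
    }
    timings.report();
}
```

As a check on the debug-mode numbers above, the three categories sum to about 4.409 s of the 4.463 s overall, and leaving out the Perl step leaves about 2.517 s of asm and C work. Cargo captures a build script's stdout and stderr, so a report like this is only visible with cargo build -vv.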
We should decide which use cases we're trying to optimize for.
I think Ted is mostly interested in the "time to build ring as a dependency of another project" wall time. In that case, the Perl step is skipped because it is precomputed.
I'm mostly interested in "cargo test --features=rsa_signing" build speed, to help people contributing code, testing, etc. to ring.
I have to say, I personally think the build speed is quite bearable.
In any case, I did an experiment last night that shows we'll soon be able to remove 10 files, including all the C++, which all adds up to about 2,800 lines of code. And I think not long after that we'll remove another ~10 C files. So the natural progression looks positive as far as build speed is concerned.
cc @arielb1
In efdffc91db7adb9923a6635ebdd49829db61d7cb and 0aea3d20c2d7a3ff44176544a4d24a80b1d2495e we removed 10 C/C++ source files, including all the C++ source files. It would be good to see if this made any significant difference in the build time on the systems you are measuring it on.
Now that rust-lang/rust#41469 is merged, you should also look at time-passes.
BTW, if you don't need RSA, then in ring 0.8.0 you'll be able to build with --no-default-features or --no-default-features --features=dev_urandom_fallback to avoid building some Rust code. build.rs could be changed to avoid building crypto/bn/* when the use_heap default feature isn't enabled, to make that even faster.
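A minimal build.rs sketch of that last suggestion, assuming the C sources are listed as paths relative to the crate root. Cargo tells build scripts which features are enabled through CARGO_FEATURE_<NAME> environment variables, so use_heap shows up as CARGO_FEATURE_USE_HEAP. The helper names and file paths below are hypothetical, not ring's actual build.rs.

```rust
use std::env;

/// True if Cargo enabled the `use_heap` feature. Cargo exposes each enabled
/// feature to build scripts as CARGO_FEATURE_<NAME> (uppercased).
fn use_heap_enabled() -> bool {
    env::var_os("CARGO_FEATURE_USE_HEAP").is_some()
}

/// Returns the C sources to compile, dropping crypto/bn/* when use_heap is off.
fn select_c_sources<'a>(all: &[&'a str], use_heap: bool) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|path| use_heap || !path.starts_with("crypto/bn/"))
        .collect()
}

fn main() {
    // Hypothetical file names for illustration; not ring's real source list.
    let all = ["crypto/bn/example_bn.c", "crypto/example_other.c"];
    for path in select_c_sources(&all, use_heap_enabled()) {
        eprintln!("would compile {}", path);
    }
}
```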
Is there anybody unhappy with the build time now? We've made several changes that should be improvements, though we haven't attempted to measure everything again. Without new measurements this is unactionable.
OK, I'm going to close this now, on the assumption everything is A-OK.
In another bug, @luser suggested that we add a way to just build the digest API, without the rest, in the name of making the build faster: “Mostly faster compiling, yeah. Rust compilation is slow enough, pulling in another large dependency just to use one small bit of it makes the problem even worse.”
Since that time, the build system has been completely rewritten, mostly through @weiznich's awesome work. One thing we did was pregenerate the assembly language code from the PerlAsm scripts, so that all the Perl steps are skipped when building from crates.io, which may help.
But, we should still try to make the build faster. This requires somebody to profile the build to find out what the bottlenecks are.
Wild guess ideas:
While we did spend significant effort ensuring the build is parallelized, we didn't make everything perfect in that respect. There is probably some low-hanging fruit regarding parallelism there.
When we're not building from Git (when ".git" doesn't exist), maybe we should skip dirty checking entirely and compile everything unconditionally. Presumably, Cargo does its own dirty checking to ensure each library is only built once, so our own dirty checking is superfluous in that case, as our build script would only be run when all files are dirty. (Dirty checking would still be essential for Git builds, of course.)
Now there is a ring-test library that contains some test code, which is built in addition to the ring-core library that contains the C & asm code for the library proper. Ideally, the ring-test library shouldn't be linked into the ring library at all. If we changed constant_time_test and bn_test to be "integration" tests instead of "unit" tests, then we could use #[link] inside the integration test files to link libring-test.a only into the integration tests, and not into everything else. If this were done, then dependent crates that build ring from crates.io wouldn't need to link libring-test.a. Presumably, we would use the presence or absence of ".git" in build.rs to determine whether or not to build libring-test.a. However, I'm not sure that this is a good idea, because users can do "cargo test -p ring" from within a dependent crate to run the ring test suite, and this would break. Probably we need to wait for https://github.com/rust-lang/cargo/issues/1581 instead.
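The dirty-checking idea above could look roughly like this in build.rs. It is a sketch under assumptions: the ".git" test is done relative to CARGO_MANIFEST_DIR (which Cargo sets for build scripts), freshness is judged by comparing modification times, and is_git_checkout, needs_rebuild and the file names are hypothetical. The same is_git_checkout test is what the libring-test.a idea would rely on, subject to the "cargo test -p ring" caveat.

```rust
use std::env;
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// True when building from a Git checkout, i.e. ".git" exists in the crate root.
/// Packages downloaded from crates.io have no ".git".
fn is_git_checkout(crate_root: &Path) -> bool {
    crate_root.join(".git").exists()
}

/// Modification time of `path`, or None if it is missing or unreadable.
fn mtime(path: &Path) -> Option<SystemTime> {
    fs::metadata(path).and_then(|m| m.modified()).ok()
}

/// Outside a Git checkout, skip our own dirty checking and always rebuild,
/// trusting Cargo to rerun the build script only when needed.
/// Inside a checkout, rebuild if the output is missing or older than any input.
fn needs_rebuild(git_checkout: bool, output: Option<SystemTime>, inputs: &[SystemTime]) -> bool {
    if !git_checkout {
        return true;
    }
    match output {
        None => true,
        Some(out) => inputs.iter().any(|&input| input > out),
    }
}

fn main() {
    // Cargo sets CARGO_MANIFEST_DIR for build scripts; fall back to "." when run directly.
    let dir = env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| ".".to_string());
    let root = Path::new(&dir);
    let git = is_git_checkout(root);
    // Hypothetical object file and source file, for illustration only.
    let output = mtime(&root.join("example_obj.o"));
    let inputs: Vec<SystemTime> = ["example_src.c"]
        .iter()
        .filter_map(|p| mtime(&root.join(p)))
        .collect();
    eprintln!("git checkout: {}, rebuild: {}", git, needs_rebuild(git, output, &inputs));
}
```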