briansmith / ring

Safe, fast, small crypto using Rust

Speed up the build (i.e. Build faster) #489

Closed briansmith closed 7 years ago

briansmith commented 7 years ago

In another bug, @luser suggested that we add a way to just build the digest API, without the rest, in the name of making the build faster: “Mostly faster compiling, yeah. Rust compilation is slow enough, pulling in another large dependency just to use one small bit of it makes the problem even worse.”

Since that time, the build system has been completely rewritten, mostly through @weiznich's awesome work. One thing we did was pregenerate the assembly language code from the PerlAsm scripts, so that all the Perl steps are skipped when building from crates.io, which may help.

Still, we should try to make the build faster. That requires somebody to profile the build and find out where the bottlenecks are.

Wild guess ideas:

briansmith commented 7 years ago

More ideas:

emberian commented 7 years ago

Here are some profiles from -Z time-passes -Z time-llvm-passes.

Fresh debug build takes 15.8s:

$ cargo rustc -- -Z time-passes -Z time-llvm-passes
   Compiling libc v0.2.21
   Compiling gcc v0.3.45
   Compiling lazy_static v0.2.6
   Compiling untrusted v0.3.2
   Compiling rand v0.3.15
   Compiling num_cpus v1.3.0
   Compiling deque v0.3.1
   Compiling rayon v0.6.0
   Compiling ring v0.7.3 (file:///home/cmr/proj/ring)
time: 0.046; rss: 56MB  parsing
time: 0.000; rss: 56MB  recursion limit
time: 0.000; rss: 56MB  crate injection
time: 0.000; rss: 56MB  plugin loading
time: 0.000; rss: 56MB  plugin registration
time: 0.078; rss: 95MB  expansion
time: 0.000; rss: 95MB  maybe building test harness
time: 0.000; rss: 95MB  maybe creating a macro crate
time: 0.000; rss: 95MB  checking for inline asm in case the target doesn't support it
time: 0.003; rss: 95MB  early lint checks
time: 0.001; rss: 95MB  AST validation
time: 0.012; rss: 98MB  name resolution
time: 0.008; rss: 98MB  complete gated feature checking
time: 0.010; rss: 104MB lowering ast -> hir
time: 0.003; rss: 104MB indexing hir
time: 0.001; rss: 104MB attribute checking
time: 0.004; rss: 107MB language item collection
time: 0.002; rss: 107MB lifetime resolution
time: 0.000; rss: 107MB looking for entry point
time: 0.000; rss: 107MB looking for plugin registrar
time: 0.003; rss: 107MB region resolution
time: 0.001; rss: 107MB loop checking
time: 0.000; rss: 107MB static item recursion checking
time: 0.031; rss: 108MB compute_incremental_hashes_map
time: 0.000; rss: 108MB load_dep_graph
time: 0.001; rss: 108MB stability index
time: 0.003; rss: 108MB stability checking
time: 0.461; rss: 124MB type collecting
time: 0.000; rss: 124MB variance inference
time: 0.000; rss: 124MB impl wf inference
time: 0.014; rss: 127MB coherence checking
time: 0.021; rss: 127MB wf checking
time: 0.045; rss: 127MB item-types checking
time: 0.324; rss: 134MB item-bodies checking
time: 0.022; rss: 136MB const checking
time: 0.005; rss: 136MB privacy checking
time: 0.002; rss: 136MB intrinsic checking
time: 0.001; rss: 136MB effect checking
time: 0.005; rss: 136MB match checking
time: 0.003; rss: 136MB liveness checking
time: 0.016; rss: 136MB rvalue checking
time: 0.037; rss: 151MB MIR dump
  time: 0.005; rss: 151MB   SimplifyCfg
  time: 0.008; rss: 151MB   QualifyAndPromoteConstants
  time: 0.012; rss: 151MB   TypeckMir
  time: 0.000; rss: 151MB   SimplifyBranches
  time: 0.002; rss: 151MB   SimplifyCfg
time: 0.028; rss: 151MB MIR cleanup and validation
time: 0.044; rss: 151MB borrow checking
time: 0.000; rss: 151MB reachability checking
time: 0.003; rss: 151MB death checking
time: 0.000; rss: 151MB unused lib feature checking
time: 0.041; rss: 151MB lint checking
time: 0.000; rss: 151MB resolving dependency formats
  time: 0.000; rss: 151MB   NoLandingPads
  time: 0.002; rss: 151MB   SimplifyCfg
  time: 0.004; rss: 151MB   EraseRegions
  time: 0.001; rss: 151MB   AddCallGuards
  time: 0.015; rss: 154MB   ElaborateDrops
  time: 0.000; rss: 154MB   NoLandingPads
  time: 0.003; rss: 154MB   SimplifyCfg
  time: 0.000; rss: 154MB   Inline
  time: 0.003; rss: 154MB   InstCombine
  time: 0.001; rss: 154MB   Deaggregator
  time: 0.000; rss: 154MB   CopyPropagation
  time: 0.003; rss: 154MB   SimplifyLocals
  time: 0.001; rss: 154MB   AddCallGuards
  time: 0.000; rss: 154MB   PreTrans
time: 0.033; rss: 154MB MIR optimisations
  time: 0.009; rss: 154MB   write metadata
  time: 0.066; rss: 156MB   translation item collection
  time: 0.013; rss: 156MB   codegen unit partitioning
  time: 0.007; rss: 176MB   internalize symbols
time: 0.532; rss: 176MB translation
time: 0.000; rss: 176MB assert dep graph
time: 0.000; rss: 176MB serialize dep graph
  time: 0.046; rss: 142MB   llvm function passes [0]
  time: 0.035; rss: 144MB   llvm module passes [0]
  time: 0.941; rss: 150MB   codegen passes [0]
  time: 0.000; rss: 149MB   codegen passes [0]
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1400 seconds (0.1241 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0333 ( 37.0%)   0.0167 ( 33.3%)   0.0500 ( 35.7%)   0.0349 ( 28.2%)  Instruction Selection
   0.0067 (  7.4%)   0.0100 ( 20.0%)   0.0167 ( 11.9%)   0.0214 ( 17.2%)  Instruction Scheduling
   0.0133 ( 14.8%)   0.0033 (  6.7%)   0.0167 ( 11.9%)   0.0185 ( 14.9%)  DAG Combining 1
   0.0200 ( 22.2%)   0.0067 ( 13.3%)   0.0267 ( 19.0%)   0.0122 (  9.8%)  DAG Combining 2
   0.0100 ( 11.1%)   0.0067 ( 13.3%)   0.0167 ( 11.9%)   0.0117 (  9.4%)  Instruction Creation
   0.0033 (  3.7%)   0.0000 (  0.0%)   0.0033 (  2.4%)   0.0100 (  8.0%)  DAG Legalization
   0.0033 (  3.7%)   0.0067 ( 13.3%)   0.0100 (  7.1%)   0.0081 (  6.5%)  Type Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0032 (  2.6%)  DAG Combining after legalize types
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0021 (  1.7%)  Instruction Scheduling Cleanup
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0020 (  1.6%)  Vector Legalization
   0.0900 (100.0%)   0.0500 (100.0%)   0.1400 (100.0%)   0.1241 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1133 seconds (0.1251 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0733 ( 73.3%)   0.0067 ( 50.0%)   0.0800 ( 70.6%)   0.0846 ( 67.7%)  Debug Info Emission
   0.0233 ( 23.3%)   0.0067 ( 50.0%)   0.0300 ( 26.5%)   0.0395 ( 31.6%)  DWARF Exception Writer
   0.0033 (  3.3%)   0.0000 (  0.0%)   0.0033 (  2.9%)   0.0009 (  0.8%)  DWARF Debug Writer
   0.1000 (100.0%)   0.0133 (100.0%)   0.1133 (100.0%)   0.1251 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.8700 seconds (0.8690 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2100 ( 29.7%)   0.0833 ( 51.0%)   0.2933 ( 33.7%)   0.2814 ( 32.4%)  X86 DAG->DAG Instruction Selection
   0.2267 ( 32.1%)   0.0233 ( 14.3%)   0.2500 ( 28.7%)   0.2652 ( 30.5%)  X86 Assembly / Object Emitter
   0.0533 (  7.5%)   0.0067 (  4.1%)   0.0600 (  6.9%)   0.0519 (  6.0%)  Module Verifier
   0.0367 (  5.2%)   0.0033 (  2.0%)   0.0400 (  4.6%)   0.0425 (  4.9%)  Module Verifier
   0.0367 (  5.2%)   0.0067 (  4.1%)   0.0433 (  5.0%)   0.0420 (  4.8%)  Module Verifier
   0.0233 (  3.3%)   0.0033 (  2.0%)   0.0267 (  3.1%)   0.0281 (  3.2%)  Inliner for always_inline functions
   0.0167 (  2.4%)   0.0067 (  4.1%)   0.0233 (  2.7%)   0.0259 (  3.0%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0133 (  1.9%)   0.0100 (  6.1%)   0.0233 (  2.7%)   0.0223 (  2.6%)  Fast Register Allocator
   0.0100 (  1.4%)   0.0033 (  2.0%)   0.0133 (  1.5%)   0.0125 (  1.4%)  Live DEBUG_VALUE analysis
   0.0100 (  1.4%)   0.0033 (  2.0%)   0.0133 (  1.5%)   0.0121 (  1.4%)  Machine Function Analysis
   0.0067 (  0.9%)   0.0000 (  0.0%)   0.0067 (  0.8%)   0.0104 (  1.2%)  Insert stack protectors
   0.0067 (  0.9%)   0.0000 (  0.0%)   0.0067 (  0.8%)   0.0084 (  1.0%)  Two-Address instruction pass
   0.0067 (  0.9%)   0.0000 (  0.0%)   0.0067 (  0.8%)   0.0067 (  0.8%)  Dominator Tree Construction
   0.0033 (  0.5%)   0.0000 (  0.0%)   0.0033 (  0.4%)   0.0054 (  0.6%)  CallGraph Construction
   0.0067 (  0.9%)   0.0000 (  0.0%)   0.0067 (  0.8%)   0.0051 (  0.6%)  Dominator Tree Construction
   0.0100 (  1.4%)   0.0000 (  0.0%)   0.0100 (  1.1%)   0.0047 (  0.5%)  Natural Loop Information
   0.0033 (  0.5%)   0.0000 (  0.0%)   0.0033 (  0.4%)   0.0045 (  0.5%)  Dominator Tree Construction
   0.0067 (  0.9%)   0.0000 (  0.0%)   0.0067 (  0.8%)   0.0040 (  0.5%)  Scalar Evolution Analysis
   0.0067 (  0.9%)   0.0033 (  2.0%)   0.0100 (  1.1%)   0.0039 (  0.5%)  Dominator Tree Construction
   0.0033 (  0.5%)   0.0033 (  2.0%)   0.0067 (  0.8%)   0.0039 (  0.4%)  Function Alias Analysis Results
   0.0033 (  0.5%)   0.0000 (  0.0%)   0.0033 (  0.4%)   0.0038 (  0.4%)  Expand Atomic instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0027 (  0.3%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0024 (  0.3%)  Post-RA pseudo instruction expansion pass
   0.0033 (  0.5%)   0.0033 (  2.0%)   0.0067 (  0.8%)   0.0019 (  0.2%)  X86 pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.2%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.2%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0033 (  2.0%)   0.0033 (  0.4%)   0.0014 (  0.2%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.2%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.1%)  Expand ISel Pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.1%)  Insert XRay ops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.1%)  Implement the 'patchable-function' attribute
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.1%)  StackMap Liveness Analysis
   0.0033 (  0.5%)   0.0000 (  0.0%)   0.0033 (  0.4%)   0.0008 (  0.1%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  Contiguously Lay Out Funclets
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  X86 WinAlloca Expander
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  Safe Stack instrumentation pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  X86 vzeroupper inserter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.1%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.1%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.7067 (100.0%)   0.1633 (100.0%)   0.8700 (100.0%)   0.8690 (100.0%)  Total

time: 1.078; rss: 149MB LLVM passes
time: 0.000; rss: 149MB serialize work products
time: 0.031; rss: 131MB linking
    Finished dev [unoptimized + debuginfo] target(s) in 15.8 secs
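A log like the one above is easier to read once the top-level passes are ranked by time. The following is a minimal sketch of one way to do that, not something from the original thread: the helper names `parse_top_level` and `slowest` are hypothetical, and it assumes every pass line has the `time: <secs>; rss: <n>MB <name>` shape seen in the log. Indented lines are treated as sub-passes and skipped so that nested timings are not counted twice.

```rust
/// Parse one top-level `-Z time-passes` line, e.g.
/// "time: 0.461; rss: 124MB type collecting", into (seconds, pass name).
/// Indented lines (sub-passes) and any other lines yield `None`.
fn parse_top_level(line: &str) -> Option<(f64, String)> {
    // Indented sub-pass lines do not start with "time: " and fail here.
    let rest = line.strip_prefix("time: ")?;
    let (secs, rest) = rest.split_once("; rss: ")?;
    let secs: f64 = secs.trim().parse().ok()?;
    // `rest` looks like "124MB type collecting"; drop the memory figure.
    let (_rss, name) = rest.split_once("MB")?;
    Some((secs, name.trim().to_string()))
}

/// Return the `n` slowest top-level passes, slowest first.
fn slowest(log: &str, n: usize) -> Vec<(f64, String)> {
    let mut passes: Vec<(f64, String)> =
        log.lines().filter_map(parse_top_level).collect();
    passes.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    passes.truncate(n);
    passes
}

fn main() {
    // A few lines copied from the debug profile above.
    let log = "\
time: 0.461; rss: 124MB type collecting
time: 0.324; rss: 134MB item-bodies checking
  time: 0.012; rss: 151MB   TypeckMir
time: 0.532; rss: 176MB translation
time: 1.078; rss: 149MB LLVM passes
time: 0.031; rss: 131MB linking";
    for (secs, name) in slowest(log, 3) {
        println!("{:>6.3}s  {}", secs, name);
    }
}
```

Feeding it the whole log rather than the excerpt should work the same way, since the LLVM report tables and Cargo's own lines do not match the `time: ` prefix and are ignored.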

It takes 3.3s to build just libring.rlib without the C code, and 9.3s to build both the rlib and the C code.
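These figures can be turned into a rough per-phase breakdown by subtraction. The sketch below assumes the phases run one after another so that their wall-clock times add up, and that the 9.3s figure is part of the 15.8s fresh build; neither is stated in the thread, so treat the result as an estimate. The `Breakdown` type and `breakdown` function are hypothetical names used only for illustration.

```rust
/// Per-phase estimate of a build, obtained by subtraction. This assumes
/// the phases run serially, so that their wall-clock times simply add up.
struct Breakdown {
    rust_only: f64,
    c_code: f64,
    everything_else: f64,
}

fn breakdown(total: f64, rlib_only: f64, rlib_and_c: f64) -> Breakdown {
    Breakdown {
        rust_only: rlib_only,
        // Time attributed to the C code: both together minus Rust alone.
        c_code: rlib_and_c - rlib_only,
        // Whatever is left of the fresh build (dependencies and so on).
        everything_else: total - rlib_and_c,
    }
}

fn main() {
    // Debug-build timings quoted in the comment, in seconds.
    let total = 15.8;
    let b = breakdown(total, 3.3, 9.3);
    let rows = [
        ("Rust (libring.rlib)", b.rust_only),
        ("C code", b.c_code),
        ("deps and the rest", b.everything_else),
    ];
    for (name, secs) in rows {
        println!("{:<20} {:>5.1}s  {:>4.0}%", name, secs, 100.0 * secs / total);
    }
}
```

Under those assumptions the C code accounts for about 6.0s and the remaining 6.5s goes to everything outside ring's own Rust and C compilation.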

It takes 0.41 seconds to build a trivial crate that computes the SHA-512 of a line read from stdin, after the deps (including ring) are already built:

   Compiling f v0.1.0 (file:///home/cmr/proj/ring/t/f)
time: 0.000; rss: 48MB  parsing
time: 0.000; rss: 48MB  recursion limit
time: 0.000; rss: 48MB  crate injection
time: 0.000; rss: 48MB  plugin loading
time: 0.000; rss: 48MB  plugin registration
time: 0.022; rss: 84MB  expansion
time: 0.000; rss: 84MB  maybe building test harness
time: 0.000; rss: 84MB  maybe creating a macro crate
time: 0.000; rss: 84MB  checking for inline asm in case the target doesn't support it
time: 0.000; rss: 84MB  early lint checks
time: 0.000; rss: 84MB  AST validation
time: 0.005; rss: 84MB  name resolution
time: 0.000; rss: 84MB  complete gated feature checking
time: 0.000; rss: 84MB  lowering ast -> hir
time: 0.000; rss: 84MB  indexing hir
time: 0.000; rss: 84MB  attribute checking
time: 0.000; rss: 84MB  language item collection
time: 0.000; rss: 84MB  lifetime resolution
time: 0.000; rss: 84MB  looking for entry point
time: 0.000; rss: 84MB  looking for plugin registrar
time: 0.000; rss: 84MB  region resolution
time: 0.000; rss: 84MB  loop checking
time: 0.000; rss: 84MB  static item recursion checking
time: 0.000; rss: 87MB  compute_incremental_hashes_map
time: 0.000; rss: 87MB  load_dep_graph
time: 0.000; rss: 87MB  stability index
time: 0.000; rss: 87MB  stability checking
time: 0.000; rss: 87MB  type collecting
time: 0.000; rss: 87MB  variance inference
time: 0.000; rss: 87MB  impl wf inference
time: 0.000; rss: 87MB  coherence checking
time: 0.000; rss: 87MB  wf checking
time: 0.001; rss: 87MB  item-types checking
time: 0.011; rss: 101MB item-bodies checking
time: 0.002; rss: 101MB const checking
time: 0.000; rss: 101MB privacy checking
time: 0.000; rss: 101MB intrinsic checking
time: 0.000; rss: 101MB effect checking
time: 0.000; rss: 101MB match checking
time: 0.000; rss: 101MB liveness checking
time: 0.000; rss: 101MB rvalue checking
time: 0.000; rss: 101MB MIR dump
  time: 0.000; rss: 101MB   SimplifyCfg
  time: 0.000; rss: 101MB   QualifyAndPromoteConstants
  time: 0.000; rss: 101MB   TypeckMir
  time: 0.000; rss: 101MB   SimplifyBranches
  time: 0.000; rss: 101MB   SimplifyCfg
time: 0.001; rss: 101MB MIR cleanup and validation
time: 0.000; rss: 101MB borrow checking
time: 0.000; rss: 101MB reachability checking
time: 0.000; rss: 101MB death checking
time: 0.000; rss: 101MB unused lib feature checking
warning: unused result which must be used
 --> src/main.rs:7:5
  |
7 |     stdin().read_line(&mut s);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: #[warn(unused_must_use)] on by default

time: 0.000; rss: 101MB lint checking
time: 0.002; rss: 101MB resolving dependency formats
  time: 0.000; rss: 101MB   NoLandingPads
  time: 0.000; rss: 101MB   SimplifyCfg
  time: 0.000; rss: 101MB   EraseRegions
  time: 0.000; rss: 101MB   AddCallGuards
  time: 0.000; rss: 101MB   ElaborateDrops
  time: 0.000; rss: 101MB   NoLandingPads
  time: 0.000; rss: 101MB   SimplifyCfg
  time: 0.000; rss: 101MB   Inline
  time: 0.000; rss: 101MB   InstCombine
  time: 0.000; rss: 101MB   Deaggregator
  time: 0.000; rss: 101MB   CopyPropagation
  time: 0.000; rss: 101MB   SimplifyLocals
  time: 0.000; rss: 101MB   AddCallGuards
  time: 0.000; rss: 101MB   PreTrans
time: 0.000; rss: 101MB MIR optimisations
  time: 0.000; rss: 101MB   write metadata
  time: 0.004; rss: 101MB   translation item collection
  time: 0.001; rss: 101MB   codegen unit partitioning
  time: 0.001; rss: 114MB   internalize symbols
time: 0.095; rss: 114MB translation
time: 0.000; rss: 114MB assert dep graph
time: 0.000; rss: 114MB serialize dep graph
  time: 0.002; rss: 114MB   llvm function passes [0]
  time: 0.001; rss: 114MB   llvm module passes [0]
  time: 0.037; rss: 118MB   codegen passes [0]
  time: 0.000; rss: 118MB   codegen passes [0]
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0100 seconds (0.0045 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0033 ( 33.3%)   0.0033 ( 33.3%)   0.0012 ( 26.8%)  Instruction Selection
   0.0033 ( 33.3%)   0.0033 ( 33.3%)   0.0008 ( 18.6%)  Instruction Scheduling
   0.0033 ( 33.3%)   0.0033 ( 33.3%)   0.0008 ( 17.4%)  DAG Combining 1
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 ( 10.3%)  DAG Combining 2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 ( 10.1%)  Instruction Creation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  8.3%)  DAG Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  5.2%)  Type Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  1.8%)  Instruction Scheduling Cleanup
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  1.0%)  Vector Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.4%)  DAG Combining after legalize types
   0.0100 (100.0%)   0.0100 (100.0%)   0.0045 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0047 wall clock)

   ---Wall Time---  --- Name ---
   0.0032 ( 68.8%)  Debug Info Emission
   0.0012 ( 26.7%)  DWARF Exception Writer
   0.0002 (  4.5%)  DWARF Debug Writer
   0.0047 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0300 seconds (0.0305 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0100 ( 37.5%)   0.0000 (  0.0%)   0.0100 ( 33.3%)   0.0102 ( 33.5%)  X86 DAG->DAG Instruction Selection
   0.0067 ( 25.0%)   0.0000 (  0.0%)   0.0067 ( 22.2%)   0.0085 ( 27.8%)  X86 Assembly / Object Emitter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0015 (  4.8%)  Module Verifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  4.6%)  Module Verifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  3.7%)  Module Verifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  3.3%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0008 (  2.6%)  Fast Register Allocator
   0.0033 ( 12.5%)   0.0033 (100.0%)   0.0067 ( 22.2%)   0.0007 (  2.3%)  Machine Function Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  2.0%)  Inliner for always_inline functions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  1.8%)  Live DEBUG_VALUE analysis
   0.0033 ( 12.5%)   0.0000 (  0.0%)   0.0033 ( 11.1%)   0.0004 (  1.4%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  1.0%)  Two-Address instruction pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  1.0%)  Profile summary info
   0.0033 ( 12.5%)   0.0000 (  0.0%)   0.0033 ( 11.1%)   0.0003 (  0.9%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.8%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.7%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.7%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.7%)  CallGraph Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.7%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.6%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.5%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.5%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.4%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.4%)  Post-RA pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.3%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.3%)  X86 pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.3%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.2%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.2%)  Expand ISel Pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.2%)  Insert XRay ops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.2%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.2%)  StackMap Liveness Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.2%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.2%)  Implement the 'patchable-function' attribute
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Safe Stack instrumentation pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  X86 WinAlloca Expander
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Contiguously Lay Out Funclets
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  X86 vzeroupper inserter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0267 (100.0%)   0.0033 (100.0%)   0.0300 (100.0%)   0.0305 (100.0%)  Total

time: 0.041; rss: 118MB LLVM passes
time: 0.000; rss: 118MB serialize work products
  time: 0.196; rss: 121MB   running linker
time: 0.198; rss: 121MB linking
    Finished dev [unoptimized + debuginfo] target(s) in 0.41 secs

Here are the sizes of the artifacts inside libring.rlib:

60K     add.o
8.0K    aes-x86_64-elf.o
40K     aes.o
8.0K    aesni-gcm-x86_64-elf.o
8.0K    aesni-x86_64-elf.o
68K     bn.o
76K     bn_test_convert.o
52K     bn_test_new.o
8.0K    bsaes-x86_64-elf.o
12K     chacha-x86_64-elf.o
56K     cmp.o
76K     constant_time_test.o
60K     convert.o
48K     cpu-intel.o
44K     crypto.o
160K    curve25519.o
64K     div.o
60K     e_aes.o
48K     ecp_nistz.o
4.0K    ecp_nistz256-x86_64-elf.o
216K    ecp_nistz256.o
60K     exponentiation.o
52K     gcd.o
68K     gcm.o
60K     generic.o
48K     gfp_p256.o
84K     gfp_p384.o
12K     ghash-x86_64-elf.o
64K     limbs.o
40K     mem.o
60K     montgomery.o
56K     montgomery_inv.o
52K     mul.o
12K     p256-x86_64-asm-elf.o
12K     poly1305-x86_64-elf.o
44K     random.o
1.7M    ring-15bf6c46f8e53abc.0.o
24K     sha256-x86_64-elf.o
24K     sha512-x86_64-elf.o
56K     shift.o
96K     sysrand.o
8.0K    vpaes-x86_64-elf.o
16K     x25519-asm-x86_64.o
48K     x25519-x86_64.o
8.0K    x86_64-mont-elf.o
12K     x86_64-mont5-elf.o
716K    ring-15bf6c46f8e53abc.0.bytecode.deflate
944K    rust.metadata.bin
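To see which artifacts dominate, a listing like this can be parsed and sorted by size. The following is a small sketch under stated assumptions: the sizes look like `du -h` output, so `K` and `M` are taken to be binary units (1K = 1024 bytes), which the listing itself does not confirm. The function names `parse_size` and `largest` are hypothetical.

```rust
/// Convert a human-readable size such as "8.0K" or "1.7M" to bytes.
/// Assumes binary units (1K = 1024 bytes); the listing does not say.
fn parse_size(s: &str) -> Option<u64> {
    let (num, mult) = match s.chars().last()? {
        'K' => (&s[..s.len() - 1], 1024.0),
        'M' => (&s[..s.len() - 1], 1024.0 * 1024.0),
        _ => (s, 1.0),
    };
    let value: f64 = num.parse().ok()?;
    Some((value * mult) as u64)
}

/// Parse "SIZE  NAME" lines and return them sorted largest first.
/// Lines that do not match are skipped.
fn largest(listing: &str) -> Vec<(u64, &str)> {
    let mut items: Vec<(u64, &str)> = listing
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let size = parse_size(parts.next()?)?;
            Some((size, parts.next()?))
        })
        .collect();
    items.sort_by(|a, b| b.0.cmp(&a.0));
    items
}

fn main() {
    // A few entries copied from the listing above.
    let listing = "\
60K     add.o
216K    ecp_nistz256.o
1.7M    ring-15bf6c46f8e53abc.0.o
944K    rust.metadata.bin";
    let items = largest(listing);
    let total: u64 = items.iter().map(|(size, _)| size).sum();
    for (size, name) in &items {
        // Percentages are relative to the entries parsed here only.
        println!("{:>8} KiB  {:>3}%  {}", size / 1024, 100 * size / total, name);
    }
}
```

Because the input sizes are already rounded, the byte counts are approximate; the ranking is the useful part.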

In release mode, things aren't that much worse. Building ring by itself takes 15.6s, or 21.78s with build deps. Rebuilding just libring.rlib takes 4.64s, so building the C code takes most of the time. Here's the profile:

   Compiling ring v0.7.3 (file:///home/cmr/proj/ring)
time: 0.045; rss: 56MB  parsing
time: 0.000; rss: 56MB  recursion limit
time: 0.000; rss: 56MB  crate injection
time: 0.000; rss: 56MB  plugin loading
time: 0.000; rss: 56MB  plugin registration
time: 0.077; rss: 94MB  expansion
time: 0.000; rss: 94MB  maybe building test harness
time: 0.001; rss: 94MB  maybe creating a macro crate
time: 0.000; rss: 94MB  checking for inline asm in case the target doesn't support it
time: 0.003; rss: 94MB  early lint checks
time: 0.001; rss: 94MB  AST validation
time: 0.011; rss: 99MB  name resolution
time: 0.008; rss: 99MB  complete gated feature checking
time: 0.009; rss: 101MB lowering ast -> hir
time: 0.003; rss: 105MB indexing hir
time: 0.001; rss: 105MB attribute checking
time: 0.004; rss: 105MB language item collection
time: 0.002; rss: 105MB lifetime resolution
time: 0.000; rss: 105MB looking for entry point
time: 0.000; rss: 105MB looking for plugin registrar
time: 0.003; rss: 107MB region resolution
time: 0.001; rss: 107MB loop checking
time: 0.000; rss: 107MB static item recursion checking
time: 0.014; rss: 107MB compute_incremental_hashes_map
time: 0.000; rss: 107MB load_dep_graph
time: 0.001; rss: 107MB stability index
time: 0.003; rss: 107MB stability checking
time: 0.464; rss: 123MB type collecting
time: 0.000; rss: 123MB variance inference
time: 0.000; rss: 123MB impl wf inference
time: 0.014; rss: 126MB coherence checking
time: 0.020; rss: 126MB wf checking
time: 0.045; rss: 126MB item-types checking
time: 0.322; rss: 133MB item-bodies checking
time: 0.022; rss: 133MB const checking
time: 0.005; rss: 133MB privacy checking
time: 0.004; rss: 133MB intrinsic checking
time: 0.001; rss: 133MB effect checking
time: 0.006; rss: 133MB match checking
time: 0.003; rss: 133MB liveness checking
time: 0.016; rss: 133MB rvalue checking
time: 0.036; rss: 150MB MIR dump
  time: 0.004; rss: 150MB   SimplifyCfg
  time: 0.009; rss: 150MB   QualifyAndPromoteConstants
  time: 0.013; rss: 150MB   TypeckMir
  time: 0.000; rss: 150MB   SimplifyBranches
  time: 0.002; rss: 150MB   SimplifyCfg
time: 0.029; rss: 150MB MIR cleanup and validation
time: 0.045; rss: 152MB borrow checking
time: 0.000; rss: 152MB reachability checking
time: 0.003; rss: 152MB death checking
time: 0.000; rss: 152MB unused lib feature checking
time: 0.041; rss: 152MB lint checking
time: 0.000; rss: 152MB resolving dependency formats
  time: 0.000; rss: 152MB   NoLandingPads
  time: 0.002; rss: 152MB   SimplifyCfg
  time: 0.005; rss: 152MB   EraseRegions
  time: 0.001; rss: 152MB   AddCallGuards
  time: 0.014; rss: 152MB   ElaborateDrops
  time: 0.000; rss: 152MB   NoLandingPads
  time: 0.002; rss: 152MB   SimplifyCfg
  time: 0.000; rss: 152MB   Inline
  time: 0.002; rss: 152MB   InstCombine
  time: 0.001; rss: 152MB   Deaggregator
  time: 0.000; rss: 152MB   CopyPropagation
  time: 0.003; rss: 152MB   SimplifyLocals
  time: 0.001; rss: 152MB   AddCallGuards
  time: 0.000; rss: 152MB   PreTrans
time: 0.031; rss: 152MB MIR optimisations
  time: 0.009; rss: 154MB   write metadata
  time: 0.065; rss: 156MB   translation item collection
  time: 0.013; rss: 156MB   codegen unit partitioning
  time: 0.007; rss: 170MB   internalize symbols
time: 0.391; rss: 170MB translation
time: 0.000; rss: 170MB assert dep graph
time: 0.000; rss: 170MB serialize dep graph
  time: 0.164; rss: 136MB   llvm function passes [0]
  time: 2.155; rss: 140MB   llvm module passes [0]
  time: 0.562; rss: 143MB   codegen passes [0]
  time: 0.001; rss: 143MB   codegen passes [0]
===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0167 seconds (0.0174 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0100 ( 60.0%)   0.0100 ( 60.0%)   0.0118 ( 67.5%)  Global Splitting
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0023 ( 13.1%)  Evict
   0.0033 ( 20.0%)   0.0033 ( 20.0%)   0.0017 (  9.7%)  Spiller
   0.0033 ( 20.0%)   0.0033 ( 20.0%)   0.0013 (  7.4%)  Local Splitting
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  2.3%)  Seed Live Regs
   0.0167 (100.0%)   0.0167 (100.0%)   0.0174 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1567 seconds (0.1546 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0367 ( 25.0%)   0.0033 ( 33.3%)   0.0400 ( 25.5%)   0.0419 ( 27.1%)  Instruction Selection
   0.0167 ( 11.4%)   0.0000 (  0.0%)   0.0167 ( 10.6%)   0.0219 ( 14.2%)  Instruction Scheduling
   0.0200 ( 13.6%)   0.0000 (  0.0%)   0.0200 ( 12.8%)   0.0215 ( 13.9%)  DAG Combining 1
   0.0167 ( 11.4%)   0.0033 ( 33.3%)   0.0200 ( 12.8%)   0.0186 ( 12.0%)  DAG Combining 2
   0.0167 ( 11.4%)   0.0000 (  0.0%)   0.0167 ( 10.6%)   0.0138 (  8.9%)  DAG Legalization
   0.0200 ( 13.6%)   0.0033 ( 33.3%)   0.0233 ( 14.9%)   0.0125 (  8.1%)  Instruction Creation
   0.0100 (  6.8%)   0.0000 (  0.0%)   0.0100 (  6.4%)   0.0103 (  6.7%)  Type Legalization
   0.0067 (  4.5%)   0.0000 (  0.0%)   0.0067 (  4.3%)   0.0080 (  5.2%)  DAG Combining after legalize types
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0041 (  2.6%)  Vector Legalization
   0.0033 (  2.3%)   0.0000 (  0.0%)   0.0033 (  2.1%)   0.0018 (  1.2%)  Instruction Scheduling Cleanup
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.1%)  DAG Combining after legalize vectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type Legalization 2
   0.1467 (100.0%)   0.0100 (100.0%)   0.1567 (100.0%)   0.1546 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0033 seconds (0.0011 wall clock)

   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0033 (100.0%)   0.0033 (100.0%)   0.0007 ( 63.9%)  DWARF Exception Writer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 ( 35.4%)  Debug Info Emission
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.6%)  DWARF Debug Writer
   0.0033 (100.0%)   0.0033 (100.0%)   0.0011 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 2.5233 seconds (2.5454 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2267 (  9.6%)   0.0167 ( 10.9%)   0.2433 (  9.6%)   0.2300 (  9.0%)  Dominator Tree Construction
   0.2000 (  8.4%)   0.0200 ( 13.0%)   0.2200 (  8.7%)   0.2087 (  8.2%)  Function Integration/Inlining
   0.1400 (  5.9%)   0.0100 (  6.5%)   0.1500 (  5.9%)   0.1373 (  5.4%)  Combine redundant instructions
   0.1267 (  5.3%)   0.0067 (  4.3%)   0.1333 (  5.3%)   0.1243 (  4.9%)  Global Value Numbering
   0.0933 (  3.9%)   0.0067 (  4.3%)   0.1000 (  4.0%)   0.0995 (  3.9%)  Induction Variable Simplification
   0.0833 (  3.5%)   0.0067 (  4.3%)   0.0900 (  3.6%)   0.0955 (  3.8%)  Combine redundant instructions
   0.1067 (  4.5%)   0.0000 (  0.0%)   0.1067 (  4.2%)   0.0951 (  3.7%)  Combine redundant instructions
   0.1067 (  4.5%)   0.0033 (  2.2%)   0.1100 (  4.4%)   0.0945 (  3.7%)  Global Value Numbering
   0.0867 (  3.7%)   0.0000 (  0.0%)   0.0867 (  3.4%)   0.0875 (  3.4%)  Combine redundant instructions
   0.0767 (  3.2%)   0.0000 (  0.0%)   0.0767 (  3.0%)   0.0860 (  3.4%)  Combine redundant instructions
   0.0600 (  2.5%)   0.0000 (  0.0%)   0.0600 (  2.4%)   0.0730 (  2.9%)  SROA
   0.0367 (  1.5%)   0.0033 (  2.2%)   0.0400 (  1.6%)   0.0613 (  2.4%)  SROA
   0.0500 (  2.1%)   0.0033 (  2.2%)   0.0533 (  2.1%)   0.0475 (  1.9%)  Dead Store Elimination
   0.0500 (  2.1%)   0.0000 (  0.0%)   0.0500 (  2.0%)   0.0459 (  1.8%)  Value Propagation
   0.0367 (  1.5%)   0.0000 (  0.0%)   0.0367 (  1.5%)   0.0457 (  1.8%)  Value Propagation
   0.0500 (  2.1%)   0.0033 (  2.2%)   0.0533 (  2.1%)   0.0439 (  1.7%)  Module Verifier
   0.0500 (  2.1%)   0.0000 (  0.0%)   0.0500 (  2.0%)   0.0432 (  1.7%)  Early CSE
   0.0267 (  1.1%)   0.0033 (  2.2%)   0.0300 (  1.2%)   0.0381 (  1.5%)  MemCpy Optimization
   0.0300 (  1.3%)   0.0000 (  0.0%)   0.0300 (  1.2%)   0.0354 (  1.4%)  Jump Threading
   0.0200 (  0.8%)   0.0067 (  4.3%)   0.0267 (  1.1%)   0.0340 (  1.3%)  Combine redundant instructions
   0.0333 (  1.4%)   0.0100 (  6.5%)   0.0433 (  1.7%)   0.0323 (  1.3%)  Combine redundant instructions
   0.0200 (  0.8%)   0.0000 (  0.0%)   0.0200 (  0.8%)   0.0318 (  1.3%)  Combine redundant instructions
   0.0267 (  1.1%)   0.0000 (  0.0%)   0.0267 (  1.1%)   0.0292 (  1.1%)  Greedy Register Allocator
   0.0367 (  1.5%)   0.0000 (  0.0%)   0.0367 (  1.5%)   0.0291 (  1.1%)  Jump Threading
   0.0200 (  0.8%)   0.0000 (  0.0%)   0.0200 (  0.8%)   0.0264 (  1.0%)  Loop Strength Reduction
   0.0200 (  0.8%)   0.0067 (  4.3%)   0.0267 (  1.1%)   0.0259 (  1.0%)  Machine Instruction Scheduler
   0.0233 (  1.0%)   0.0000 (  0.0%)   0.0233 (  0.9%)   0.0247 (  1.0%)  Loop Invariant Code Motion
   0.0167 (  0.7%)   0.0000 (  0.0%)   0.0167 (  0.7%)   0.0240 (  0.9%)  Early CSE
   0.0167 (  0.7%)   0.0033 (  2.2%)   0.0200 (  0.8%)   0.0200 (  0.8%)  Reassociate expressions
   0.0300 (  1.3%)   0.0000 (  0.0%)   0.0300 (  1.2%)   0.0162 (  0.6%)  Simplify the CFG
   0.0100 (  0.4%)   0.0033 (  2.2%)   0.0133 (  0.5%)   0.0146 (  0.6%)  Promote 'by reference' arguments to scalars
   0.0100 (  0.4%)   0.0033 (  2.2%)   0.0133 (  0.5%)   0.0141 (  0.6%)  Deduce function attributes
   0.0067 (  0.3%)   0.0033 (  2.2%)   0.0100 (  0.4%)   0.0132 (  0.5%)  SLP Vectorizer
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0130 (  0.5%)  Simplify the CFG
   0.0067 (  0.3%)   0.0067 (  4.3%)   0.0133 (  0.5%)   0.0128 (  0.5%)  Simplify the CFG
   0.0100 (  0.4%)   0.0000 (  0.0%)   0.0100 (  0.4%)   0.0125 (  0.5%)  Loop Invariant Code Motion
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0125 (  0.5%)  Simplify the CFG
   0.0167 (  0.7%)   0.0000 (  0.0%)   0.0167 (  0.7%)   0.0119 (  0.5%)  Live Variable Analysis
   0.0133 (  0.6%)   0.0000 (  0.0%)   0.0133 (  0.5%)   0.0118 (  0.5%)  Globals Alias Analysis
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0116 (  0.5%)  CodeGen Prepare
   0.0133 (  0.6%)   0.0000 (  0.0%)   0.0133 (  0.5%)   0.0110 (  0.4%)  Simplify the CFG
   0.0200 (  0.8%)   0.0000 (  0.0%)   0.0200 (  0.8%)   0.0109 (  0.4%)  Simplify the CFG
   0.0100 (  0.4%)   0.0000 (  0.0%)   0.0100 (  0.4%)   0.0108 (  0.4%)  Module Verifier
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0097 (  0.4%)  Natural Loop Information
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0096 (  0.4%)  Sparse Conditional Constant Propagation
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0096 (  0.4%)  Interprocedural Sparse Conditional Constant Propagation
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0083 (  0.3%)  Bit-Tracking Dead Code Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0083 (  0.3%)  Scoped NoAlias Alias Analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0069 (  0.3%)  Natural Loop Information
   0.0100 (  0.4%)   0.0000 (  0.0%)   0.0100 (  0.4%)   0.0069 (  0.3%)  Tail Call Elimination
   0.0200 (  0.8%)   0.0000 (  0.0%)   0.0200 (  0.8%)   0.0068 (  0.3%)  Dominator Tree Construction
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0068 (  0.3%)  Unroll loops
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0066 (  0.3%)  Dominator Tree Construction
   0.0133 (  0.6%)   0.0000 (  0.0%)   0.0133 (  0.5%)   0.0066 (  0.3%)  Loop Invariant Code Motion
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0064 (  0.3%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0063 (  0.2%)  Unroll loops
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0062 (  0.2%)  Machine Common Subexpression Elimination
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0061 (  0.2%)  Dominator Tree Construction
   0.0100 (  0.4%)   0.0000 (  0.0%)   0.0100 (  0.4%)   0.0060 (  0.2%)  Unswitch loops
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0058 (  0.2%)  Remove unused exception handling info
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0057 (  0.2%)  Remove redundant instructions
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0053 (  0.2%)  Dominator Tree Construction
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0051 (  0.2%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0051 (  0.2%)  Loop-Closed SSA Form Pass
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0050 (  0.2%)  Natural Loop Information
   0.0100 (  0.4%)   0.0000 (  0.0%)   0.0100 (  0.4%)   0.0050 (  0.2%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0067 (  0.3%)   0.0033 (  2.2%)   0.0100 (  0.4%)   0.0049 (  0.2%)  Aggressive Dead Code Elimination
   0.0033 (  0.1%)   0.0033 (  2.2%)   0.0067 (  0.3%)   0.0049 (  0.2%)  Machine Function Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0048 (  0.2%)  Control Flow Optimizer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0046 (  0.2%)  Insert stack protectors
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0046 (  0.2%)  Dead Argument Elimination
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0044 (  0.2%)  Dominator Tree Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0043 (  0.2%)  Loop-Closed SSA Form Pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0042 (  0.2%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0041 (  0.2%)  Dominator Tree Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0040 (  0.2%)  Simplify the CFG
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0039 (  0.2%)  Scalar Evolution Analysis
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0039 (  0.2%)  Loop Vectorization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.2%)  Rotate Loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0038 (  0.2%)  Demanded bits analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0038 (  0.1%)  Scalar Evolution Analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0037 (  0.1%)  Loop-Closed SSA Form Pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.1%)  Canonicalize natural loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.1%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0033 (  2.2%)   0.0033 (  0.1%)   0.0034 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0032 (  0.1%)  Canonicalize natural loops
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0032 (  0.1%)  X86 Byte/Word Instruction Fixup
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0031 (  0.1%)  Function Alias Analysis Results
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0031 (  0.1%)  Demanded bits analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0030 (  0.1%)  Lazy Value Information Analysis
   0.0100 (  0.4%)   0.0000 (  0.0%)   0.0100 (  0.4%)   0.0030 (  0.1%)  Two-Address instruction pass
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0030 (  0.1%)  Dominator Tree Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0028 (  0.1%)  Canonicalize natural loops
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0027 (  0.1%)  Dominator Tree Construction
   0.0067 (  0.3%)   0.0033 (  2.2%)   0.0100 (  0.4%)   0.0027 (  0.1%)  Lazy Value Information Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.1%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.1%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.1%)  Function Alias Analysis Results
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0024 (  0.1%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0024 (  0.1%)  Virtual Register Rewriter
   0.0000 (  0.0%)   0.0033 (  2.2%)   0.0033 (  0.1%)   0.0024 (  0.1%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0024 (  0.1%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0023 (  0.1%)  Recognize loop idioms
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0023 (  0.1%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0022 (  0.1%)  Machine Loop Invariant Code Motion
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0022 (  0.1%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0021 (  0.1%)  Function Alias Analysis Results
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0021 (  0.1%)  Function Alias Analysis Results
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0021 (  0.1%)  MergedLoadStoreMotion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.1%)  PGOIndirectCallPromotion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.1%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0067 (  4.3%)   0.0067 (  0.3%)   0.0020 (  0.1%)  Function Alias Analysis Results
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0019 (  0.1%)  Loop Load Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0018 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0018 (  0.1%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0018 (  0.1%)  Execution dependency fix
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.1%)  Speculatively execute instructions if target has divergent branches
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0017 (  0.1%)  Global Variable Optimizer
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0017 (  0.1%)  CallGraph Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0016 (  0.1%)  Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.1%)  Remove dead machine instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.1%)  Loop-Closed SSA Form Pass
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0015 (  0.1%)  Loop-Closed SSA Form Pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.1%)  Machine Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.1%)  Machine InstCombiner
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.1%)  Demanded bits analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0015 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.1%)  Loop-Closed SSA Form Pass
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0014 (  0.1%)  Alignment from assumptions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.1%)  Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.1%)  Dead Global Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0013 (  0.1%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0013 (  0.1%)  Machine Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0013 (  0.1%)  Slot index numbering
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0013 (  0.1%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0013 (  0.1%)  Memory Dependence Analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0013 (  0.1%)  Promote Memory to Register
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0013 (  0.1%)  CallGraph Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  MachineDominator Tree Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0012 (  0.0%)  X86 LEA Optimize
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0012 (  0.0%)  Dominator Tree Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0012 (  0.0%)  Machine Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  MachinePostDominator Tree Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0012 (  0.0%)  Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  MachinePostDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Constant Hoisting
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0011 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Slot index numbering
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0011 (  0.0%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Memory Dependence Analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0011 (  0.0%)  Delete dead loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Machine Natural Loop Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Lower 'expect' Intrinsics
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Machine Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)  Loop Access Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)  Float to int
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)  Expand Atomic instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)  Loop-Closed SSA Form Pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)  Machine Natural Loop Construction
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0007 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0067 (  0.3%)   0.0000 (  0.0%)   0.0067 (  0.3%)   0.0007 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)  Canonicalize natural loops
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0007 (  0.0%)  Canonicalize natural loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Machine Loop Invariant Code Motion
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0006 (  0.0%)  Partially inline calls to library functions
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0006 (  0.0%)  Stack Slot Coloring
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Machine Natural Loop Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Tail Duplication
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)  Canonicalize natural loops
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0005 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)  Remove unreachable machine basic blocks
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)  X86 Optimize Call Frame
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Canonicalize natural loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0033 (  2.2%)   0.0033 (  0.1%)   0.0004 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Canonicalize natural loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Tail Duplication
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Rotate Loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  X86 pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)  Scalar Evolution Analysis
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0004 (  0.0%)  Debug Variable Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Canonicalize natural loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Machine Trace Metrics
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Live Register Matrix
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Spill Code Placement Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Shrink Wrapping analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Expand ISel Pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Globals Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  X86 LEA Fixup
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  X86 Fixup SetCC
   0.0033 (  0.1%)   0.0000 (  0.0%)   0.0033 (  0.1%)   0.0002 (  0.0%)  Post RA top-down list latency scheduler
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Loop Access Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Loop Distribition
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Virtual Register Map
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Optimize machine instruction PHIs
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Insert XRay ops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Local Dynamic TLS Access Clean-up
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Early If-Conversion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Live DEBUG_VALUE analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  StackMap Liveness Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Loop Access Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Optimization Remark Emitter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Rename Disconnected Subregister Components
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Contiguously Lay Out Funclets
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  X86 Atom pad short functions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Lazy Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Implement the 'patchable-function' attribute
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  X86 WinAlloca Expander
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  X86 vzeroupper inserter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Deduce function attributes in RPO
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Infer set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge Duplicate Global Constants
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Strip Unused Function Prototypes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Eliminate Available Externally Globals
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  A No-Op Barrier Pass
   2.3700 (100.0%)   0.1533 (100.0%)   2.5233 (100.0%)   2.5454 (100.0%)  Total

time: 2.897; rss: 143MB LLVM passes
time: 0.000; rss: 143MB serialize work products
time: 0.011; rss: 134MB linking
    Finished release [optimized] target(s) in 4.64 secs

emberian commented 7 years ago

Note that moving out #[cfg(test)] stuff will have approximately 0 impact on build times. It would make expansion and parsing slightly faster but that's already less than 130ms of the build.

briansmith commented 7 years ago

> Note that moving out #[cfg(test)] stuff will have approximately 0 impact on build times. It would make expansion and parsing slightly faster but that's already less than 130ms of the build.

I believe it will make builds faster when editing the tests, because the lib won't be recompiled at all. To me this is a major benefit since "cargo test" is the primary way I build ring.

I've done a bunch of work to remove the need for C++ at all, and to greatly reduce the amount of C code present. Unfortunately it's kind of a big project that's 75% done, hard to commit incrementally, and kind of stalled right now. But if the C/C++ stuff is what's making things slow, then this will help a lot when I can get time to finish it.

emberian commented 7 years ago

> I believe it will make builds faster when editing the tests, because the lib won't be recompiled at all. To me this is a major benefit since "cargo test" is the primary way I build ring.

That's a good benefit that I didn't think of.

> But if the C/C++ stuff is what's making things slow, then this will help a lot when I can get time to finish it.

I instrumented the build script to print how much time executing subcommands takes (not accounting for any parallelism or any work the build script does). Overall, it takes 4.462682 seconds. The perl takes 1.891556 seconds, the asm takes 0.3488179 seconds, and the C itself takes 2.168519 seconds, in debug mode.

In release mode, the asm takes 0.3802813 seconds, and the C takes 4.583867 seconds.
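
For anyone who wants to repeat the measurement, something along these lines would do. It is a minimal sketch and not the actual patch to ring's build.rs; `Timings`, `record`, `run_timed`, and `report` are hypothetical names, and the category labels ("perl", "asm", "c") are whatever the caller passes in.

```rust
use std::collections::BTreeMap;
use std::io;
use std::process::{Command, ExitStatus};
use std::time::{Duration, Instant};

/// Accumulated wall-clock time per category of subcommand.
/// Hypothetical helper; not part of ring's build script.
#[derive(Default)]
pub struct Timings {
    by_category: BTreeMap<String, Duration>,
}

impl Timings {
    /// Adds `elapsed` to the running total for `category`.
    pub fn record(&mut self, category: &str, elapsed: Duration) {
        *self.by_category.entry(category.to_string()).or_default() += elapsed;
    }

    /// Runs `cmd` to completion and charges its wall-clock time to `category`.
    /// If the command cannot be spawned, the error is returned and nothing is recorded.
    pub fn run_timed(&mut self, category: &str, cmd: &mut Command) -> io::Result<ExitStatus> {
        let start = Instant::now();
        let status = cmd.status()?;
        self.record(category, start.elapsed());
        Ok(status)
    }

    /// Sum over all categories. Commands that ran in parallel are each counted
    /// in full, so this can exceed the real elapsed time of the build script.
    pub fn total(&self) -> Duration {
        self.by_category.values().sum()
    }

    /// Prints one line per category plus the total to stderr.
    pub fn report(&self) {
        for (category, elapsed) in &self.by_category {
            eprintln!("{}: {:.6} s", category, elapsed.as_secs_f64());
        }
        eprintln!("total: {:.6} s", self.total().as_secs_f64());
    }
}
```

In a build script, each `Command` that invokes Perl, the assembler, or the C compiler would go through `run_timed` with the matching label, and `report` would be called once at the end of `main`.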

briansmith commented 7 years ago

cmr notifications@github.com wrote:

> I instrumented the build script to print how much time executing subcommands takes (not accounting for any parallelism or any work the build script does). Overall, it takes 4.462682 seconds. The perl takes 1.891556 seconds, the asm takes 0.3488179 seconds, and the C itself takes 2.168519 seconds, in debug mode.

We should decide which use cases we're trying to optimize for.

I think Ted is mostly interested in the "time to build ring as a dependency of another project" wall time. In that case, the Perl step is skipped because it is precomputed.

I'm mostly interested in "cargo test --features=rsa_signing" build speed, to help people contributing code, testing, etc. to ring.

I have to say, I personally think the build speed is quite bearable.

In any case, I did an experiment last night that shows we'll soon be able to remove 10 files, including all the C++, which all adds up to about 2,800 lines of code. And I think not long after that we'll remove another ~10 C files. So the natural progression looks positive as far as build speed is concerned.

eddyb commented 7 years ago

cc @arielb1

briansmith commented 7 years ago

In efdffc91db7adb9923a6635ebdd49829db61d7cb and 0aea3d20c2d7a3ff44176544a4d24a80b1d2495e we removed 10 C/C++ source files, including all the C++ source files. It would be good to see if this made any significant difference in the build time on the systems you are measuring it on.

eddyb commented 7 years ago

Now that rust-lang/rust#41469 is merged, you should also look at time-passes.

briansmith commented 7 years ago

BTW, if you don't need RSA then in ring 0.8.0 you'll be able to build with --no-default-features or --no-default-features --features=dev_urandom_fallback to avoid building some Rust code. build.rs could be changed to avoid building crypto/bn/* when the use_heap default feature isn't enabled to make that even faster.
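
The build.rs change could look roughly like this. It is only a sketch of the idea, not what ring's build.rs does today: Cargo passes each enabled feature to the build script as a `CARGO_FEATURE_<NAME>` environment variable, so `use_heap` shows up as `CARGO_FEATURE_USE_HEAP`. The function names and the file names in the usage note are placeholders.

```rust
use std::env;

/// Returns true when Cargo enabled the `use_heap` feature for this build.
/// Cargo exposes each enabled feature to build scripts as `CARGO_FEATURE_<NAME>`.
pub fn use_heap_enabled() -> bool {
    env::var_os("CARGO_FEATURE_USE_HEAP").is_some()
}

/// Keeps every source file when `use_heap` is on; otherwise drops the
/// bignum sources under `crypto/bn/` so they are never compiled.
pub fn select_sources<'a>(all: &[&'a str], use_heap: bool) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|path| use_heap || !path.starts_with("crypto/bn/"))
        .collect()
}
```

With a hypothetical list such as `["crypto/bn/example_a.c", "crypto/example_b.c"]`, calling `select_sources(&list, use_heap_enabled())` under `--no-default-features` would leave only `crypto/example_b.c` to hand to the C compiler.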

briansmith commented 7 years ago

Is there anybody unhappy with the build time now? We've made several changes that should be improvements, though we haven't attempted to measure everything again. Without new measurements this is unactionable.

briansmith commented 7 years ago

OK, I'm going to close this now, on the assumption everything is A-OK.