Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Compiling interp.i takes 54 seconds: 78% time spent in Simple Register Coalescing #4742

Closed Quuxplusone closed 14 years ago

Quuxplusone commented 15 years ago
Bugzilla Link PR4252
Status RESOLVED FIXED
Importance P normal
Reported by Török Edwin (edwin+bugs@etorok.eu)
Reported on 2009-05-23 07:42:21 -0700
Last modified on 2009-12-07 07:17:32 -0800
Version unspecified
Hardware PC Linux
CC anton@korobeynikov.info, devang.patel@gmail.com, evan.cheng@apple.com, llvm-bugs@lists.llvm.org, nicholas@mxc.ca, nlewycky@google.com, stoklund@2pi.dk
Fixed by commit(s)
Attachments interp.bc (434324 bytes, application/octet-stream)
interp.i (86998 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 3017
llc interp.bc

Compiling attached file takes more than 54 seconds with clang -O.
Compiling same file with gcc takes 0.002s.

You can reproduce by:
$ llc -time-passes interp.bc

Or by running clang -O:
$ clang interp.i -c -O -ftime-report
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.2400 seconds (0.2590 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0480 ( 21.4%)   0.0040 ( 25.0%)   0.0520 ( 21.6%)   0.0644 ( 24.8%)  Instruction Scheduling
   0.0400 ( 17.8%)   0.0000 (  0.0%)   0.0400 ( 16.6%)   0.0458 ( 17.7%)  DAG Legalization
   0.0480 ( 21.4%)   0.0040 ( 24.9%)   0.0520 ( 21.6%)   0.0443 ( 17.1%)  Type Legalization
   0.0200 (  8.9%)   0.0040 ( 25.0%)   0.0240 ( 10.0%)   0.0320 ( 12.3%)  Instruction Selection
   0.0360 ( 16.0%)   0.0040 ( 24.9%)   0.0400 ( 16.6%)   0.0298 ( 11.5%)  Instruction Creation
   0.0120 (  5.3%)   0.0000 (  0.0%)   0.0120 (  5.0%)   0.0199 (  7.7%)  DAG Combining 1
   0.0120 (  5.3%)   0.0000 (  0.0%)   0.0120 (  4.9%)   0.0137 (  5.3%)  DAG Combining 2
   0.0080 (  3.5%)   0.0000 (  0.0%)   0.0080 (  3.3%)   0.0079 (  3.0%)  Instruction Scheduling Cleanup
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.2%)  DAG Combining after legalize types
   0.2240 (100.0%)   0.0160 (100.0%)   0.2400 (100.0%)   0.2590 (100.0%)  TOTAL

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 54.6514 seconds (54.6741 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  42.7026 ( 78.2%)   0.0240 ( 21.4%)  42.7266 ( 78.1%)  42.7355 ( 78.1%)  Simple Register Coalescing
   3.7242 (  6.8%)   0.0040 (  3.5%)   3.7282 (  6.8%)   3.7292 (  6.8%)  Unswitch loops
   1.4080 (  2.5%)   0.0000 (  0.0%)   1.4080 (  2.5%)   1.4124 (  2.5%)  Canonicalize natural loops
   1.1840 (  2.1%)   0.0000 (  0.0%)   1.1840 (  2.1%)   1.1913 (  2.1%)  Canonicalize natural loops
   0.8320 (  1.5%)   0.0000 (  0.0%)   0.8320 (  1.5%)   0.8321 (  1.5%)  Simplify the CFG
   0.6840 (  1.2%)   0.0000 (  0.0%)   0.6840 (  1.2%)   0.6861 (  1.2%)  Optimize for code generation
   0.6080 (  1.1%)   0.0000 (  0.0%)   0.6080 (  1.1%)   0.6070 (  1.1%)  Simplify the CFG
   0.5440 (  0.9%)   0.0000 (  0.0%)   0.5440 (  0.9%)   0.5444 (  0.9%)  Linear Scan Register Allocator
   0.4720 (  0.8%)   0.0320 ( 28.5%)   0.5040 (  0.9%)   0.5015 (  0.9%)  X86 DAG->DAG Instruction Selection
   0.3640 (  0.6%)   0.0000 (  0.0%)   0.3640 (  0.6%)   0.3662 (  0.6%)  Break critical edges in CFG
   0.3280 (  0.6%)   0.0000 (  0.0%)   0.3280 (  0.6%)   0.3248 (  0.5%)  Global Value Numbering
   0.1920 (  0.3%)   0.0000 (  0.0%)   0.1920 (  0.3%)   0.1925 (  0.3%)  Control Flow Optimizer
   0.1120 (  0.2%)   0.0000 (  0.0%)   0.1120 (  0.2%)   0.1104 (  0.2%)  Eliminate PHI nodes for register allocation
   0.1120 (  0.2%)   0.0040 (  3.5%)   0.1160 (  0.2%)   0.1081 (  0.1%)  Canonicalize Induction Variables
   0.1040 (  0.1%)   0.0000 (  0.0%)   0.1040 (  0.1%)   0.0961 (  0.1%)  Loop Strength Reduction
   0.0680 (  0.1%)   0.0000 (  0.0%)   0.0680 (  0.1%)   0.0906 (  0.1%)  Induction Variable Users
   0.0600 (  0.1%)   0.0120 ( 10.7%)   0.0720 (  0.1%)   0.0764 (  0.1%)  Live Interval Analysis
   0.0480 (  0.0%)   0.0200 ( 17.8%)   0.0680 (  0.1%)   0.0714 (  0.1%)  Live Variable Analysis
   0.0720 (  0.1%)   0.0040 (  3.5%)   0.0760 (  0.1%)   0.0709 (  0.1%)  Loop-Closed SSA Form Pass
   0.0680 (  0.1%)   0.0000 (  0.0%)   0.0680 (  0.1%)   0.0685 (  0.1%)  Simplify the CFG
   0.0640 (  0.1%)   0.0000 (  0.0%)   0.0640 (  0.1%)   0.0641 (  0.1%)  Combine redundant instructions
   0.0600 (  0.1%)   0.0000 (  0.0%)   0.0600 (  0.1%)   0.0593 (  0.1%)  Combine redundant instructions
   0.0520 (  0.0%)   0.0000 (  0.0%)   0.0520 (  0.0%)   0.0528 (  0.0%)  Promote Memory to Register
   0.0520 (  0.0%)   0.0000 (  0.0%)   0.0520 (  0.0%)   0.0511 (  0.0%)  Loop-Closed SSA Form Pass
   0.0360 (  0.0%)   0.0000 (  0.0%)   0.0360 (  0.0%)   0.0475 (  0.0%)  Induction Variable Users
   0.0480 (  0.0%)   0.0000 (  0.0%)   0.0480 (  0.0%)   0.0465 (  0.0%)  Simplify the CFG
   0.0400 (  0.0%)   0.0000 (  0.0%)   0.0400 (  0.0%)   0.0392 (  0.0%)  Sparse Conditional Constant Propagation
   0.0400 (  0.0%)   0.0000 (  0.0%)   0.0400 (  0.0%)   0.0392 (  0.0%)  Dominator Tree Construction
   0.0320 (  0.0%)   0.0000 (  0.0%)   0.0320 (  0.0%)   0.0317 (  0.0%)  Two-Address instruction pass
   0.0280 (  0.0%)   0.0040 (  3.5%)   0.0320 (  0.0%)   0.0290 (  0.0%)  Loop Invariant Code Motion
   0.0240 (  0.0%)   0.0000 (  0.0%)   0.0240 (  0.0%)   0.0231 (  0.0%)  Simplify the CFG
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0222 (  0.0%)  Combine redundant instructions
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0205 (  0.0%)  Combine redundant instructions
   0.0240 (  0.0%)   0.0000 (  0.0%)   0.0240 (  0.0%)   0.0204 (  0.0%)  Dominance Frontier Construction
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0195 (  0.0%)  Dominance Frontier Construction
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0153 (  0.0%)  Break critical edges in CFG
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0148 (  0.0%)  Stack Slot Coloring
   0.0120 (  0.0%)   0.0000 (  0.0%)   0.0120 (  0.0%)   0.0141 (  0.0%)  Dominator Tree Construction
   0.0120 (  0.0%)   0.0000 (  0.0%)   0.0120 (  0.0%)   0.0129 (  0.0%)  MachineDominator Tree Construction
   0.0119 (  0.0%)   0.0000 (  0.0%)   0.0119 (  0.0%)   0.0126 (  0.0%)  Simplify the CFG
   0.0120 (  0.0%)   0.0000 (  0.0%)   0.0120 (  0.0%)   0.0124 (  0.0%)  Dominator Tree Construction
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0115 (  0.0%)  Delete dead loops
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0111 (  0.0%)  MachineDominator Tree Construction
   0.0079 (  0.0%)   0.0040 (  3.5%)   0.0120 (  0.0%)   0.0105 (  0.0%)  Machine Code Deleter
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0096 (  0.0%)  Aggressive Dead Code Elimination
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0095 (  0.0%)  Jump Threading
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0092 (  0.0%)  Machine code sinking
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0078 (  0.0%)  Dominator Tree Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0076 (  0.0%)  Dominator Tree Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0075 (  0.0%)  Combine redundant instructions
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0071 (  0.0%)  Find Used Types
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0069 (  0.0%)  Dominator Tree Construction
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0068 (  0.0%)  Natural Loop Information
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0068 (  0.0%)  Dominator Tree Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0067 (  0.0%)  X86 AT&T-Style Assembly Printer
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0061 (  0.0%)  Prolog/Epilog Insertion & Frame Finalization
   0.0000 (  0.0%)   0.0040 (  3.5%)   0.0040 (  0.0%)   0.0058 (  0.0%)  Dominance Frontier Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0057 (  0.0%)  Remove unreachable blocks from the CFG
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0054 (  0.0%)  Remove unreachable machine basic blocks
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0053 (  0.0%)  Natural Loop Information
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0051 (  0.0%)  Conditional Propagation
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0046 (  0.0%)  MachineDominator Tree Construction
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0040 (  0.0%)  Dominance Frontier Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Dominance Frontier Construction
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0038 (  0.0%)  Combine redundant instructions
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0038 (  0.0%)  Combine redundant instructions
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0033 (  0.0%)  Dead Store Elimination
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0032 (  0.0%)  Virtual Register Map
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0029 (  0.0%)  Machine Natural Loop Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0027 (  0.0%)  Machine Natural Loop Construction
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0026 (  0.0%)  X86 FP Stackifier
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0023 (  0.0%)  Loop-Closed SSA Form Pass
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0020 (  0.0%)  Module Verifier
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0019 (  0.0%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0018 (  0.0%)  Reassociate expressions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0018 (  0.0%)  MemCpy Optimization
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0017 (  0.0%)  Conditional Propagation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  Dead Global Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)  Rotate Loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)  Machine Natural Loop Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)  Tail Call Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Subregister lowering instruction pass
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0006 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Machine Instruction LICM
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)  Label Folder
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Code Placement Optimizater
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Basic CallGraph Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Remove unused exception handling info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Inliner for always_inline functions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 Maximal Stack Alignment Calculator
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Simplify well-known library calls
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Deduce function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dead Argument Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar Replacement of Aggregates
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dead Type Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Global Variable Optimizer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Preliminary module verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Interprocedural constant propagation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 FP_REG_KILL inserter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (default AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Promote Memory to Register
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Strip Unused Function Prototypes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Data Layout
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Raise allocations from calls to instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
  54.5394 (100.0%)   0.1120 (100.0%)  54.6514 (100.0%)  54.6741 (100.0%)  TOTAL

clang-cc: /home/edwin/llvm-svn/llvm/include/llvm/Support/Timer.h:158:
llvm::TimerGroup::~TimerGroup(): Assertion `NumTimers == 0 && "TimerGroup
destroyed before all contained timers!"' failed.
0   clang-cc        0x00000000010ec94f
1   clang-cc        0x00000000010ecd49
2   libpthread.so.0 0x0000003b0a80e7b0
3   libc.so.6       0x0000003b09c32065 gsignal + 53
4   libc.so.6       0x0000003b09c35153 abort + 387
5   libc.so.6       0x0000003b09c2b159 __assert_fail + 233
6   clang-cc        0x0000000000dbcc5c llvm::TimerGroup::~TimerGroup() + 124
7   libc.so.6       0x0000003b09c367dd exit + 157
8   libc.so.6       0x0000003b09c1e5ad __libc_start_main + 237
9   clang-cc        0x000000000042d479
Quuxplusone commented 15 years ago

Attached interp.bc (434324 bytes, application/octet-stream): llc interp.bc

Quuxplusone commented 15 years ago

Attached interp.i (86998 bytes, application/octet-stream): interp.i

Quuxplusone commented 15 years ago
I think this happens because there are lots of large live intervals, see
oprofile output below:

CPU: Core 2, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (Unhalted core cycles) count 400000
samples  %        image name               app name                 symbol name
26958    20.8757  llc                      llc
llvm::SimpleRegisterCoalescing::RemoveUnnecessaryKills(unsignedint,
llvm::LiveInterval&)
21483    16.6359  no-vmlinux               no-vmlinux               (no symbols)
11104     8.5987  llc                      llc
llvm::SimpleRegisterCoalescing::JoinIntervals(llvm::LiveInterval&,
llvm::LiveInterval&, bool&)
7863      6.0889  llc                      llc
llvm::LiveInterval::FindLiveRangeContaining(unsigned int) const
6607      5.1163  llc                      llc
llvm::LiveIntervals::getInstructionIndex(llvm::MachineInstr*) const
5557      4.3032  libc-2.9.so              libc-2.9.so              (no symbols)
4329      3.3523  llc                      llc
llvm::X86InstrInfo::isMoveInstr(llvm::MachineInstr const&, unsigned int&,
unsigned int&, unsigned int&, unsigned int&) const
3145      2.4354  llc                      llc
llvm::LiveIntervals::getVNInfoSourceReg(llvm::VNInfo const*) const
Quuxplusone commented 15 years ago

I'd like to point out in the usual -O0 path, this code is never used.

Quuxplusone commented 15 years ago

Is this still an issue?

Quuxplusone commented 14 years ago
(In reply to comment #4)
> Is this still an issue?
>

No, this one is fixed.

However the coalescer still has issues on other large files (see PR5708).