Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

slow compilation #13167

Open Quuxplusone opened 12 years ago

Quuxplusone commented 12 years ago
Bugzilla Link PR13075
Status NEW
Importance P normal
Reported by Ami Fischman (fischman@chromium.org)
Reported on 2012-06-10 17:55:10 -0700
Last modified on 2019-05-20 16:20:09 -0700
Version trunk
Hardware PC Linux
CC dgregor@apple.com, jrose@belkadan.com, jryans@gmail.com, kremenek@apple.com, llvm-bugs@lists.llvm.org, rafael@espindo.la
Fixed by commit(s)
Attachments webcore_rendering.RenderMediaControlsChromium.E.bz2 (290781 bytes, application/octet-stream)
Blocks
Blocked by
See also
clang version 3.2 (trunk 155835)

The attached 2.6MB .E file is a pre-processed (-E) file from the chromium build
of WebKit, and clang takes ~3s to compile it down to a 3.4M .o file (on a beefy
linux machine).

$ /usr/bin/time clang -w -g   -x c++  -c
webcore_rendering.RenderMediaControlsChromium.E -o /dev/null
2.99user 0.08system 0:03.08elapsed 99%CPU (0avgtext+0avgdata 705440maxresident)k
0inputs+0outputs (0major+56787minor)pagefaults 0swaps
Dropping the -g drops the compile time to 2.2s-2.3s, although this still seems
excessively long.

When I do the same pre-processing for a g++-based ARM cross-compile of this
file, the -E output is still 2.6M, but g++ 4.6.3 takes only ~70ms (!) to
compile it down to a 1.4M .o file (once caches are warmed up).

So for some reason clang is 42x slower than gcc in producing output that is
less than 3x the size.
Quuxplusone commented 12 years ago

cloned to rdar://problem/11637948

Quuxplusone commented 12 years ago

Sorry to get back to you months later, but I don't see a preprocessed file...

Quuxplusone commented 12 years ago

Attached webcore_rendering.RenderMediaControlsChromium.E.bz2 (290781 bytes, application/octet-stream): pre-processed /tmp/webcore_rendering.RenderMediaControlsChromium.E

Quuxplusone commented 12 years ago
Nico Weber tells me this bug needs -ftime-report output, so here it is, for
clang version 3.2 (trunk 163674) on x86_64-unknown-linux-gnu:

$ /usr/bin/time ~/src/chromium/src/third_party/llvm-
build/Release+Asserts/bin/clang -w -g   -x c++  -c
webcore_rendering.RenderMediaControlsChromium.E -o /dev/null -ftime-report
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0100 seconds (0.0168 wall clock)

   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0100 (100.0%)   0.0100 (100.0%)   0.0042 ( 24.7%)  Instruction Scheduling
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0028 ( 16.6%)  Instruction Selection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0025 ( 15.1%)  Instruction Creation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0019 ( 11.4%)  DAG Combining 1
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0017 (  9.9%)  Vector Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  8.2%)  DAG Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  6.1%)  Type Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  5.5%)  DAG Combining 2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  2.3%)  Instruction Scheduling Cleanup
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.2%)  DAG Combining after legalize types
   0.0100 (100.0%)   0.0100 (100.0%)   0.0168 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1100 seconds (0.1267 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.1100 (100.0%)   0.1100 (100.0%)   0.1254 ( 99.0%)  DWARF Debug Writer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0013 (  1.0%)  DWARF Exception Writer
   0.1100 (100.0%)   0.1100 (100.0%)   0.1267 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1000 seconds (0.1219 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0200 ( 22.2%)   0.0100 (100.0%)   0.0300 ( 30.0%)   0.0362 ( 29.7%)  X86 DAG->DAG Instruction Selection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0263 ( 21.6%)  X86 AT&T-Style Assembly Printer
   0.0200 ( 22.2%)   0.0000 (  0.0%)   0.0200 ( 20.0%)   0.0138 ( 11.3%)  Module Verifier
   0.0100 ( 11.1%)   0.0000 (  0.0%)   0.0100 ( 10.0%)   0.0128 ( 10.5%)  Module Verifier
   0.0100 ( 11.1%)   0.0000 (  0.0%)   0.0100 ( 10.0%)   0.0123 ( 10.1%)  Module Verifier
   0.0200 ( 22.2%)   0.0000 (  0.0%)   0.0200 ( 20.0%)   0.0037 (  3.0%)  Fast Register Allocator
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0030 (  2.4%)  Machine Function Analysis
   0.0100 ( 11.1%)   0.0000 (  0.0%)   0.0100 ( 10.0%)   0.0026 (  2.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0019 (  1.6%)  Two-Address instruction pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  1.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.7%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.7%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.7%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.6%)  Basic CallGraph Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.4%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.4%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.3%)  Post-RA pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.3%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.3%)  Inliner for always_inline functions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.3%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.2%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.2%)  Expand ISel Pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.2%)  Preliminary module verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.2%)  Preliminary module verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.2%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.1%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.1%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.1%)  X86 Maximal Stack Alignment Check
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.1%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.1%)  Preliminary module verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.1%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.1%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0900 (100.0%)   0.0100 (100.0%)   0.1000 (100.0%)   0.1219 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.7600 ( 76.9%)   0.0800 ( 66.7%)   1.8400 ( 76.3%)   1.8533 ( 76.6%)  Clang front-end timer
   0.2400 ( 10.5%)   0.0400 ( 33.3%)   0.2800 ( 11.6%)   0.2847 ( 11.8%)  Code Generation Time
   0.2900 ( 12.7%)   0.0000 (  0.0%)   0.2900 ( 12.0%)   0.2810 ( 11.6%)  LLVM IR Generation Time
   2.2900 (100.0%)   0.1200 (100.0%)   2.4100 (100.0%)   2.4191 (100.0%)  Total

1.78user 0.09system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 646960maxresident)k
0inputs+0outputs (0major+51782minor)pagefaults 0swaps
Quuxplusone commented 11 years ago
Is this still an issue?  On my machine, this takes 1.4s to compile:

$ time ~/llvm-make/Release/bin/clang -c
/tmp/webcore_rendering.RenderMediaControlsChromium.ii -w -g -fno-blocks

real    0m1.395s
user    0m1.316s
sys 0m0.074s
Quuxplusone commented 11 years ago
Is this still an issue?  On my machine, this takes 1.4s to compile:

$ time ~/llvm-make/Release/bin/clang -c
/tmp/webcore_rendering.RenderMediaControlsChromium.ii -w -g -fno-blocks

real    0m1.395s
user    0m1.316s
sys 0m0.074s
Quuxplusone commented 11 years ago

As of r168474 my test from comment#3 is down to 1.6s on an HP z620 running precise (original timings were from a z600 running lucid, FWIW).

So, there's about a 2x speed improvement over the originally-reported bug, but that's still over 20x slower than gcc.

Quuxplusone commented 11 years ago
(In reply to comment #7)
> As of r168474 my test from comment#3 is down to 1.6s on an HP z620 running
> precise (original timings were from a z600 running lucid, FWIW).
>
> So, there's about a 2x speed improvement over the originally-reported bug, but
> that's still over 20x slower than gcc.

Thanks.  The comparison against gcc is useful.
Quuxplusone commented 11 years ago
(In reply to comment #7)
> As of r168474 my test from comment#3 is down to 1.6s on an HP z620 running
> precise (original timings were from a z600 running lucid, FWIW).
>
> So, there's about a 2x speed improvement over the originally-reported bug, but
> that's still over 20x slower than gcc.

... but top-of-tree is r171419.  That's a significant number of revisions.  For
sanity, could you recheck the timings with tip-of-trunk?
Quuxplusone commented 11 years ago
(In reply to comment #9)
> ... but top-of-tree is r171419.  That's a significant number of revisions.
For
> sanity, could you recheck the timings with tip-of-trunk?

I fail to build at r171419.
Using chromium's pre-built clang at r170392 shows no speedup (still in 1.5-
1.6s).

Does the llvm project have performance/timing tests that might benefit from
adding this test case?