Open Quuxplusone opened 12 years ago
cloned to rdar://problem/11637948
Sorry to get back to you months later, but I don't see a preprocessed file...
Attached webcore_rendering.RenderMediaControlsChromium.E.bz2
(290781 bytes, application/octet-stream): pre-processed /tmp/webcore_rendering.RenderMediaControlsChromium.E
Nico Weber tells me this bug needs -ftime-report output, so here it is, for
clang version 3.2 (trunk 163674) on x86_64-unknown-linux-gnu:
$ /usr/bin/time ~/src/chromium/src/third_party/llvm-
build/Release+Asserts/bin/clang -w -g -x c++ -c
webcore_rendering.RenderMediaControlsChromium.E -o /dev/null -ftime-report
===-------------------------------------------------------------------------===
Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
Total Execution Time: 0.0100 seconds (0.0168 wall clock)
--System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0100 (100.0%) 0.0100 (100.0%) 0.0042 ( 24.7%) Instruction Scheduling
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0028 ( 16.6%) Instruction Selection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 15.1%) Instruction Creation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0019 ( 11.4%) DAG Combining 1
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 9.9%) Vector Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 8.2%) DAG Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 6.1%) Type Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 5.5%) DAG Combining 2
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 2.3%) Instruction Scheduling Cleanup
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) DAG Combining after legalize types
0.0100 (100.0%) 0.0100 (100.0%) 0.0168 (100.0%) Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.1100 seconds (0.1267 wall clock)
---User Time--- --User+System-- ---Wall Time--- --- Name ---
0.1100 (100.0%) 0.1100 (100.0%) 0.1254 ( 99.0%) DWARF Debug Writer
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 1.0%) DWARF Exception Writer
0.1100 (100.0%) 0.1100 (100.0%) 0.1267 (100.0%) Total
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.1000 seconds (0.1219 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0200 ( 22.2%) 0.0100 (100.0%) 0.0300 ( 30.0%) 0.0362 ( 29.7%) X86 DAG->DAG Instruction Selection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0263 ( 21.6%) X86 AT&T-Style Assembly Printer
0.0200 ( 22.2%) 0.0000 ( 0.0%) 0.0200 ( 20.0%) 0.0138 ( 11.3%) Module Verifier
0.0100 ( 11.1%) 0.0000 ( 0.0%) 0.0100 ( 10.0%) 0.0128 ( 10.5%) Module Verifier
0.0100 ( 11.1%) 0.0000 ( 0.0%) 0.0100 ( 10.0%) 0.0123 ( 10.1%) Module Verifier
0.0200 ( 22.2%) 0.0000 ( 0.0%) 0.0200 ( 20.0%) 0.0037 ( 3.0%) Fast Register Allocator
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0030 ( 2.4%) Machine Function Analysis
0.0100 ( 11.1%) 0.0000 ( 0.0%) 0.0100 ( 10.0%) 0.0026 ( 2.1%) Prologue/Epilogue Insertion & Frame Finalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0019 ( 1.6%) Two-Address instruction pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 1.0%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.7%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.7%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.7%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.6%) Basic CallGraph Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.4%) Remove unreachable blocks from the CFG
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.4%) No Alias Analysis (always returns 'may' alias)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.3%) Post-RA pseudo instruction expansion pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.3%) Bundle Machine CFG Edges
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.3%) Inliner for always_inline functions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.3%) Eliminate PHI nodes for register allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.2%) Exception handling preparation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.2%) Expand ISel Pseudo-instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.2%) Preliminary module verification
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.2%) Preliminary module verification
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.2%) Insert stack protectors
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.1%) X86 FP Stackifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.1%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.1%) Local Stack Slot Allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.1%) X86 Maximal Stack Alignment Check
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) Lower Garbage Collection Instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) Preliminary module verification
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) Delete Garbage Collector Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0900 (100.0%) 0.0100 (100.0%) 0.1000 (100.0%) 0.1219 (100.0%) Total
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.7600 ( 76.9%) 0.0800 ( 66.7%) 1.8400 ( 76.3%) 1.8533 ( 76.6%) Clang front-end timer
0.2400 ( 10.5%) 0.0400 ( 33.3%) 0.2800 ( 11.6%) 0.2847 ( 11.8%) Code Generation Time
0.2900 ( 12.7%) 0.0000 ( 0.0%) 0.2900 ( 12.0%) 0.2810 ( 11.6%) LLVM IR Generation Time
2.2900 (100.0%) 0.1200 (100.0%) 2.4100 (100.0%) 2.4191 (100.0%) Total
1.78user 0.09system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 646960maxresident)k
0inputs+0outputs (0major+51782minor)pagefaults 0swaps
Is this still an issue? On my machine, this takes 1.4s to compile:
$ time ~/llvm-make/Release/bin/clang -c
/tmp/webcore_rendering.RenderMediaControlsChromium.ii -w -g -fno-blocks
real 0m1.395s
user 0m1.316s
sys 0m0.074s
Is this still an issue? On my machine, this takes 1.4s to compile:
$ time ~/llvm-make/Release/bin/clang -c
/tmp/webcore_rendering.RenderMediaControlsChromium.ii -w -g -fno-blocks
real 0m1.395s
user 0m1.316s
sys 0m0.074s
As of r168474 my test from comment#3 is down to 1.6s on an HP z620 running precise (original timings were from a z600 running lucid, FWIW).
So, there's about a 2x speed improvement over the originally-reported bug, but that's still over 20x slower than gcc.
(In reply to comment #7)
> As of r168474 my test from comment#3 is down to 1.6s on an HP z620 running
> precise (original timings were from a z600 running lucid, FWIW).
>
> So, there's about a 2x speed improvement over the originally-reported bug, but
> that's still over 20x slower than gcc.
Thanks. The comparison against gcc is useful.
(In reply to comment #7)
> As of r168474 my test from comment#3 is down to 1.6s on an HP z620 running
> precise (original timings were from a z600 running lucid, FWIW).
>
> So, there's about a 2x speed improvement over the originally-reported bug, but
> that's still over 20x slower than gcc.
... but top-of-tree is r171419. That's a significant number of revisions. For
sanity, could you recheck the timings with tip-of-trunk?
(In reply to comment #9)
> ... but top-of-tree is r171419. That's a significant number of revisions.
For
> sanity, could you recheck the timings with tip-of-trunk?
I fail to build at r171419.
Using chromium's pre-built clang at r170392 shows no speedup (still in 1.5-
1.6s).
Does the llvm project have performance/timing tests that might benefit from
adding this test case?
webcore_rendering.RenderMediaControlsChromium.E.bz2
(290781 bytes, application/octet-stream)