Open Quuxplusone opened 11 years ago
Attached CodeGen_ARM.cpp
(45497 bytes, application/octet-stream): Source file which causes slow compilation performance
CodeGen_ARM.cpp:1:10: fatal error: 'CodeGen_ARM.h' file not found
#include "CodeGen_ARM.h"
         ^
Could you preprocess it and upload the .ii?
Full preprocessed source here: https://gist.github.com/jrk/6117757
Compiled on OS X with: c++ -O3 CodeGen_ARM.ii. (Without -O3, performance is nominal.)
-ftime-report points at SROA. Probably unhappy because the CFG is extremely complex.
Just profiled this at r206481.
The SROA slowdown is the same as http://llvm.org/bugs/show_bug.cgi?id=17855.
Bottleneck is SSAUpdater.
Running Time        Self      Symbol Name
64684.0ms  43.5%        0.0   (anonymous namespace)::SROA::runOnFunction(llvm::Function&)
64677.0ms  43.5%        1.0   llvm::LoadAndStorePromoter::run(llvm::SmallVectorImpl<llvm::Instruction*> const&) const
64676.0ms  43.5%        0.0   llvm::SSAUpdater::GetValueInMiddleOfBlock(llvm::BasicBlock*)
64676.0ms  43.5%        0.0   llvm::SSAUpdater::GetValueAtEndOfBlockInternal(llvm::BasicBlock*)
64672.0ms  43.5%        9.0   llvm::SSAUpdaterImpl<llvm::SSAUpdater>::GetValue(llvm::BasicBlock*)
64379.0ms  43.3%    63469.0   llvm::SSAUpdaterImpl<llvm::SSAUpdater>::FindAvailableVals(llvm::SmallVectorImpl<llvm::SSAUpdaterImpl<llvm::SSAUpdater>::BBInfo*>*)
However, even more time is spent in CorrelatedValuePropagation. The bottleneck
there is LVIValueHandle::deleted().
Running Time        Self      Symbol Name
68279.0ms  46.0%       30.0   (anonymous namespace)::CorrelatedValuePropagation::runOnFunction(llvm::Function&)
65876.0ms  44.3%       10.0   llvm::Value::replaceAllUsesWith(llvm::Value*)
65856.0ms  44.3%       12.0   llvm::ValueHandleBase::ValueIsRAUWd(llvm::Value*, llvm::Value*)
65818.0ms  44.3%    65474.0   (anonymous namespace)::LVIValueHandle::deleted()
I was going to close PR17855 as a dup, but now I'm thinking this PR should
track the CorrelatedValuePropagation bottleneck, while PR17855 tracks the SROA
slowdown.
r245820 fixes the SROA issue and improves -O3 compile-time from 113s to 12s on
my machine.
The top 5 is now:
Running Time       Self (ms)  Symbol Name
3445.0ms  29.4%         5.0   (anonymous namespace)::JumpThreading::runOnFunction(llvm::Function&)
1483.0ms  12.6%         2.0   (anonymous namespace)::GVN::runOnFunction(llvm::Function&)
1063.0ms   9.0%        17.0   (anonymous namespace)::CorrelatedValuePropagation::runOnFunction(llvm::Function&)
 697.0ms   5.9%         0.0   (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&)
 589.0ms   5.0%         4.0   (anonymous namespace)::RegisterCoalescer::runOnMachineFunction(llvm::MachineFunction&)
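As a rough cross-check of the figures quoted above (a sketch added for illustration, not part of the original measurements): going from 113s to 12s is roughly a 9.4x speedup, and the JumpThreading row (3445.0ms reported as 29.4%) implies a total of about 11.7s, which is consistent with the 12s figure. All variable names below are made up for this example.

```python
# Figures quoted in the thread.
before_s = 113.0  # -O3 compile time before r245820
after_s = 12.0    # -O3 compile time after r245820

speedup = before_s / after_s          # how many times faster
reduction = 1.0 - after_s / before_s  # fraction of compile time removed

# JumpThreading row of the profile: 3445.0 ms, reported as 29.4% of the total.
jump_threading_ms = 3445.0
jump_threading_share = 0.294
implied_total_s = jump_threading_ms / jump_threading_share / 1000.0

print(f"speedup: {speedup:.1f}x, reduction: {reduction:.0%}")
print(f"total implied by the JumpThreading row: {implied_total_s:.1f}s")
```

The implied total is only approximate, since the percentage in the profile is rounded to one decimal place.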
Mehdi, sounds like this is fixed?
As of r253350, nothing changed since my last update, still ~30% of the total
clang invocation is spent in JumpThreading.
Did you get different measurements? I measured on OS X.
(In reply to comment #7)
> As of r253350, nothing changed since my last update, still ~30% of the total
> clang invocation is spent in JumpThreading.
> Did you get different measurements? I measured on OS X.
No, but the original report was that the file took 7m to compile. You said it
takes 12s now, despite the fact that 30% of the time is in JumpThreading. If
you want to leave it open to track future improvements to JumpThreading or GVN,
go for it.
Oh I see, yes that was my idea when I did the previous measurements.
Do you think I should have closed this bug as fixed and opened a new one for
JumpThreading?
The majority of the compile time is now spent in GVN. Updating the title to
reflect that. Also attached codegen_arm.bc.
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 10.1732 seconds (10.1731 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   6.4170 ( 64.2%)   0.0243 ( 14.0%)   6.4413 ( 63.3%)   6.4422 ( 63.3%)  Global Value Numbering
   0.6697 (  6.7%)   0.0046 (  2.7%)   0.6743 (  6.6%)   0.6743 (  6.6%)  Value Propagation
   0.6127 (  6.1%)   0.0064 (  3.7%)   0.6191 (  6.1%)   0.6191 (  6.1%)  Jump Threading
   0.2233 (  2.2%)   0.0035 (  2.0%)   0.2268 (  2.2%)   0.2268 (  2.2%)  Function Integration/Inlining
   0.1531 (  1.5%)   0.0018 (  1.0%)   0.1550 (  1.5%)   0.1549 (  1.5%)  Jump Threading #2
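For comparing reports like the one above across revisions, here is a minimal sketch that pulls the pass name and wall time out of each row and ranks the passes. It assumes the row layout shown in this comment (four "seconds ( percent%)" columns followed by the pass name); the function name and dictionary keys are hypothetical and not part of any LLVM tooling.

```python
import re

# One report row: four "seconds ( percent%)" columns
# (user, system, user+system, wall) followed by the pass name.
ROW = re.compile(
    r"^\s*" + r"(\d+\.\d+)\s*\(\s*(\d+\.\d+)%\)\s+" * 4 + r"(.+?)\s*$"
)


def parse_timing_rows(text):
    """Return the report rows sorted by wall time, slowest pass first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # banner, header, and "Total Execution Time" lines
        rows.append({
            "name": m.group(9),
            "wall_s": float(m.group(7)),
            "wall_pct": float(m.group(8)),
        })
    return sorted(rows, key=lambda r: r["wall_s"], reverse=True)


# Rows copied from the report in this comment.
REPORT = """\
Total Execution Time: 10.1732 seconds (10.1731 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
6.4170 ( 64.2%) 0.0243 ( 14.0%) 6.4413 ( 63.3%) 6.4422 ( 63.3%) Global Value Numbering
0.6697 ( 6.7%) 0.0046 ( 2.7%) 0.6743 ( 6.6%) 0.6743 ( 6.6%) Value Propagation
0.6127 ( 6.1%) 0.0064 ( 3.7%) 0.6191 ( 6.1%) 0.6191 ( 6.1%) Jump Threading
0.2233 ( 2.2%) 0.0035 ( 2.0%) 0.2268 ( 2.2%) 0.2268 ( 2.2%) Function Integration/Inlining
0.1531 ( 1.5%) 0.0018 ( 1.0%) 0.1550 ( 1.5%) 0.1549 ( 1.5%) Jump Threading #2
"""

rows = parse_timing_rows(REPORT)
for row in rows:
    print(f'{row["wall_s"]:8.4f}s  {row["wall_pct"]:5.1f}%  {row["name"]}')
```

Lines that do not match the row pattern are skipped, so the banner and header can be pasted in unchanged.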
Attached codegen_arm.bc
(660096 bytes, application/octet-stream): codegen_arm.bc, reproducer for compile time issue