Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

RDF Live Variables Analysis may cause ballooning compile times #45777

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR46808
Status NEW
Importance P enhancement
Reported by Scott Constable (scott.d.constable@intel.com)
Reported on 2020-07-22 09:39:40 -0700
Last modified on 2020-08-04 16:53:02 -0700
Version trunk
Hardware All All
CC llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments bigfunc.ll.zip (371766 bytes, application/x-zip-compressed)
bigfunc-lvi.ll.zip (371806 bytes, application/x-zip-compressed)
Blocks
Blocked by
See also

The RDF live variables analysis probably has high asymptotic complexity than what is actually necessary. In particular:

https://github.com/llvm/llvm-project/blob/b5be1c5419e2a38eb60fc7e785567b54b6d9e0e0/llvm/lib/CodeGen/RDFLiveness.cpp#L208

This may be causing substantial compilation overhead for some users who enable passes that depend on RDF live variable analysis, e.g., https://github.com/rust-lang/rust/issues/74632.

Related discussion on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141332.html.

Quuxplusone commented 4 years ago

Is it enough to set "--x86-experimental-lvi-inline-asm-hardening", and "target-feature=+lvi-cfi,+lvi-load-hardening" to get the LVI pass to run?

Quuxplusone commented 4 years ago

For Rust, I'm not 100% certain, as I have not personally compiled any rust code with the LVI hardening.

For clang/clang++, it suffices to compile with the -mlvi-hardening flag.

Quuxplusone commented 4 years ago
I'm running llc on ScalarEvolution.bc (a leftover file I had from some earlier
investigation).  It takes about 30 seconds to process this file, and the
oprofile results are:

Using /w/test/oprofile_data/samples/ for samples directory.
CPU: Intel Skylake microarchitecture, speed 3000 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when
at least one thread on the physical core is not in halt state) count 100000
samples  %        image name               symbol name
54378     5.9208  llc                      unsigned int
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::runDFS<false,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::verifyParentPro
perty(llvm::DominatorTreeBase<llvm::MachineBasicBlock, false>
const&)::{lambda(llvm::MachineBasicBlock*,
llvm::MachineBasicBlock*)#1}>(llvm::MachineBasicBlock*, unsigned int,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> const&)::{lambda(llvm::MachineBasicBlock*, llvm::MachineBasicBlock*)#1},
unsigned int)
47477     5.1694  llc                      (anonymous
namespace)::MachineVerifier::visitMachineFunctionAfter()
39995     4.3548  llc                      unsigned int
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::runDFS<false,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::BasicBlock, false>
const&)::{lambda(llvm::BasicBlock*, llvm::BasicBlock*)#1}>(llvm::BasicBlock*,
unsigned int,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::BasicBlock, false>
const&)::{lambda(llvm::BasicBlock*, llvm::BasicBlock*)#1}, unsigned int)
29406     3.2018  llc
llvm::DenseMapBase<llvm::DenseMap<llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>,
llvm::detail::DenseMapPair<llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec> >, llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>,
llvm::detail::DenseMapPair<llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec> >::FindAndConstruct(llvm::MachineBasicBlock* const&)
27112     2.9520  libc-2.27.so             _int_malloc
26622     2.8987  llc
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::ChildrenGetter<false>::Get(llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::BatchUpdateInfo*)
24294     2.6452  llc
llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>,
llvm::detail::DenseMapPair<llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec> >, llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>,
llvm::detail::DenseMapPair<llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec> >::FindAndConstruct(llvm::BasicBlock* const&)
24169     2.6316  llc                      bool (anonymous
namespace)::VRegFilter::filterAndAdd<llvm::DenseSet<unsigned int,
llvm::DenseMapInfo<unsigned int> > >(llvm::DenseSet<unsigned int,
llvm::DenseMapInfo<unsigned int> > const&, llvm::SmallVectorImpl<unsigned int>&)
23714     2.5821  llc                      checkForCyclesHelper(llvm::SDNode
const*, llvm::SmallPtrSetImpl<llvm::SDNode const*>&,
llvm::SmallPtrSetImpl<llvm::SDNode const*>&, llvm::SelectionDAG const*)
23583     2.5678  llc                      (anonymous
namespace)::MachineVerifier::verify(llvm::MachineFunction&)
23307     2.5377  libc-2.27.so             free
22229     2.4204  llc
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::ChildrenGetter<false>::Get(llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::BatchUpdateInfo*)
22059     2.4019  llc                      llvm::SmallPtrSetImpl<llvm::SDNode
const*>::insert(llvm::SDNode const*)
21991     2.3944  llc
llvm::Instruction::getNumSuccessors() const
19755     2.1510  llc                      (anonymous
namespace)::MachineVerifier::visitMachineOperand(llvm::MachineOperand const*,
unsigned int)
18117     1.9726  llc
llvm::Instruction::getSuccessor(unsigned int) const
15364     1.6729  llc                      (anonymous
namespace)::MachineVerifier::visitMachineBundleAfter(llvm::MachineInstr const*)
15150     1.6496  llc                      (anonymous
namespace)::details::StructuralHash::update(llvm::Function&)
14762     1.6073  libc-2.27.so             malloc
13687     1.4903  llc                      void
llvm::SmallVectorImpl<llvm::BasicBlock*>::append<std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction,
llvm::BasicBlock> >,
void>(std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction,
llvm::BasicBlock> >,
std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction,
llvm::BasicBlock> >)
11663     1.2699  llc                      llvm::detail::DenseSetPair<unsigned
int>* llvm::DenseMapBase<llvm::DenseMap<unsigned int,
llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<unsigned int>,
llvm::detail::DenseSetPair<unsigned int> >, unsigned int,
llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<unsigned int>,
llvm::detail::DenseSetPair<unsigned int> >::InsertIntoBucketImpl<unsigned
int>(unsigned int const&, unsigned int const&,
llvm::detail::DenseSetPair<unsigned int>*)
11226     1.2223  llc
llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>,
llvm::detail::DenseMapPair<llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec> >, llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>,
llvm::detail::DenseMapPair<llvm::BasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock,
false> >::InfoRec> >::clear()
10043     1.0935  llc
llvm::DenseMapBase<llvm::DenseMap<llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>,
llvm::detail::DenseMapPair<llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec> >, llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>,
llvm::detail::DenseMapPair<llvm::MachineBasicBlock*,
llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock,
false> >::InfoRec> >::clear()
8552      0.9312  llc                      llvm::BasicBlock::getTerminator()
const
7664      0.8345  llc
llvm::rdf::PhysicalRegisterInfo::getAliasSet(unsigned int) const
[...]

The first RDF-related symbol is at the bottom with 0.8% of the entire execution
time (this is not the entire list).  Do you have a file that you could share
where the RDF liveness is taking a significant time?  I have some improvements
that should reduce the complexity of the ordering stage, but I'm not seeing any
difference with the files I tried.
Quuxplusone commented 4 years ago

Attached bigfunc.ll.zip (371766 bytes, application/x-zip-compressed): Big function compiled from Rust source

Quuxplusone commented 4 years ago

Attached bigfunc-lvi.ll.zip (371806 bytes, application/x-zip-compressed): Big function compiled from Rust source with LVI mitigations

Quuxplusone commented 4 years ago
Perf report when using llc to compile bigfunc-lvi.ll:

  22.16%  llc      llc                  [.] std::_Rb_tree<std::pair<unsigned int, llvm::LaneBitmask>, std::pair<unsigned int, llvm::LaneBitmask>, std::_Identity<std::pair<unsigned int, llvm::LaneBitmask> >, std::less<std::pair<unsigned int,
  20.15%  llc      llc                  [.] llvm::rdf::Liveness::computePhiInfo
   9.43%  llc      llc                  [.] llvm::rdf::RegisterAggr::makeRegRef
   7.18%  llc      llc                  [.] llvm::rdf::RegisterAggr::makeRegRef() const::$_0::operator()
   5.47%  llc      libc-2.31.so         [.] malloc
   5.35%  llc      llc                  [.] llvm::BitVector::find_first_in
   5.35%  llc      libc-2.31.so         [.] _int_free
   3.97%  llc      libstdc++.so.6.0.28  [.] std::_Rb_tree_increment
   3.84%  llc      llc                  [.] llvm::rdf::RegisterAggr::insert
   3.77%  llc      llc                  [.] llvm::rdf::RegisterAggr::clearIn
   1.66%  llc      libc-2.31.so         [.] cfree@GLIBC_2.2.5
   1.21%  llc      libc-2.31.so         [.] __memset_avx2_unaligned_erms
   0.74%  llc      libstdc++.so.6.0.28  [.] std::_Rb_tree_decrement
   0.39%  llc      llc                  [.] memset@plt
   0.34%  llc      llc                  [.] free@plt
   0.34%  llc      llc                  [.] malloc@plt
   0.33%  llc      llc                  [.] llvm::rdf::PhysicalRegisterInfo::getAliasSet
   0.31%  llc      llc                  [.] llvm::LiveRange::verify
   0.27%  llc      libc-2.31.so         [.] _int_malloc
   0.17%  llc      libstdc++.so.6.0.28  [.] std::_Rb_tree_insert_and_rebalance
   0.13%  llc      llc                  [.] std::_Rb_tree_increment@plt
   0.13%  llc      llc                  [.] llvm::rdf::DataFlowGraph::releaseBlock
   0.13%  llc      llc                  [.] llvm::SelectionDAG::Combine
   0.12%  llc      llc                  [.] std::__detail::_Map_base<unsigned int, std::pair<unsigned int const, llvm::rdf::DataFlowGraph::DefStack>, std::allocator<std::pair<unsigned int const, llvm::rdf::DataFlowGraph::DefStack> >, std::_
   0.10%  llc      llc                  [.] llvm::sys::unicode::columnWidthUTF8
Quuxplusone commented 4 years ago

Note that I am also using the patch to fix the performance issue in the LVI mitigations: https://reviews.llvm.org/D84471. So the remaining overhead appears to be caused by RDF.

Quuxplusone commented 4 years ago
I'm surprised by the execution times that you're getting.  There is indeed a
compile-time increase related to RDF, but in my experiments it's not 16-fold
(1.2s -> 20s), but in the order of 15%.

On my machine:

--- master

$ time llc bigfunc.ll
real    1m28.141s
user    1m21.665s
sys     0m6.464s

$ time llc bigfunc-lvi.ll
real    2m9.032s
user    2m2.475s
sys     0m6.549s

--- master + D84471

$ time llc bigfunc.ll
real    1m27.341s
user    1m20.780s
sys     0m6.556s

$ time llc bigfunc-lvi.ll
real    1m51.290s
user    1m45.120s
sys     0m6.168s

--- master + D84471 + RDF wip fixes

$ time llc bigfunc.ll
real    1m27.275s
user    1m20.714s
sys     0m6.548s

$ time llc bigfunc-lvi.ll
real    1m40.499s
user    1m34.199s
sys     0m6.292s
Quuxplusone commented 4 years ago
(In reply to Krzysztof Parzyszek from comment #8)
> I'm surprised by the execution times that you're getting.  There is indeed a
> compile-time increase related to RDF, but in my experiments it's not 16-fold
> (1.2s -> 20s), but in the order of 15%.
>
> On my machine:
>
> --- master
>
> $ time llc bigfunc.ll
> real    1m28.141s
> user    1m21.665s
> sys     0m6.464s
>
> $ time llc bigfunc-lvi.ll
> real    2m9.032s
> user    2m2.475s
> sys     0m6.549s
>
> --- master + D84471
>
> $ time llc bigfunc.ll
> real    1m27.341s
> user    1m20.780s
> sys     0m6.556s
>
> $ time llc bigfunc-lvi.ll
> real    1m51.290s
> user    1m45.120s
> sys     0m6.168s
>
> --- master + D84471 + RDF wip fixes
>
> $ time llc bigfunc.ll
> real    1m27.275s
> user    1m20.714s
> sys     0m6.548s
>
> $ time llc bigfunc-lvi.ll
> real    1m40.499s
> user    1m34.199s
> sys     0m6.292s

I am even more surprised, since the original bug reporter
(https://github.com/rust-lang/rust/issues/74632) recently re-ran his own tests
and observed a ~24-fold increase with D84471, following a 180-fold increase
without D84471. I am at a total loss to explain how you are observing such a
small slowdown. What is the exact LLVM commit that you're building on, so we
can compare apples-to-apples?
Quuxplusone commented 4 years ago

I may have expensive checks enabled in my clang build. Let me rebuild everything and try again.

Quuxplusone commented 4 years ago
That turned out to be the case...  Revised times:

--- master

$ time llc bigfunc.ll
real    0m1.832s
user    0m1.784s
sys     0m0.049s

$ time llc bigfunc-lvi.ll
real    0m41.508s
user    0m41.406s
sys     0m0.101s

--- master + D84471

$ time llc bigfunc.ll
real    0m1.811s
user    0m1.759s
sys     0m0.053s

$ time llc bigfunc-lvi.ll
real    0m27.714s
user    0m27.645s
sys     0m0.069s

--- master + D84471 + RDF wip fixes

$ time llc bigfunc.ll
real    0m1.802s
user    0m1.765s
sys     0m0.037s

$ time llc bigfunc-lvi.ll
real    0m15.475s
user    0m15.398s
sys     0m0.077s
Quuxplusone commented 4 years ago
I committed a handful of patches that get the ~50% reduction in compile-time.
Any further improvements would likely require changing the algorithm, but I
don't have that much time at the moment (such a change would need to be
thoroughly tested).

Now I get

$ time llc bigfunc-lvi.ll
real    0m12.970s
user    0m12.884s
sys     0m0.077s
Quuxplusone commented 4 years ago
The patches

https://reviews.llvm.org/rG09897b146a8a
https://reviews.llvm.org/rG47fe1b63f449
https://reviews.llvm.org/rGf0f467aeecfc
https://reviews.llvm.org/rG4b25f672998f
https://reviews.llvm.org/rG9521704553e8

Not all of these are strictly performance-related, but there may be
dependencies between them.