Quuxplusone / LLVMBugzillaTest

Reordering IR-level optimization passes. #17808

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR17809
Status NEW
Importance P normal
Reported by Andrew Trick (atrick@apple.com)
Reported on 2013-11-04 19:47:01 -0800
Last modified on 2014-06-02 22:03:44 -0700
Version trunk
Hardware PC All
CC hfinkel@anl.gov, llvm-bugs@lists.llvm.org, rafael@espindo.la
Fixed by commit(s)
Attachments systemz-loop.ll (2140 bytes, text/plain)
Blocks
Blocked by
See also PR16116, PR18449

In July, I proposed a new pass order pipeline that was fairly well received:
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/63921

(There is a natural conflict between loop-nest optimization and an otherwise
efficient optimization pipeline, which this proposal does not solve.)

However, the work is stalled. To make progress, we would need to begin taking
incremental steps. It might be helpful if the new PassManager design is
introduced first, but that isn't a requirement.

I'm filing a bug report now so people can add test cases or relate this to
other bugs. That will help motivate the work.

Here is the original proposal:

Canonicalization passes are designed to normalize the IR in order to expose
opportunities to subsequent machine independent passes. This simplifies writing
machine independent optimizations and improves the quality of the compiler.

An important property of these passes is that they are repeatable. They may be
invoked multiple times after inlining and should converge to a canonical form.
They should not destructively transform the IR in a way that defeats subsequent
analysis.
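
The "repeatable, converges to a canonical form" property can be sketched with a
toy model. Everything below is an assumption made for illustration: the tuple
"IR", the two stand-in passes, and the run_to_fixed_point helper are
hypothetical and are not LLVM APIs.

```python
from typing import Callable, List, Tuple

# Toy stand-in for IR: each instruction is (opcode, lhs, rhs). Hypothetical.
Inst = Tuple[str, str, str]
Pass = Callable[[List[Inst]], List[Inst]]

COMMUTATIVE = {"add", "mul"}


def sort_commutative(ir: List[Inst]) -> List[Inst]:
    """Put the operands of commutative ops into a fixed (sorted) order."""
    out = []
    for op, lhs, rhs in ir:
        if op in COMMUTATIVE and rhs < lhs:
            lhs, rhs = rhs, lhs
        out.append((op, lhs, rhs))
    return out


def drop_duplicates(ir: List[Inst]) -> List[Inst]:
    """Keep only the first copy of each identical instruction (CSE-like)."""
    seen = set()
    out = []
    for inst in ir:
        if inst not in seen:
            seen.add(inst)
            out.append(inst)
    return out


def run_to_fixed_point(ir: List[Inst], passes: List[Pass],
                       max_rounds: int = 10) -> List[Inst]:
    """Rerun the pass list until a full round leaves the IR unchanged."""
    for _ in range(max_rounds):
        before = ir
        for run_pass in passes:
            ir = run_pass(ir)
        if ir == before:
            return ir
    raise RuntimeError("passes did not converge")
```

In this model, rerunning the same pass list on its own output is a no-op, which
is the behavior the proposal asks of canonicalization passes that get invoked
again after inlining.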

Canonicalization passes can make use of data layout, but are otherwise target
independent. Adding target specific hooks to these passes can defeat the
purpose of canonical IR.

IR Canonicalization Pipeline:

Function Passes {
  SimplifyCFG
  SROA-1
  EarlyCSE
}
Call-Graph SCC Passes {
  Inline
  Function Passes {
    EarlyCSE
    SimplifyCFG
    InstCombine
    Early Loop Opts {
      LoopSimplify
      Rotate (when obvious)
      Full-Unroll (when obvious)
    }
    SROA-2
    InstCombine
    GVN
    Reassociate
    Generic Loop Opts {
      LICM (Rotate on-demand)
      Unswitch
    }
    SCCP
    InstCombine
    JumpThreading
    CorrelatedValuePropagation
    AggressiveDCE
  }
}

IR optimizations that require target information or destructively modify the IR
can run in a separate pipeline. This helps make a cleaner distinction
between passes that may and may not use TargetTransformInfo.

TargetTransformInfo encapsulates legal types and operation costs. IR instruction
costs are approximate and relative. They do not represent def-use latencies, nor
do they distinguish between latency and CPU resource requirements--that level
of machine modeling needs to be done in MI passes.

IR Lowering Pipeline:

Function Passes {
  Target SimplifyCFG (OptimizeCFG?)
  Target InstCombine (InstOptimize?)
  Target Loop Opts {
    SCEV
    IndvarSimplify (mainly sxt/zxt elimination)
    Vectorize/Unroll
    LSR (move LFTR here too)
  }
  SLP Vectorize
  LowerSwitch
  CodeGenPrepare
}
---

The above pass ordering is roughly something I think we can live
with. Notice that I have:
  Full-Unroll -> SROA-2 -> GVN -> Loop-Opts
since that solves some issues we have today.
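
As a rough sanity check of that ordering, the function passes nested under the
Call-Graph SCC group can be written down as data and flattened into execution
order. This is only a sketch: the tuple encoding, flatten, and runs_in_order
are hypothetical helpers, and the pass names are labels copied from the
proposal above, not command-line flags.

```python
# Groups are (name, [children]) tuples; leaves are plain pass-name strings.
CGSCC_FUNCTION_PASSES = [
    "EarlyCSE",
    "SimplifyCFG",
    "InstCombine",
    ("Early Loop Opts", ["LoopSimplify", "Rotate", "Full-Unroll"]),
    "SROA-2",
    "InstCombine",
    "GVN",
    "Reassociate",
    ("Generic Loop Opts", ["LICM", "Unswitch"]),
    "SCCP",
    "InstCombine",
    "JumpThreading",
    "CorrelatedValuePropagation",
    "AggressiveDCE",
]


def flatten(pipeline):
    """Return leaf pass names in the order they would run."""
    order = []
    for entry in pipeline:
        if isinstance(entry, tuple):
            _group_name, children = entry
            order.extend(flatten(children))
        else:
            order.append(entry)
    return order


def runs_in_order(order, names):
    """True if the first occurrences of `names` appear in the given sequence."""
    positions = [order.index(name) for name in names]
    return all(a < b for a, b in zip(positions, positions[1:]))
```

Calling runs_in_order(flatten(CGSCC_FUNCTION_PASSES), ["Full-Unroll", "SROA-2",
"GVN", "LICM"]) checks the chain mentioned above, with LICM standing in for the
generic loop opts.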

I don't currently have any reason to reorder the "late" IR optimization passes
(those after generic loop opts). We do need either a GVN-util that late loop
opts and lowering passes may call on demand after code motion, or a rerun of
a non-iterative GVN-lite as a cleanup after lowering passes.

If anyone can think of important dependencies between IR passes, this would be
a good time to point them out.

We could probably make a minor adjustment to the opt driver so that the user
can specify any mix of canonical and lowering passes. The first lowering pass
and subsequent passes would run in the lowering function pass manager.
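
One possible reading of that driver change, sketched under assumptions: the
requested pass list is split at the first lowering pass, and everything from
that point on goes to the lowering function pass manager. The LOWERING_PASSES
set and split_pipeline are hypothetical; the names are labels taken from the
IR Lowering Pipeline above, and nothing here reflects how opt actually parses
its arguments.

```python
# Hypothetical classification, using the labels from the lowering pipeline.
LOWERING_PASSES = frozenset({
    "Target SimplifyCFG",
    "Target InstCombine",
    "SCEV",
    "IndvarSimplify",
    "Vectorize/Unroll",
    "LSR",
    "SLP Vectorize",
    "LowerSwitch",
    "CodeGenPrepare",
})


def split_pipeline(requested):
    """Split a pass list into (canonical, lowering) at the first lowering pass.

    Passes listed after the first lowering pass stay in the lowering half,
    even if they are canonicalization passes.
    """
    for index, name in enumerate(requested):
        if name in LOWERING_PASSES:
            return list(requested[:index]), list(requested[index:])
    return list(requested), []
```

For example, split_pipeline(["SimplifyCFG", "GVN", "LSR", "InstCombine"]) puts
SimplifyCFG and GVN in the canonical half and runs LSR and the trailing
InstCombine in the lowering half.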

Quuxplusone commented 10 years ago

Attached systemz-loop.ll (2140 bytes, text/plain): SystemZ testcase that benefits from earlier GVN