Open Quuxplusone opened 5 years ago
Bugzilla Link | PR42987 |
Status | NEW |
Importance | P enhancement |
Reported by | agner@agner.org (agner@agner.org) |
Reported on | 2019-08-13 10:22:36 -0700 |
Last modified on | 2020-01-19 20:54:30 -0800 |
Version | 6.0 |
Hardware | PC All |
CC | blitzrakete@gmail.com, chandlerc@gmail.com, craig.topper@gmail.com, dgregor@apple.com, erik.pilkington@gmail.com, florian_hahn@apple.com, hfinkel@anl.gov, jdoerfert@anl.gov, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, llvm@meinersbur.de, richard-llvm@metafoo.co.uk, spatel+llvm@rotateright.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also | PR26532, PR44593 |
We have, indeed, always considered full loop unrolling part of the early
canonicalization process. We do, essentially, limit the partial unrolling
factor based on an estimate of the number of uops and the target-specific size
of the uop cache (the thresholds are now set by the LoopMicroOpBufferSize
variable in the various lib/Target/X86/X86Sched*.td files). Full unrolling,
however, we don't limit in the same way. I believe the rationale was that full
unrolling tends to enable other optimizations, and so we limit it only by a
heuristic practicality threshold.
It might be interesting to conduct the following experiment. In
include/llvm/CodeGen/BasicTTIImpl.h, in getUnrollingPreferences, where we have
this:
    UP.PartialThreshold = MaxOps;
add:
    UP.Threshold = MaxOps;
and see how that affects things.
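
To make the expected effect of that experiment concrete, here is a toy model of the decision. It is only a sketch: ToyUnrollPrefs and toyUnrollFactor are hypothetical stand-ins and not LLVM's actual classes or cost model, and the real unroller applies more conditions than this (for example on how the factor relates to the trip count).

```cpp
// Toy model only. These names are hypothetical and do not exist in LLVM.
struct ToyUnrollPrefs {
  unsigned Threshold;        // size budget for full unrolling
  unsigned PartialThreshold; // size budget for partial unrolling
};

// Returns the unroll factor this toy model would choose:
// the full trip count if the fully unrolled body fits within Threshold,
// otherwise the largest factor whose unrolled size fits within
// PartialThreshold (never less than 1, never more than the trip count).
unsigned toyUnrollFactor(const ToyUnrollPrefs &UP, unsigned bodyOps,
                         unsigned tripCount) {
  if (bodyOps == 0 || tripCount == 0)
    return 1;
  unsigned long long fullSize =
      static_cast<unsigned long long>(bodyOps) * tripCount;
  if (fullSize <= UP.Threshold)
    return tripCount; // full unrolling
  unsigned factor = UP.PartialThreshold / bodyOps;
  if (factor > tripCount)
    factor = tripCount;
  return factor < 1 ? 1 : factor;
}
```

With made-up numbers (a full-unroll budget of 300, MaxOps of 28, a 10-op body, and a trip count of 16), the model fully unrolls under the current settings, but falls back to a factor of 2 once Threshold is also set to MaxOps. That is the kind of behavior change the experiment would expose.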
As Hal mentioned, (full) unrolling may enable other optimizations. I am
thinking of folding an index expression such as i*8+1 into a constant, removing
a constant table lookup, or letting the SLP vectorizer vectorize a stream of
instructions that the LoopVectorizer cannot.
However, this requires different heuristics than the ones we currently have,
which only estimate the code size in instructions. The inliner heuristic, for
example, also takes into account parameters that become constant.
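
As an illustration of the i*8+1 case mentioned above, here is a small hand-written sketch. The table contents and function names are made up for the example, and the second function is what I would expect a compiler to be able to reach after full unrolling plus constant folding, not output taken from any particular build.

```cpp
// Hypothetical constant table, kTable[j] == 3*j + 2. Values are illustrative.
constexpr int kTable[32] = {
    2,  5,  8,  11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47,
    50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95};

// Rolled form: the index i*8+1 depends on the induction variable, so every
// iteration needs a multiply, an add, and a load from the table.
int sumRolled(const int *data) {
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += data[i] * kTable[i * 8 + 1];
  return sum;
}

// Fully unrolled by hand: i is a constant in each copy, so i*8+1 folds to
// 1, 9, 17, 25 and the table loads fold to the constants 5, 29, 53, 77.
int sumUnrolled(const int *data) {
  return data[0] * 5 + data[1] * 29 + data[2] * 53 + data[3] * 77;
}
```

A size estimate that only counts instructions in the rolled loop body sees the multiply, add, and table load as costs that get multiplied by the trip count, even though all of them disappear after unrolling. What remains is a short straight-line sequence of independent multiplies, which is also the shape the SLP vectorizer looks for.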