Open Quuxplusone opened 4 years ago
Bugzilla Link | PR43835 |
Status | NEW |
Importance | P enhancement |
Reported by | Roman Lebedev (lebedev.ri@gmail.com) |
Reported on | 2019-10-29 05:42:17 -0700 |
Last modified on | 2021-08-16 12:24:27 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | efriedma@quicinc.com, florian_hahn@apple.com, hfinkel@anl.gov, johannes@jdoerfert.de, listmail@philipreames.com, llvm-bugs@lists.llvm.org, max.kazantsev@azul.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also | PR43836 |
Or more globally, this was the original pattern i was looking at:
https://godbolt.org/z/Bzn5Lz
#include <array>
#include <algorithm>
void sink(int* row, int* predRow);
void bad(int* data, int len, int width) {
    for(int i = 1; i < len; i++) {
        int *row = data + width*i;
        int *predRow = data + width*(i-1);
        sink(row, predRow);
    }
}
void good(int* data, int len, int width) {
    int* predRow = data;
    data += width;  // row starts one row ahead of predRow, as in bad()
    for(int i = 1; i < len; i++, data+=width, predRow+=width) {
        sink(data, predRow);
    }
}
In the first case there are two GEPs; in the second there is an extra PHI
and only one GEP.
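A minimal, self-contained sketch of the two loop shapes, assuming they are meant to hand the same (row, predRow) pairs to sink. The names indexForm, pointerForm and Calls are hypothetical and only for illustration; the calls are recorded instead of being passed to an opaque function so that the two forms can be compared directly.

```cpp
#include <utility>
#include <vector>

// Each recorded entry is the (row, predRow) pair that would be passed to sink().
// Hypothetical helper type, not part of the original report.
using Calls = std::vector<std::pair<const int*, const int*>>;

// Index induction: both addresses are recomputed from i on every iteration.
Calls indexForm(const int* data, int len, int width) {
    Calls calls;
    for (int i = 1; i < len; i++) {
        calls.emplace_back(data + width * i, data + width * (i - 1));
    }
    return calls;
}

// Pointer induction: the pointers themselves are loop-carried and advance by width.
Calls pointerForm(const int* data, int len, int width) {
    Calls calls;
    if (len < 2) {
        return calls;  // no iterations; avoid forming data + width needlessly
    }
    const int* predRow = data;
    const int* row = data + width;  // row starts one row ahead of predRow
    for (int i = 1; i < len; i++, row += width, predRow += width) {
        calls.emplace_back(row, predRow);
    }
    return calls;
}
```

For a buffer of len * width ints, both functions should return the same sequence of pointer pairs, which is the source-level counterpart of saying the two loops compute the same addresses.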
Interesting. I suppose that IndVarSimplify doesn't catch this. Which form is actually better, in some general sense, is target dependent (depending on whether a target has various pre/post-increment operations), but LSR won't create new PHIs, and that's another complication in this space.
Surely sticking with the current status quo leaves performance on the floor
for code written in the opposite style from the one that is optimal for a given
$target.
I would really guess that the integer induction should be canonical?
For which targets would pointer induction be better?
From an analysis standpoint, it doesn't really matter much; the SCEV expressions are equivalent either way. And there aren't really any interesting optimizations if we canonicalize early, I think. IndVarSimplify used to canonicalize aggressively, but we turned that off a while back because it was messing up optimized loops without any obvious benefit.
If LSR can't treat the two forms as equivalent, that's obviously a problem. Not sure why that's happening here, though, at first glance.
The reason i'm revisiting this is https://reviews.llvm.org/D107935
I'm basically thinking about having a pass to rewrite all pointer math so that:
1. loop-invariant part of address computation should be done in preheader
2. canonicalize to index induction
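As a rough source-level illustration of the two points above (the actual rewrite would operate on GEPs in LLVM IR, and this is not a description of what D107935 implements), the sketch below shows two input shapes and the shape both would be canonicalized to. All function names are hypothetical.

```cpp
#include <cstddef>

// Input shape A: pointer induction. The pointer is the loop-carried value.
long sumRowPointerInduction(const int* data, std::size_t row, std::size_t width) {
    long sum = 0;
    const int* end = data + row * width + width;
    for (const int* p = data + row * width; p != end; ++p) {
        sum += *p;
    }
    return sum;
}

// Input shape B: the whole address, including the loop-invariant
// row * width term, is written inside the loop body.
long sumRowRecomputed(const int* data, std::size_t row, std::size_t width) {
    long sum = 0;
    for (std::size_t i = 0; i < width; ++i) {
        sum += data[row * width + i];
    }
    return sum;
}

// Assumed canonical shape: (1) the loop-invariant part of the address is
// computed once before the loop (the preheader), and (2) the only induction
// variable is the integer index i.
long sumRowCanonical(const int* data, std::size_t row, std::size_t width) {
    long sum = 0;
    const int* base = data + row * width;  // (1) invariant part, hoisted
    for (std::size_t i = 0; i < width; ++i) {
        sum += base[i];                    // (2) index induction only
    }
    return sum;
}
```

All three functions read the same elements; the open question in this thread is whether a later pass such as LSR can reliably turn the canonical shape back into the best form for each target.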
There already exists a SeparateConstOffsetFromGEP pass,
and it seems to be run by a few backends already.
Not sure if we can extend & use it.
Feel free to try, but if you seriously pursue this, I suspect you're going to end up spending a lot of time looking at LSR...
Please make sure you take some time to look at the history of disable-iv-rewrite.
Unfortunately, I agree with Eli that this is going to be delicate and very complicated to get tuned just right. The first step will be tuning LSR to ensure that the (proposed) canonical form gets turned into the optimal sequence on each target reliably. LSR is fragile and annoying, so that's a significant amount of work and risk to take on.
Unless you are strongly motivated by this case, I'd honestly advise against it.