Open Quuxplusone opened 8 years ago
Bugzilla Link | PR25581 |
Status | REOPENED |
Importance | P normal |
Reported by | Carrot (carrot@google.com) |
Reported on | 2015-11-19 18:09:59 -0800 |
Last modified on | 2017-05-26 17:07:29 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | echristo@gmail.com, ehsanamiri@gmail.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
Is this a backend or a mid-end issue? Can you please provide some corresponding IR?
(In reply to comment #1)
> Is this a backend or a mid-end issue? Can you please provide some
> corresponding IR?
FYI: I can reproduce, but I'll also note that the problem does not appear when
compiling with -O3.
What happens at O3 is that a transformation called SeparateConstOffsetFromGEP in llc, commons some part of address for h[j] and h[j+1]. The optimization will work when we have GEPs with constant distance from each other.
This is some sort of commoning. Can't we turn that on at O2?
We can look at this from another angle as well. The code that we are currently discussing is corresponding to accesses to h[j] and h[j+1] in the first if statement.
There is a subsequent if statement that checks h[i] and h[j]. The optimization above does not work here because i and j do not have a constant distance from each other. So even in O3 code we have two extsw and two sldi for the second if statement.
We can think of an algorithm that can understand both i and j are indices to an array and promote them to 64 bit before the loop. This will remove all extsw instns inside the loop. I suspect that we already do this for canonicalized loops, so this will be useful only for loops that cannot be canonicalized or code that is not inside a loop.
There might be benefit in following both paths.
SeparateConstOffsetFromGEP is controlled in PPCPassConfig::addIRPasses(). Currently it requires O3. We can turn it on, with no impact on other targets. We need to discuss about all three passes in PPCPassConfig::addIRPasses() whether we want them to be turned on at O2 or not. (I think LICM should be safe to move to O2, I have more doubts about other two).
(In reply to comment #4)
> SeparateConstOffsetFromGEP is controlled in PPCPassConfig::addIRPasses().
> Currently it requires O3. We can turn it on, with no impact on other
> targets. We need to discuss about all three passes in
> PPCPassConfig::addIRPasses() whether we want them to be turned on at O2 or
> not. (I think LICM should be safe to move to O2, I have more doubts about
> other two).
We have SeparateConstOffsetFromGEP, EarlyCSE (which is designed to be cheap),
and LICM. All of these can be used at -O2. I have no problem with that. I put
them at -O3 only because I was being conservative with
SeparateConstOffsetFromGEP (which was a new optimization at the time). Now that
we have a use case for them to be at -O2, please go ahead with enabling them at
-O2.
Regarding developing a new widening transformations for induction variables in
non-canonical loops, that certainly could be useful.
I ran the test suite with the three transformations above enable at O3 vs. O2. The difference in compile time is negligible. Will submit the patch to make the change.
patch http://reviews.llvm.org/D18562 committed
Definitely doesn't look fixed:
10072360: a0 00 6c e9 ld 11, 160(12)
10072364: a0 00 be eb ld 29, 160(30)
10072368: 40 e8 2b 7c cmpld 11, 29
1007236c: 5e 30 67 7d isel 11, 7, 6, 1
10072370: 00 00 0b 28 cmplwi 11, 0
10072374: 9e 20 8a 7c isel 4, 10, 4, 2
10072378: b4 07 0a 7d extsw 10, 8
1007237c: b4 07 8b 7c extsw 11, 4
10072380: 24 1f 4a 79 sldi 10, 10, 3
10072384: 24 1f 6b 79 sldi 11, 11, 3
10072388: 2a 50 89 7d ldx 12, 9, 10
1007238c: 6a 58 69 7d ldux 11, 9, 11
10072390: 98 2c 0c 7c lxsdx 0, 12, 5
10072394: 98 2c 2b 7c lxsdx 1, 11, 5
10072398: 18 09 00 f0 xscmpudp 0, 0, 1
1007239c: 58 00 80 41 bt 0, .+88
just different. Investigating.
To be clear this is mostly that the code isn't great since it still has the two extsw/sldi with the addition now of some isel, rather than the GEP offset pass not working.
(In reply to Eric Christopher from comment #9)
> To be clear this is mostly that the code isn't great since it still has the
> two extsw/sldi with the addition now of some isel, rather than the GEP
> offset pass not working.
If you read bug description and comment 3, GEP offset removes other pairs of
extsw/sldi. Bug description is about a very specific problem and did not extend
to any potential improvement in the code.