Open Quuxplusone opened 8 years ago
In some cases, in addition to add and compare instruction, a mr instruction is
generated too, and we use two registers to keep track of induction variable.
The code pattern looks like this (intervening instructions removed):
mr r22,r21
addi r21,r22,4
cmplw cr6,r12,r21
bne cr6,10001120
Scalar Evolution makes it to ScalarEvolution::computeExitLimitFromICmp where it
creates SCEVs for LHS and RHS of the compare instruction. These expressions is
sound reasonable (loop is unrolled by a factor of 8) :
{8,+,8}<%for.body>
((-1 * (zext i3 (trunc i32 %m to i3) to i32))<nsw> + %m)
The subsequent analysis encounters a problem and concludes that the upper bound
cannot be computed. I have to look at the rest of the work in
computeExitLimitFromICmp to see what goes wrong.
A couple of earlier passes seem to introduce some questionable "trunc"s in the
loop control logic which confuses SCEV.
The first one is ind var simplify. Sanjoy D suggsted the following code change.
s/IsSigned == Cmp->isSigned()/IsSigned == Cmp->isSigned() || Cmp->isEquality()/
in WidenIV::widenLoopCompare.
The second one is in loop strength reduction.
Adding Sanjoy's suggested fix, and disabling LSR fixes the problem.
(In reply to comment #3)
Ignore this comment. I had problems reproducing it. It seems that the truncs generated during LFTR cause the problem.
fixed with
https://reviews.llvm.org/rL278421 (D23075)
The problem still exists for some simple loops such as this:
for( i = 0 ; i < n ; i++ )
{
a[i] += b[i] + d[i];;
}
(all array data types float)
again SCEV is unable to compute the iteration count of the loop: (from debug
output):
Exit Count for Loop at depth 1 containing: %vector.body<header><latch><exiting>
from block vector.body: ***COULDNOTCOMPUTE***
(In reply to comment #6)
> The problem still exists for some simple loops such as this:
>
> for( i = 0 ; i < n ; i++ )
> {
> a[i] += b[i] + d[i];;
> }
> (all array data types float)
>
> again SCEV is unable to compute the iteration count of the loop: (from debug
> output):
>
> Exit Count for Loop at depth 1 containing:
> %vector.body<header><latch><exiting>
> from block vector.body: ***COULDNOTCOMPUTE***
That looks like a vectorized loop, so it isn't too surprising that
SCEV could not compute the trip count here. Have you tried looking at
debug output after disabling vectorization and unrolling?
> That looks like a vectorized loop, so it isn't too surprising that
> SCEV could not compute the trip count here. Have you tried looking at
> debug output after disabling vectorization and unrolling?
Yes, if I disable both vectorization and unrolling, then bdnz is generated.
Disabling vectorization alone is not enough.
Do we know why SCEV cannot compute the iteration count when the loop is
unrolled/vectorized?
(In reply to comment #8)
> > That looks like a vectorized loop, so it isn't too surprising that
> > SCEV could not compute the trip count here. Have you tried looking at
> > debug output after disabling vectorization and unrolling?
>
> Yes, if I disable both vectorization and unrolling, then bdnz is generated.
> Disabling vectorization alone is not enough.
>
> Do we know why SCEV cannot compute the iteration count when the loop is
> unrolled/vectorized?
I will have to take a look at the loop to give a concrete reason. Can
you please extract out the loop with the missing trip count and either
put it on this bug or on its own bug?
Also, I'm not familiar with PPC; can you specify the relation between
BDNZ and the trip count? Is BDNZ a loop instruction, like x86's REP?
> I will have to take a look at the loop to give a concrete reason. Can
> you please extract out the loop with the missing trip count and either
> put it on this bug or on its own bug?
>
> Also, I'm not familiar with PPC; can you specify the relation between
> BDNZ and the trip count? Is BDNZ a loop instruction, like x86's REP?
Thanks Sanjoy. I am not familiar with REP so here is a description of what BDNZ
is:
There is a register CTR on PPC that you can initialize before the loop. Then
each bdnz instruction will decrement that register and will branch if it is non-
zero. So we save a couple of instructions and use one less register. (The
instruction has some variants as well. I use bdnz to represent any proper insn
in this category).
So by extracting the loop, do you mean copying loops'IR just before the
transformation that generates BDNZ? If yes, I will put that in the next comment
(or send by email if it is too long, to avoid polluting this item).
(In reply to comment #10)
> So by extracting the loop, do you mean copying loops'IR just before the
> transformation that generates BDNZ? If yes, I will put that in the next
> comment (or send by email if it is too long, to avoid polluting this item).
Yes, that will do as a starting point. Basically I'm looking for
whatever IR illustrates the problem you have with trip counts, without
involving the PPC backend.
Note: if the IR file is large, you can always attach it to the bug
instead of pasting it into a comment. That will be better than a
one-off personal email.
I have to run now. Will attach the source code and IR later.(In reply to
comment #11)
> (In reply to comment #10)
> > So by extracting the loop, do you mean copying loops'IR just before the
> > transformation that generates BDNZ? If yes, I will put that in the next
> > comment (or send by email if it is too long, to avoid polluting this item).
>
> Yes, that will do as a starting point. Basically I'm looking for
> whatever IR illustrates the problem you have with trip counts, without
> involving the PPC backend.
>
> Note: if the IR file is large, you can always attach it to the bug
> instead of pasting it into a comment. That will be better than a
> one-off personal email.
Thanks. Yes, I have to run now. Will attach IR later.
Attached vec.ll
(9622 bytes, application/octet-stream): vectorized code
Attached novec.ll
(6439 bytes, application/octet-stream): compiled with -fno-vectorize
LSR creates an IV, and relies on SCEVExpander::expandIVInc to create IV bump insn. The calls to CreateAdd and CreateSub do not pass any value for nsw and nuw flags. So it defaults to false.
The same is true for loop vectorizer when it creates an IV in InnerLoopVectorizer::createInductionVariable.
When I force the calls to CreateAdd in these places to set nsw and nuw, to true, we generate BDNZ.
I don't know, if this is an oversight, or there is a good reason for avoiding setting this flag.
vec.ll
(9622 bytes, application/octet-stream)novec.ll
(6439 bytes, application/octet-stream)