[LoopAccessAnalysis / IndVarSimplify]

Quuxplusone commented 6 years ago


Bugzilla Link	PR38935
Status	NEW
Importance	P enhancement
Reported by	Jonas Paulsson (paulsson@linux.vnet.ibm.com)
Reported on	2018-09-13 07:36:22 -0700
Last modified on	2019-01-28 07:17:46 -0800
Version	trunk
Hardware	PC Linux
CC	a.zaafrani@samsung.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, max.kazantsev@azul.com, paulsson@linux.vnet.ibm.com, ulrich.weigand@de.ibm.com
Fixed by commit(s)
Attachments	`tc_indvar_vec.ll` (4503 bytes, text/plain)
Blocks
Blocked by
See also

Created attachment 20876
reduced testcase

The reduced test case gets its loop vectorized on x86 (icelake-server),while on
SystemZ (z14), this fails (actually in the reduced test case VF 1 is selected,
but the point is that it is *possible* for only one of the targets).

On SystemZ, the IV is rewritten from i32 to i64, which seems to somehow result
in 'LAA: SCEV could not compute the loop exit count.'

It seems like a bug somewhere that when IndVarSimplify makes a transformation,
the SCEV becomes uncomputable.

Could this be something in ScalarEvolution to fix, perhaps?

opt tc_indvar_vec.ll -mtriple=i686-unknown-linux-gnu -mcpu=icelake-server -S -o
/dev/null -O3 -debug-only=loop-accesses |& grep "exit count"

opt tc_indvar_vec.ll -mtriple=systemz-unknown -mcpu=z14 -S -o /dev/null -O3 -
debug-only=loop-accesses |& grep "exit count"
LAA: SCEV could not compute the loop exit count.
LAA: SCEV could not compute the loop exit count.

Quuxplusone commented 6 years ago

Attached tc_indvar_vec.ll (4503 bytes, text/plain): reduced testcase

Quuxplusone commented 6 years ago

It seems that IndVarSimplify is dropping the nsw flag on the IV add after
converting it from i32 to i64. I am guessing perhaps the nsw should not be
dropped and that SCEV would succeed if it hadn't been?

...
.lr.ph.preheader:                                 ; preds = %6
  br label %.lr.ph

.lr.ph:                                           ; preds = %.lr.ph.preheader,
%.lr.ph
  %.112 = phi fp128* [ %12, %.lr.ph ], [ %.0, %.lr.ph.preheader ]
  %.11011 = phi i32 [ %13, %.lr.ph ], [ %.09, %.lr.ph.preheader ]
  %9 = sext i32 %.11011 to i64
  %10 = getelementptr inbounds fp128, fp128* %5, i64 %9
  %11 = load fp128, fp128* %10, align 8, !tbaa !5
  %12 = getelementptr inbounds fp128, fp128* %.112, i64 1
  store fp128 %11, fp128* %.112, align 8, !tbaa !5
  %13 = add nsw i32 %7, %.11011
  %14 = icmp slt i32 %13, 0
  br i1 %14, label %.lr.ph, label %._crit_edge.loopexit
...

*** IR Dump After Induction Variable Simplification ***

...
.lr.ph.preheader:                                 ; preds = %6
  %9 = sext i32 %.09 to i64
  %10 = sext i32 %7 to i64
  br label %.lr.ph

; Loop:
.lr.ph:                                           ; preds = %.lr.ph.preheader,
%.lr.ph
  %indvars.iv = phi i64 [ %9, %.lr.ph.preheader ], [ %indvars.iv.next, %.lr.ph ]
  %.112 = phi fp128* [ %13, %.lr.ph ], [ %.0, %.lr.ph.preheader ]
  %11 = getelementptr inbounds fp128, fp128* %5, i64 %indvars.iv
  %12 = load fp128, fp128* %11, align 8, !tbaa !5
  %13 = getelementptr inbounds fp128, fp128* %.112, i64 1
  store fp128 %12, fp128* %.112, align 8, !tbaa !5
  %indvars.iv.next = add i64 %indvars.iv, %10
  %14 = icmp slt i64 %indvars.iv.next, 0
  br i1 %14, label %.lr.ph, label %._crit_edge.loopexit
...

Quuxplusone commented 6 years ago

I experimented further with this and found that first of all, adding -verify-
indvars actually triggers the assert:

/IndVarSimplify.cpp:2688: bool {anonymous}::IndVarSimplify::run(llvm::Loop*):
Assertion `BackedgeTakenCount == NewBECount && "indvars must preserve SCEV"'
failed.

So it seems that something is broken here.

Furthermore, I stepped through this a bit in gdb (with some minor experimental
hacks in IndVarSimplify.cpp), and found that if I called forgetLoop() after the
widening of the induction variable, and then asked for the backedge taken
count, it seemed the UniqueSCEVs map actually wasn't cleared. When
getAddRecExpr(StartVal, Accum, L, Flags) is called in createAddRecFromPHI(),
the SCEV is actually not created (recomputed) but looked up and found in
UniqueSCEVs map. I forced it to create a new SCEVAddRecExpr at
ScalarEvolution.cpp:3418 by setting *S to nullptr in gdb, and it then returned
the same SCEV but without the nsw flag!

Is there some way to really empty the maps and rebuild the trip count from
scratch independently of the SE-using algorithm (IndVarSimplify)?

Quuxplusone commented 6 years ago

I built SPEC-2006 on SystemZ with -verify-indvars and found that the even the code that was supposed to verify the BackEdge taken count itself crashed since NewBECount was a SCEVCouldNotCompute (when the original BackedgeTakenCount was not). This happened 30 times.

I then changed the actual assert to an if so that I could get the message without aborting. It then turns out that this assert would have triggered on +2000 loops!

So it seems there may be an area of potential improvement here, or?

Thankful for any help or hint.

/Jonas

Quuxplusone commented 5 years ago

I took another look at the amount of failures of -verify-indvars, and my new
numbers are making a bit more sense. I tried this both on SystemZ and X86 and
for the handled loops from SPEC-2006:

z13:
26775 IndVarSimplify: Same SCEV!
 2322 IndVarSimplify: Different SCEVs!
   30 IndVarSimplify: could-not-compute NewBECount!

x86-64:
23057 IndVarSimplify: Same SCEV!
 2261 IndVarSimplify: Different SCEVs!
   28 IndVarSimplify: could-not-compute NewBECount!

The third row respectively gives the count for (where my test case previously
posted is one of them), the cases where a new IV is produced that results in SE-
>getBackedgeTakenCount(L) returning a SCEVCouldNotCompute.

The second row is the number of loops where the -verify-indvars triggers the
assert

assert(BackedgeTakenCount == NewBECount && "indvars must preserve SCEV");

In my test case, I see this assert triggering with

(gdb) p BackedgeTakenCount->dump()
((-1 + (-1 * %.09)) /u %7)

(gdb) p NewBECount->dump()
((-1 + (-1 * (sext i32 %.09 to i64))<nsw>)<nsw> /u (sext i32 %7 to i64))

-> getTruncateOrNoop()

(gdb) p NewBECount->dump()
(trunc i64 ((-1 + (-1 * (sext i32 %.09 to i64))<nsw>)<nsw> /u (sext i32 %7 to
i64)) to i32)

It seems expected at this point that getTruncateOrNoop(NewBECount) should
return the original SCEV (BackedgeTakenCount), but this does not happen. Is
there perhaps some function that should be called to check if they are
equivalent although not equal? Or is it really expected that the smaller SCEV
should be returned if that is the case?

So, I have now two questions:

1. Does anybody know where to start with handling the case where the trip count
becomes in-computable? I have made a description for the test case already
previously. This seems to happen in a small number of cases both on SystemZ and
X86, at least.

2. Is the assert under -verify-indvars correct, or should it allow an
equivalent  SCEV somehow?

Quuxplusone / LLVMBugzillaTest

[LoopAccessAnalysis / IndVarSimplify] #37908