Open Quuxplusone opened 3 years ago
Bugzilla Link | PR51103 |
Status | NEW |
Importance | P enhancement |
Reported by | Simon Pilgrim (llvm-dev@redking.me.uk) |
Reported on | 2021-07-15 02:03:59 -0700 |
Last modified on | 2021-09-28 01:09:12 -0700 |
Version | trunk |
Hardware | PC Windows NT |
CC | a.bataev@hotmail.com, anton.a.afanasyev@gmail.com, efriedma@quicinc.com, florian_hahn@apple.com, jeroen.dobbelaere@synopsys.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, nikita.ppv@gmail.com, ryta1203@gmail.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also | PR50933, PR51108 |
*** IR Dump Before LoopVectorizePass on _Z7computeP3foo ***
; Function Attrs: mustprogress nofree norecurse nosync nounwind uwtable
define dso_local void @_Z7computeP3foo(%struct.foo* nocapture %0)
local_unnamed_addr #0 {
br label %.preheader
.preheader: ; preds = %1, %4
%indvars.iv14 = phi i64 [ 0, %1 ], [ %indvars.iv.next15, %4 ]
%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
br label %5
3: ; preds = %4
ret void
4: ; preds = %5
%indvars.iv.next15 = add nuw nsw i64 %indvars.iv14, 1
%exitcond16.not = icmp eq i64 %indvars.iv.next15, 10
br i1 %exitcond16.not, label %3, label %.preheader, !llvm.loop !3
5: ; preds = %.preheader, %5
%indvars.iv = phi i64 [ 0, %.preheader ], [ %indvars.iv.next, %5 ]
%6 = load float, float* %2, align 4, !tbaa !6
%7 = fmul float %6, 0x3FB99999A0000000
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
%9 = load float, float* %8, align 4, !tbaa !6
%10 = fadd float %9, %7
store float %10, float* %8, align 4, !tbaa !6
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, 10
br i1 %exitcond.not, label %4, label %5, !llvm.loop !10
}
LV: Checking a loop in "_Z7computeP3foo" from /tmp/test.ll
LV: Loop hints: force=? width=0 interleave=1
LV: Found a loop:
LV: Found an induction variable.
LV: Found FP op with unsafe algebra.
LV: Found FP op with unsafe algebra.
LAA: Found a loop in _Z7computeP3foo:
LAA: Processing memory accesses...
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x1538cc0, 2] may alias, No access Pointers: (float* %8, unknown before-or-after), (float* %2, unknown before-or-after)
LAA: Accesses(3):
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (write)
%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 (read-only)
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (read)
Underlying objects for pointer %8 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
%struct.foo* %0
Underlying objects for pointer %8 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
%struct.foo* %0
Underlying objects for pointer %2 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
%struct.foo* %0
LAA: Found a runtime check ptr: %8 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
LAA: Found a runtime check ptr: %2 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
LAA: We need to do 0 pointer comparisons.
LAA: May be able to perform a memory runtime check if needed.
LAA: Checking memory dependencies
LAA: Bad stride - Not striding over innermost loop %2 = getelementptr
inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 SCEV:
{(400 + %0)<nuw>,+,4}<nuw><%.preheader>
LAA: Src Scev: {(400 + %0)<nuw>,+,4}<nuw><%.preheader>Sink Scev:
{{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>(Induction step: 0)
LAA: Distance for %6 = load float, float* %2, align 4, !tbaa !6 to store
float %10, float* %8, align 4, !tbaa !6: {{-400,+,36}<%.preheader>,+,4}<nw><%5>
Pointer access with non-constant stride
LAA: Src Scev: {{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>Sink Scev:
{{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>(Induction step: 1)
LAA: Distance for %9 = load float, float* %8, align 4, !tbaa !6 to store
float %10, float* %8, align 4, !tbaa !6: 0
Total Dependences: 2
LAA: unsafe dependent memory operations in loop
LV: Can't vectorize due to memory conflicts
LV: Not vectorizing: Cannot prove legality.
Alias seems to be overly conservative here, or tbaa isn't working correctly, as
the reported alias set:
%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64
%indvars.iv14 (read-only)
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (read)
are not overlapping accesses based on type, but this seems to only be an issue
with structs (ie, using the same base pointer?), as it can also vectorize:
#define N 10
float a[N][N];
float b[N];
void foo() {
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
a[i][j] += 0.1f * b[i];
}
}
}
I think we need to view this as LICM failure - p->b[i]
should be hoisted
from the innermost loop, and then we get:
https://llvm.godbolt.org/z/P8fPr5W9s
For that, yes, AA is somehow overly pessimistic here.
Isn't LICM dependent on the AA here? So I think this seems like an AA issue.
Right for the modified test case:
#define N 10
struct foo {
float a[N][N];
float b[N];
};
void compute(struct foo *p )
{
for (int i = 0; i < N; i++) {
float temp = .1f * p->b[i];
for (int j = 0; j < N; ++j) {
p->a[i][j] += temp;//0.1f * p->b[j];
}
}
}
LLVM is able to vectorize this if p->b[j] is manually hoisted.
I posted on the mailing list but I should post here, we have a partial fix for
this in BasicAliasAnalysis.cpp but it doesn't seem to be complete and may cause
errors, was hoping to get thoughts on it:
unsigned MinNumIndex = std::min(GEP1->getNumIndices(),
GEP2->getNumIndices());
if (MinNumIndex < 2)
return MayAlias;
SmallVector<Value *, 8> IntermediateIndices;
IntermediateIndices.push_back(GEP1->getOperand(1));
// Note,
// 1. The index to struct field can only be literal constant.
// 2. So far all the struct field indexes are the same respectively.
// 3. The array indexes could be different, but they don't matter here.
// 4. The types should be the same, say, different array indexes don't
// matter, and the struct indexes be the same respectively.
// If current type is StructType, and the pair of corresponding indexes
// are not equal, NoAlias
for (unsigned i = 1; i != MinNumIndex - 1; ++i) {
if (isa<StructType>(GetElementPtrInst::getIndexedType(
GEP1->getSourceElementType(), IntermediateIndices))) {
ConstantInt *C1 = dyn_cast<ConstantInt>(GEP1->getOperand(i + 1));
ConstantInt *C2 = dyn_cast<ConstantInt>(GEP2->getOperand(i + 1));
if (C1 && C2 && C1->getSExtValue() != C2->getSExtValue())
return NoAlias;
}
IntermediateIndices.push_back(GEP1->getOperand(i + 1));
}
}
// Note that we fall back the original logic here, there are still some thing
// not covered like p->a[2].x v.s. p->a[1].x.b[2]
So the problem i guess is:
MayAlias: float* %arrayidx, float* %arrayidx8
MayAlias: %0 = load float, float* %arrayidx, align 4, !tbaa !6 <-> store float %add, float* %arrayidx8, align 4, !tbaa !6
https://godbolt.org/z/v67nErrnz
I'm sure this is a duplicate of something. The problem is that BasicAA doesn't know how to prove i < N, and there isn't any relevant TBAA info in the IR.
Roman,
There is this code in aliasSameBasePointerGEPs:
if (GEP1->getNumIndices() != GEP2->getNumIndices() ||
GEP1->getNumIndices() < 2)
{
errs()<<"1\n";
return MayAlias;
}
So if you look at your suggestion, the GEPs:
%arrayidx = getelementptr inbounds %struct.foo, %struct.foo* %p, i64 0, i32 1,
i64 %idxprom
%arrayidx8 = getelementptr inbounds %struct.foo, %struct.foo* %p, i64 0, i32 0,
i64 %idxprom, i64 %idxprom7
This condition seems overly conservative to me.
Eli,
Would it be advantageous to handle the cases of constant indices?
The relevant indices aren't constant in your testcase.
See also https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
Eli,
Yes, I understand that, thanks.
If there's some case where we aren't handling constant indices, sure, we should handle them. Not sure what testcase you're thinking of, though.
aliasSameBasePointerGEPs was removed in https://reviews.llvm.org/D92723.