Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Not able to vectorize struct members #50072

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR51103
Status NEW
Importance P enhancement
Reported by Simon Pilgrim (llvm-dev@redking.me.uk)
Reported on 2021-07-15 02:03:59 -0700
Last modified on 2021-09-28 01:09:12 -0700
Version trunk
Hardware PC Windows NT
CC a.bataev@hotmail.com, anton.a.afanasyev@gmail.com, efriedma@quicinc.com, florian_hahn@apple.com, jeroen.dobbelaere@synopsys.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, nikita.ppv@gmail.com, ryta1203@gmail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR50933, PR51108
Reported here: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151796.html

#define N 10
struct foo {
  float a[N][N];
  float b[N];
};

void compute(struct foo *p)
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      p->a[i][j] += 0.1f * p->b[i];
    }
  }
}

LLVM isn't able to vectorize this code. I'm not sure what the issue is here
as it seems to vectorize this code:

#define N 10
struct foo {
  float a[N];
  float b[N];
};

void compute(struct foo *p)
{
  for (int i = 0; i < N; i++) {
      p->a[i] += 0.1f * p->b[i];
  }
}

Current Codegen: https://llvm.godbolt.org/z/nah3ov9fT
Quuxplusone commented 3 years ago
*** IR Dump Before LoopVectorizePass on _Z7computeP3foo ***
; Function Attrs: mustprogress nofree norecurse nosync nounwind uwtable
define dso_local void @_Z7computeP3foo(%struct.foo* nocapture %0)
local_unnamed_addr #0 {
  br label %.preheader

.preheader:                                       ; preds = %1, %4
  %indvars.iv14 = phi i64 [ 0, %1 ], [ %indvars.iv.next15, %4 ]
  %2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
  br label %5

3:                                                ; preds = %4
  ret void

4:                                                ; preds = %5
  %indvars.iv.next15 = add nuw nsw i64 %indvars.iv14, 1
  %exitcond16.not = icmp eq i64 %indvars.iv.next15, 10
  br i1 %exitcond16.not, label %3, label %.preheader, !llvm.loop !3

5:                                                ; preds = %.preheader, %5
  %indvars.iv = phi i64 [ 0, %.preheader ], [ %indvars.iv.next, %5 ]
  %6 = load float, float* %2, align 4, !tbaa !6
  %7 = fmul float %6, 0x3FB99999A0000000
  %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
  %9 = load float, float* %8, align 4, !tbaa !6
  %10 = fadd float %9, %7
  store float %10, float* %8, align 4, !tbaa !6
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 10
  br i1 %exitcond.not, label %4, label %5, !llvm.loop !10
}

LV: Checking a loop in "_Z7computeP3foo" from /tmp/test.ll
LV: Loop hints: force=? width=0 interleave=1
LV: Found a loop:
LV: Found an induction variable.
LV: Found FP op with unsafe algebra.
LV: Found FP op with unsafe algebra.
LAA: Found a loop in _Z7computeP3foo:
LAA: Processing memory accesses...
  AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
  AliasSet[0x1538cc0, 2] may alias, No access Pointers: (float* %8, unknown before-or-after), (float* %2, unknown before-or-after)

LAA:   Accesses(3):
          %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (write)
          %2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 (read-only)
          %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (read)
Underlying objects for pointer   %8 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
  %struct.foo* %0
Underlying objects for pointer   %8 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
  %struct.foo* %0
Underlying objects for pointer   %2 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
  %struct.foo* %0
LAA: Found a runtime check ptr:  %8 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
LAA: Found a runtime check ptr:  %2 = getelementptr inbounds %struct.foo,
%struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
LAA: We need to do 0 pointer comparisons.
LAA: May be able to perform a memory runtime check if needed.
LAA: Checking memory dependencies
LAA: Bad stride - Not striding over innermost loop   %2 = getelementptr
inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 SCEV:
{(400 + %0)<nuw>,+,4}<nuw><%.preheader>
LAA: Src Scev: {(400 + %0)<nuw>,+,4}<nuw><%.preheader>Sink Scev:
{{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>(Induction step: 0)
LAA: Distance for   %6 = load float, float* %2, align 4, !tbaa !6 to   store
float %10, float* %8, align 4, !tbaa !6: {{-400,+,36}<%.preheader>,+,4}<nw><%5>
Pointer access with non-constant stride
LAA: Src Scev: {{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>Sink Scev:
{{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>(Induction step: 1)
LAA: Distance for   %9 = load float, float* %8, align 4, !tbaa !6 to   store
float %10, float* %8, align 4, !tbaa !6: 0
Total Dependences: 2
LAA: unsafe dependent memory operations in loop
LV: Can't vectorize due to memory conflicts
LV: Not vectorizing: Cannot prove legality.
Quuxplusone commented 3 years ago
Alias seems to be overly conservative here, or tbaa isn't working correctly, as
the reported alias set:

%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64
%indvars.iv14 (read-only)
          %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (read)

are not overlapping accesses based on type, but this seems to only be an issue
with structs (ie, using the same base pointer?), as it can also vectorize:

#define N 10

float a[N][N];
float b[N];

void foo() {
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      a[i][j] += 0.1f * b[i];
    }
  }
}
Quuxplusone commented 3 years ago

I think we need to view this as LICM failure - p->b[i] should be hoisted from the innermost loop, and then we get: https://llvm.godbolt.org/z/P8fPr5W9s

For that, yes, AA is somehow overly pessimistic here.

Quuxplusone commented 3 years ago
Isn't LICM dependent on the AA here? So I think this seems like an AA issue.

Right for the modified test case:

#define N 10
struct foo {
  float a[N][N];
  float b[N];
};

void compute(struct foo *p )
{
  for (int i = 0; i < N; i++) {
    float temp = .1f * p->b[i];
    for (int j = 0; j < N; ++j) {
      p->a[i][j] += temp;//0.1f * p->b[j];
    }
  }
}

LLVM is able to vectorize this if p->b[j] is manually hoisted.

I posted on the mailing list but I should post here, we have a partial fix for
this in BasicAliasAnalysis.cpp but it doesn't seem to be complete and may cause
errors, was hoping to get thoughts on it:

    unsigned MinNumIndex = std::min(GEP1->getNumIndices(),
                                    GEP2->getNumIndices());
    if (MinNumIndex < 2)
      return MayAlias;
    SmallVector<Value *, 8> IntermediateIndices;
    IntermediateIndices.push_back(GEP1->getOperand(1));
    // Note,
    // 1. The index to struct field can only be literal constant.
    // 2. So far all the struct field indexes are the same respectively.
    // 3. The array indexes could be different, but they don't matter here.
    // 4. The types should be the same, say, different array indexes don't
    //    matter, and the struct indexes be the same respectively.
    // If current type is StructType, and the pair of corresponding indexes
    // are not equal, NoAlias
    for (unsigned i = 1; i != MinNumIndex - 1; ++i) {
      if (isa<StructType>(GetElementPtrInst::getIndexedType(
              GEP1->getSourceElementType(), IntermediateIndices))) {
        ConstantInt *C1 = dyn_cast<ConstantInt>(GEP1->getOperand(i + 1));
        ConstantInt *C2 = dyn_cast<ConstantInt>(GEP2->getOperand(i + 1));
        if (C1 && C2 && C1->getSExtValue() != C2->getSExtValue())
          return NoAlias;
      }
      IntermediateIndices.push_back(GEP1->getOperand(i + 1));
    }
  }
  // Note that we fall back the original logic here, there are still some thing
  // not covered like p->a[2].x v.s. p->a[1].x.b[2]
Quuxplusone commented 3 years ago
So the problem i guess is:
  MayAlias: float* %arrayidx, float* %arrayidx8
  MayAlias:   %0 = load float, float* %arrayidx, align 4, !tbaa !6 <->   store float %add, float* %arrayidx8, align 4, !tbaa !6

https://godbolt.org/z/v67nErrnz
Quuxplusone commented 3 years ago

I'm sure this is a duplicate of something. The problem is that BasicAA doesn't know how to prove i < N, and there isn't any relevant TBAA info in the IR.

Quuxplusone commented 3 years ago
Roman,

 There is this code in aliasSameBasePointerGEPs:

if (GEP1->getNumIndices() != GEP2->getNumIndices() ||
      GEP1->getNumIndices() < 2)
  {
    errs()<<"1\n";
    return MayAlias;
  }

So if you look at your suggestion, the GEPs:

%arrayidx = getelementptr inbounds %struct.foo, %struct.foo* %p, i64 0, i32 1,
i64 %idxprom
%arrayidx8 = getelementptr inbounds %struct.foo, %struct.foo* %p, i64 0, i32 0,
i64 %idxprom, i64 %idxprom7

This condition seems overly conservative to me.
Quuxplusone commented 3 years ago
Eli,

  Would it be advantageous to handle the cases of constant indices?
Quuxplusone commented 3 years ago

The relevant indices aren't constant in your testcase.

See also https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds

Quuxplusone commented 3 years ago
Eli,

 Yes, I understand that, thanks.
Quuxplusone commented 3 years ago

If there's some case where we aren't handling constant indices, sure, we should handle them. Not sure what testcase you're thinking of, though.

aliasSameBasePointerGEPs was removed in https://reviews.llvm.org/D92723.