Quuxplusone opened 6 years ago
Reverting the following makes the error go away:
------------------------------------------------------------------------
r319531 | dinar | 2017-12-01 03:10:47 -0800 (Fri, 01 Dec 2017) | 22 lines
[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in
integer binary ops.
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
------------------------------------------------------------------------
Reverted in r319550
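For reference, a small harness (my own sketch, not from the report; the input values are illustrative) pins down the scalar semantics that any vectorized form of `add1` must preserve — lane i must end up holding `src[i] + i`. I use the standard C99 `restrict` keyword in place of the commit's `__restrict` extension:

```c
#include <assert.h>

/* add1 from the reverted commit's description: lane i gets src[i] + i,
   whether SLP vectorizes the four stores or leaves them scalar. */
void add1(int * restrict dst, const int * restrict src) {
    *dst++ = *src++;
    *dst++ = *src++ + 1;
    *dst++ = *src++ + 2;
    *dst++ = *src++ + 3;
}
```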
Bugpoint reduced (too much undef for my taste) IR test:
target triple = "x86_64-unknown-linux-gnu"
define void @PR35497([15 x i64]* %inptr) {
%arrayidx1 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 0
%arrayidx2 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 1
%t0 = load i64, i64* %arrayidx1, align 8
%t1 = load i64, i64* %arrayidx2, align 8
%add = add i64 %t0, -9223372002495037440
%add.1 = add i64 %t1, 9223372002495037440
%arrayidx3 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 4
%arrayidx4 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 5
%add24 = add i64 undef, undef
%add24.1 = add i64 undef, undef
%shr.2 = lshr i64 undef, 16
%add24.2 = add i64 %shr.2, undef
%sub12.4 = sub i64 undef, %add24
%and.4 = shl i64 %add24, 12
%shl.4 = and i64 %and.4, 268431360
%add18.4 = add i64 undef, %shl.4
%sub12.5 = sub i64 %add.1, %add24.1
store i64 %sub12.5, i64* %arrayidx2, align 8
%and.5 = shl i64 %add24.1, 12
%shl.5 = and i64 %and.5, 268431360
%add18.5 = add i64 undef, %shl.5
%add24.5 = add i64 undef, %add18.4
store i64 %add24.5, i64* %arrayidx4, align 8
%sub12.6 = sub i64 %add, %add24.2
store i64 %sub12.6, i64* %arrayidx1, align 8
%add24.6 = add i64 undef, %add18.5
store i64 %add24.6, i64* %arrayidx3, align 8
ret void
}
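To keep track of what the defined lanes actually compute, here is a scalar C model of the reduced IR's data flow (a sketch I wrote for this report, not part of the test case). It refines every `undef` to 0 — one legal choice among many — and relies on `uint64_t` wraparound to match `i64` add/sub:

```c
#include <stdint.h>

/* Scalar model of the bugpoint-reduced IR, with every `undef` refined
   to 0 (an illustrative assumption; any refinement of undef is legal).
   Under that refinement the net effect is:
     in[0] -= K;  in[1] += K;  in[4] = 0;  in[5] = 0;   */
void pr35497_model(uint64_t in[15]) {
    const uint64_t K = 9223372002495037440ULL;     /* 2^63 - 2^35 */
    uint64_t t0 = in[0], t1 = in[1];
    uint64_t add   = t0 - K;          /* add i64 %t0, -K  (mod 2^64) */
    uint64_t add_1 = t1 + K;          /* add i64 %t1, K */
    uint64_t add24   = 0 + 0;         /* add undef, undef -> 0 */
    uint64_t add24_1 = 0 + 0;
    uint64_t add24_2 = ((uint64_t)0 >> 16) + 0;    /* lshr undef, 16 -> 0 */
    uint64_t shl_4 = (add24   << 12) & 268431360u; /* 0x0FFFF000 mask */
    uint64_t shl_5 = (add24_1 << 12) & 268431360u;
    in[1] = add_1 - add24_1;          /* store %sub12.5 */
    in[5] = 0 + (0 + shl_4);          /* store %add24.5 */
    in[0] = add   - add24_2;          /* store %sub12.6 */
    in[4] = 0 + (0 + shl_5);          /* store %add24.6 */
}
```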
There's a lot of debug output spew from SLP, but I don't understand it yet.
I've tried to keep the lanes straight by renaming things here. The placement of
the unused instruction is important: it seems to act as an anchor that causes
the other instructions to be misplaced:
define void @PR35497([15 x i64]* %inptr) {
%arrayidx1 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 0
%arrayidx2 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 1
%arrayidx3 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 4
%arrayidx4 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 5
%ld1 = load i64, i64* %arrayidx1, align 8
%ld2 = load i64, i64* %arrayidx2, align 8
%a1 = add i64 %ld1, -9
%uu2 = add i64 undef, undef
%uu4 = add i64 undef, undef
%unused = add i64 undef, %uu4
%p3 = shl i64 %uu2, 12
%p4 = shl i64 %uu4, 12
%q1 = lshr i64 undef, 16
%q3 = add i64 %p3, 2
%q4 = add i64 %p4, 2
%r1 = add i64 %q1, undef
%r2 = add i64 %ld2, 9
%r3 = add i64 undef, %q3
%r4 = add i64 undef, %q4
%s1 = add i64 %a1, %r1
%s2 = add i64 %r2, %uu2
%s3 = add i64 undef, %r3
%s4 = add i64 undef, %r4
store i64 %s1, i64* %arrayidx1, align 8
store i64 %s2, i64* %arrayidx2, align 8
store i64 %s3, i64* %arrayidx3, align 8
store i64 %s4, i64* %arrayidx4, align 8
ret void
}
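The lane structure of the renamed IR can also be written as a scalar C model (again my sketch, refining each `undef` to 0 for illustration), which makes the intended per-lane store values explicit; a miscompile from misplaced instructions would produce different stores:

```c
#include <stdint.h>

/* Scalar model of the renamed IR, refining every `undef` to 0
   (an assumption for illustration; any refinement of undef is legal).
   Net effect: in[0] -= 9; in[1] += 9; in[4] = 2; in[5] = 2. */
void pr35497_lanes(uint64_t in[15]) {
    uint64_t ld1 = in[0], ld2 = in[1];
    uint64_t a1 = ld1 - 9;            /* add i64 %ld1, -9 */
    uint64_t uu2 = 0 + 0, uu4 = 0 + 0;/* add undef, undef -> 0 */
    uint64_t unused = 0 + uu4;        /* %unused: dead, but anchors scheduling */
    (void)unused;
    uint64_t p3 = uu2 << 12, p4 = uu4 << 12;
    uint64_t q1 = (uint64_t)0 >> 16;  /* lshr undef, 16 -> 0 */
    uint64_t q3 = p3 + 2, q4 = p4 + 2;
    uint64_t r1 = q1 + 0;
    uint64_t r2 = ld2 + 9;
    uint64_t r3 = 0 + q3, r4 = 0 + q4;
    in[0] = a1 + r1;                  /* %s1 = ld1 - 9 */
    in[1] = r2 + uu2;                 /* %s2 = ld2 + 9 */
    in[4] = 0 + r3;                   /* %s3 = 2 */
    in[5] = 0 + r4;                   /* %s4 = 2 */
}
```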
The regression was introduced somewhere in the r319524:r319532 revision range.
See https://bugs.chromium.org/p/chromium/issues/detail?id=791046#c2 for preprocessed source and invocation.