Quuxplusone opened 6 years ago
Reverting the following makes the error go away:
------------------------------------------------------------------------
r319531 | dinar | 2017-12-01 03:10:47 -0800 (Fri, 01 Dec 2017) | 22 lines
[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in
integer binary ops.
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
------------------------------------------------------------------------
Reverted in r319550
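For reference, a small harness (my own sketch, not from the report; the input values are illustrative) pins down the scalar semantics that any vectorized form of `add1` must preserve — lane i must end up holding `src[i] + i`. I use the standard C99 `restrict` keyword in place of the commit's `__restrict` extension:

```c
#include <assert.h>

/* add1 from the reverted commit's description: lane i gets src[i] + i,
   whether SLP vectorizes the four stores or leaves them scalar. */
void add1(int * restrict dst, const int * restrict src) {
    *dst++ = *src++;
    *dst++ = *src++ + 1;
    *dst++ = *src++ + 2;
    *dst++ = *src++ + 3;
}
```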
Bugpoint reduced (too much undef for my taste) IR test:
target triple = "x86_64-unknown-linux-gnu"
define void @PR35497([15 x i64]* %inptr) {
%arrayidx1 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 0
%arrayidx2 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 1
%t0 = load i64, i64* %arrayidx1, align 8
%t1 = load i64, i64* %arrayidx2, align 8
%add = add i64 %t0, -9223372002495037440
%add.1 = add i64 %t1, 9223372002495037440
%arrayidx3 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 4
%arrayidx4 = getelementptr inbounds [15 x i64], [15 x i64]* %inptr, i64 0, i64 5
%add24 = add i64 undef, undef
%add24.1 = add i64 undef, undef
%shr.2 = lshr i64 undef, 16
%add24.2 = add i64 %shr.2, undef
%sub12.4 = sub i64 undef, %add24
%and.4 = shl i64 %add24, 12
%shl.4 = and i64 %and.4, 268431360
%add18.4 = add i64 undef, %shl.4
%sub12.5 = sub i64 %add.1, %add24.1
store i64 %sub12.5, i64* %arrayidx2, align 8
%and.5 = shl i64 %add24.1, 12
%shl.5 = and i64 %and.5, 268431360
%add18.5 = add i64 undef, %shl.5
%add24.5 = add i64 undef, %add18.4
store i64 %add24.5, i64* %arrayidx4, align 8
%sub12.6 = sub i64 %add, %add24.2
store i64 %sub12.6, i64* %arrayidx1, align 8
%add24.6 = add i64 undef, %add18.5
store i64 %add24.6, i64* %arrayidx3, align 8
ret void
}
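To keep track of what the defined lanes actually compute, here is a scalar C model of the reduced IR's data flow (a sketch I wrote for this report, not part of the test case). It refines every `undef` to 0 — one legal choice among many — and relies on `uint64_t` wraparound to match `i64` add/sub:

```c
#include <stdint.h>

/* Scalar model of the bugpoint-reduced IR, with every `undef` refined
   to 0 (an illustrative assumption; any refinement of undef is legal).
   Under that refinement the net effect is:
     in[0] -= K;  in[1] += K;  in[4] = 0;  in[5] = 0;   */
void pr35497_model(uint64_t in[15]) {
    const uint64_t K = 9223372002495037440ULL;     /* 2^63 - 2^35 */
    uint64_t t0 = in[0], t1 = in[1];
    uint64_t add   = t0 - K;          /* add i64 %t0, -K  (mod 2^64) */
    uint64_t add_1 = t1 + K;          /* add i64 %t1, K */
    uint64_t add24   = 0 + 0;         /* add undef, undef -> 0 */
    uint64_t add24_1 = 0 + 0;
    uint64_t add24_2 = ((uint64_t)0 >> 16) + 0;    /* lshr undef, 16 -> 0 */
    uint64_t shl_4 = (add24   << 12) & 268431360u; /* 0x0FFFF000 mask */
    uint64_t shl_5 = (add24_1 << 12) & 268431360u;
    in[1] = add_1 - add24_1;          /* store %sub12.5 */
    in[5] = 0 + (0 + shl_4);          /* store %add24.5 */
    in[0] = add   - add24_2;          /* store %sub12.6 */
    in[4] = 0 + (0 + shl_5);          /* store %add24.6 */
}
```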
There's a lot of debug output spew from SLP, but I don't understand it yet.
I've tried to keep the lanes straight by renaming things here. The placement of
the unused instruction is important: it seems to act as an anchor that causes
the other instructions to be misplaced:
define void @PR35497([15 x i64]* %inptr) {
%arrayidx1 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 0
%arrayidx2 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 1
%arrayidx3 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 4
%arrayidx4 = getelementptr [15 x i64], [15 x i64]* %inptr, i64 0, i64 5
%ld1 = load i64, i64* %arrayidx1, align 8
%ld2 = load i64, i64* %arrayidx2, align 8
%a1 = add i64 %ld1, -9
%uu2 = add i64 undef, undef
%uu4 = add i64 undef, undef
%unused = add i64 undef, %uu4
%p3 = shl i64 %uu2, 12
%p4 = shl i64 %uu4, 12
%q1 = lshr i64 undef, 16
%q3 = add i64 %p3, 2
%q4 = add i64 %p4, 2
%r1 = add i64 %q1, undef
%r2 = add i64 %ld2, 9
%r3 = add i64 undef, %q3
%r4 = add i64 undef, %q4
%s1 = add i64 %a1, %r1
%s2 = add i64 %r2, %uu2
%s3 = add i64 undef, %r3
%s4 = add i64 undef, %r4
store i64 %s1, i64* %arrayidx1, align 8
store i64 %s2, i64* %arrayidx2, align 8
store i64 %s3, i64* %arrayidx3, align 8
store i64 %s4, i64* %arrayidx4, align 8
ret void
}
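The lane structure of the renamed IR can also be written as a scalar C model (again my sketch, refining each `undef` to 0 for illustration), which makes the intended per-lane store values explicit; a miscompile from misplaced instructions would produce different stores:

```c
#include <stdint.h>

/* Scalar model of the renamed IR, refining every `undef` to 0
   (an assumption for illustration; any refinement of undef is legal).
   Net effect: in[0] -= 9; in[1] += 9; in[4] = 2; in[5] = 2. */
void pr35497_lanes(uint64_t in[15]) {
    uint64_t ld1 = in[0], ld2 = in[1];
    uint64_t a1 = ld1 - 9;            /* add i64 %ld1, -9 */
    uint64_t uu2 = 0 + 0, uu4 = 0 + 0;/* add undef, undef -> 0 */
    uint64_t unused = 0 + uu4;        /* %unused: dead, but anchors scheduling */
    (void)unused;
    uint64_t p3 = uu2 << 12, p4 = uu4 << 12;
    uint64_t q1 = (uint64_t)0 >> 16;  /* lshr undef, 16 -> 0 */
    uint64_t q3 = p3 + 2, q4 = p4 + 2;
    uint64_t r1 = q1 + 0;
    uint64_t r2 = ld2 + 9;
    uint64_t r3 = 0 + q3, r4 = 0 + q4;
    in[0] = a1 + r1;                  /* %s1 = ld1 - 9 */
    in[1] = r2 + uu2;                 /* %s2 = ld2 + 9 */
    in[4] = 0 + r3;                   /* %s3 = 2 */
    in[5] = 0 + r4;                   /* %s4 = 2 */
}
```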
The regression was introduced somewhere in the r319524:r319532 revision range.
See https://bugs.chromium.org/p/chromium/issues/detail?id=791046#c2 for preprocessed source and invocation.