[X86] or/add-like mismatch in uchar 'splat' to uint

Quuxplusone commented 3 years ago


Bugzilla Link	PR49055
Status	NEW
Importance	P enhancement
Reported by	Simon Pilgrim (llvm-dev@redking.me.uk)
Reported on	2021-02-05 04:12:56 -0800
Last modified on	2021-02-09 04:48:09 -0800
Version	trunk
Hardware	PC Windows NT
CC	craig.topper@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

https://simd.godbolt.org/z/E15MrP

void loop_add(const unsigned char* __restrict pIn, unsigned int* __restrict
pOut, int s) {
  for (int i = 0; i < s; i++) {
    unsigned int pixelChar = pIn[i];
    unsigned int pixel = pixelChar + (pixelChar << 8) + (pixelChar << 16) + (255 << 24);
    pOut[i] = pixel;
  }
}

This successfully combines to a zext+mul+add (I think technically it could be
zext+mul+or)?

But this:

void loop_or(const unsigned char* __restrict pIn, unsigned int* __restrict
pOut, int s) {
  for (int i = 0; i < s; i++) {
    unsigned int pixelChar = pIn[i];
    unsigned int pixel = pixelChar | (pixelChar << 8) | (pixelChar << 16) | (255 << 24);
    pOut[i] = pixel;
  }
}

completely fails and we end up with a long shift+or chain.

Quuxplusone commented 3 years ago

The root question - which of these is canonical?

define i32 @src(i8 %x) {
  %zx = zext i8 %x to i32
  %s1 = shl nuw nsw i32 %zx, 8
  %s2 = shl nuw nsw i32 %zx, 16
  %or = or i32 %s1, %s2
  ret i32 %or
}

define i32 @tgt(i8 %x) {
  %zx = zext i8 %x to i32
  %or = mul i32 %zx, 65792
  ret i32 %or
}

https://alive2.llvm.org/ce/z/dqT39Y

The difference in instcombine is in
InstCombinerImpl::SimplifyUsingDistributiveLaws() ->
getBinOpsForFactorization().

We recognize add-of-shifts as a factorized multiply, but not or-of-shifts.

We're arguing for "mul" in this example, and we already do that partially, so
the backend should be equipped to decompose back to shift+logic.

Interestingly, the -reassociate pass already has a reverse transform from -
instcombine that turns the 'or' into 'add':
"ShouldConvertOrWithNoCommonBitsToAdd"

If we allow shift opcodes there, that's probably the quickest fix.

Quuxplusone commented 3 years ago

A little git digging turns up that the -reassociate logic was added relatively
recently:
https://reviews.llvm.org/rG70472f3
https://reviews.llvm.org/rG93f3d7f

So it seems fine to add 'shl' to the opcode list. It's awkward that we have
opposing canonicalizations in different passes, but I don't have any better
ideas.

I'll add a reassociate test and a phase ordering test to prevent this from
regressing.

Quuxplusone commented 3 years ago

The small fix to -reassociate improves, but does not solve the larger example
shown here:
https://reviews.llvm.org/rG6fd91be35444

We'll need to adjust reassociate and/or instcombine with at least 1 more change
to get this to canonicalize as expected.

Quuxplusone / LLVMBugzillaTest

[X86] or/add-like mismatch in uchar 'splat' to uint #48024