[InstCombine] Incorrect folding of LShr into select instruction

Quuxplusone commented 3 years ago


Bugzilla Link	PR48353
Status	NEW
Importance	P enhancement
Reported by	Congzhe Cao (congzhecao@gmail.com)
Reported on	2020-12-01 14:16:31 -0800
Last modified on	2020-12-22 12:07:02 -0800
Version	trunk
Hardware	PC Linux
CC	juneyoung.lee@sf.snu.ac.kr, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, meheff@google.com, nunoplopes@sapo.pt, regehr@cs.utah.edu, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR48435

The IR that exposes this bug is as follows.

; ModuleID = 'test.c'

@m = common dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @test() local_unnamed_addr #1 {
entry:
  call fastcc void @foo(i32 51)
  ret i32 0
}

define internal fastcc void @foo(i32 %aj) {
entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %shr.i = select i1 %cmp.i, i32 0, i32 %aj
  %cond.i1 = lshr i32 3, %shr.i
  %conv.i = trunc i32 %cond.i1 to i8
  %0 = and i8 %conv.i, 1
  %cmp4 = icmp ugt i8 %0, 0
  %conv5 = zext i1 %cmp4 to i32
  store i32 %conv5, i32* @m
  ret void
}

Note that at compile time we know the argument of foo() is %aj = 51 thus
"%shr.i = select i1 %cmp.i, i32 0, i32 %aj" will result in 0 and hence LShr
instruction that follows is correct, not resulting in a poison value.

When running "opt -instcombine test.ll -S", the following IR is generated.
Specifically, "%aj.op = lshr i32 3, %aj" is generated which result in a poison
value since %aj is actually greater than 32.

@m = common dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @test() local_unnamed_addr #1 {
entry:
  call fastcc void @foo(i32 51)
  ret i32 0
}

define internal fastcc void @o(i32 %aj) {
entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %aj.op = lshr i32 3, %aj
  %.op2 = and i32 %aj.op, 1
  %cmp.i3 = zext i1 %cmp.i to i32
  %conv55 = or i32 %.op2, %cmp.i3
  store i32 %conv55, i32* @m, align 4
  ret void
}

************************

The cause is that during InstCombine, we fold

%shr.i = select i1 %cmp.i, i32 0, i32 %aj
%cond.i1 = lshr i32 3, %shr.i

into

%aj.op = lshr i32 3, %aj
%cond.i1 = select i1 %cmp.i, i32 3, i32 %aj.op,

thus generating the incorrect LShr instruction "%aj.op = lshr i32 3, %aj".

****************************

Our proposed fix:

In function InstCombiner::FoldOpIntoSelect() from
Transforms/InstCombine/InstructionCombining.cpp, add the following check such
that we only fold LShr into select when both the true value and the false value
are known to be constants.

  // If op is a LShr instruction, we can only fold into select when both TV
  // and FV are known to be constants, otherwise they may be evaluated to be
  // greater than the width of the first operand of LShr, resulting in for
  // example:
  //     LShr i32 X Y where Y > 32,
  // which is not suppose to be generated.
  if (Op.getOpcode() == Instruction::LShr &&
      !(isa<Constant>(TV) && isa<Constant>(FV))) {
    return nullptr;
  }

Quuxplusone commented 3 years ago

So the transform that is causing miscompile is

----------------------------------------
define i32 @src(i32 %aj) {
%entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %aj.op = lshr i32 3, %aj
  %.op2 = and i32 %aj.op, 1
  %cmp41 = icmp ne i32 %.op2, 0
  %cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
  %conv5 = zext i1 %cmp4 to i32
  ret i32 %conv5
}
=>
define i32 @tgt(i32 %aj) {
%entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %aj.op = lshr i32 3, %aj
  %.op2 = and i32 %aj.op, 1
  %cmp41 = icmp ne i32 %.op2, 0
  %cmp4 = or i1 %cmp.i, %cmp41
  %conv5 = zext i1 %cmp4 to i32
  ret i32 %conv5
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i32 %aj = #x00000020 (32)

Source:
i1 %cmp.i = #x1 (1)
i32 %aj.op = poison
i32 %.op2 = poison
i1 %cmp41 = poison
i1 %cmp4 = #x1 (1)
i32 %conv5 = #x00000001 (1)

Target:
i1 %cmp.i = #x1 (1)
i32 %aj.op = poison
i32 %.op2 = poison
i1 %cmp41 = poison
i1 %cmp4 = poison
i32 %conv5 = poison
Source value: #x00000001 (1)
Target value: poison

I.e.

----------------------------------------
define i1 @src(i1 %cmp.i, i1 %cmp41) {
%entry:
  %cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
  ret i1 %cmp4
}
=>
define i1 @tgt(i1 %cmp.i, i1 %cmp41) {
%entry:
  %cmp4 = or i1 %cmp.i, %cmp41
  ret i1 %cmp4
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i1 %cmp.i = #x1 (1)
i1 %cmp41 = poison

Source:
i1 %cmp4 = #x1 (1)

Target:
i1 %cmp4 = poison
Source value: #x1 (1)
Target value: poison

Quuxplusone commented 3 years ago

... time to apply freeze?

Quuxplusone commented 3 years ago

Would it be a big change if optimizations that deal with or/and are updated to deal with the corresponding select patterns?

Quuxplusone commented 3 years ago

I tend to agree. I don't see much value in turning select -> and/or + freeze. I
would rather keep the select and make sure the backends are happy with it.
I think Sanjay already did some work in SelDAG to make it like selects more
AFAIR.

Quuxplusone commented 3 years ago

Yes, we've done some work in codegen to make it more robust with select in IR.
We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L2627

And see what regresses?

Quuxplusone commented 3 years ago

> So we need to remove this block:
> https://github.com/llvm/llvm-project/blob/
> f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> InstCombineSelect.cpp#L2627
>
> And see what regresses?

Sounds like a plan! :)
(that whole block seems incorrect, yes)

Quuxplusone commented 3 years ago

(In reply to Sanjay Patel from comment #5)
> Yes, we've done some work in codegen to make it more robust with select in
> IR. We've also tried to patch instcombine to make it more flexible about
> recognizing patterns with selects.
>
> If I'm seeing it correctly, this is one of the transforms noted here:
> https://reviews.llvm.org/D72396
>
> So we need to remove this block:
> https://github.com/llvm/llvm-project/blob/
> f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> InstCombineSelect.cpp#L2627
>
> And see what regresses?

Thanks for the suggestion! (In reply to Sanjay Patel from comment #5)
> Yes, we've done some work in codegen to make it more robust with select in
> IR. We've also tried to patch instcombine to make it more flexible about
> recognizing patterns with selects.
>
> If I'm seeing it correctly, this is one of the transforms noted here:
> https://reviews.llvm.org/D72396
>
> So we need to remove this block:
> https://github.com/llvm/llvm-project/blob/
> f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> InstCombineSelect.cpp#L2627
>
> And see what regresses?

Thanks for the suggestion! Are you going to upstream it? We are going to
internally implement a patch as suggested and measure the performance anyways,
so we could just post the bug/solution/result to Phabricator if that makes
things easier. I'd appreciate your comment on it.

Quuxplusone commented 3 years ago

(In reply to Nuno Lopes from comment #4)
> I tend to agree. I don't see much value in turning select -> and/or +
> freeze. I would rather keep the select and make sure the backends are happy
> with it.
> I think Sanjay already did some work in SelDAG to make it like selects more
> AFAIR.

Thanks for the comment. I'm just curious what would be the potential solution
of "turning select -> and/or + freeze", what does it look like? Is it like just
freezing the result of and/or instruction?

Quuxplusone commented 3 years ago

The solution that uses freeze is to insert freeze at the operand or and/or
coming from non-conditional value of select:

select x, true, y -> or x, freeze(y)
select x, y, false -> and x, freeze(y)

Quuxplusone commented 3 years ago

Here is the test: https://alive2.llvm.org/ce/z/dwFA_n

Quuxplusone commented 3 years ago

(In reply to Juneyoung Lee from comment #10)
> Here is the test: https://alive2.llvm.org/ce/z/dwFA_n

Thanks for the demo! it makes perfect sense.

Quuxplusone commented 3 years ago

(In reply to Congzhe Cao from comment #7)
> (In reply to Sanjay Patel from comment #5)
> > Yes, we've done some work in codegen to make it more robust with select in
> > IR. We've also tried to patch instcombine to make it more flexible about
> > recognizing patterns with selects.
> >
> > If I'm seeing it correctly, this is one of the transforms noted here:
> > https://reviews.llvm.org/D72396
> >
> > So we need to remove this block:
> > https://github.com/llvm/llvm-project/blob/
> > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > InstCombineSelect.cpp#L2627
> >
> > And see what regresses?
>
> Thanks for the suggestion! (In reply to Sanjay Patel from comment #5)
> > Yes, we've done some work in codegen to make it more robust with select in
> > IR. We've also tried to patch instcombine to make it more flexible about
> > recognizing patterns with selects.
> >
> > If I'm seeing it correctly, this is one of the transforms noted here:
> > https://reviews.llvm.org/D72396
> >
> > So we need to remove this block:
> > https://github.com/llvm/llvm-project/blob/
> > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > InstCombineSelect.cpp#L2627
> >
> > And see what regresses?
>
> Thanks for the suggestion! Are you going to upstream it? We are going to
> internally implement a patch as suggested and measure the performance
> anyways, so we could just post the bug/solution/result to Phabricator if
> that makes things easier. I'd appreciate your comment on it.

Sorry, I missed that comment initially. If you can collect some perf
measurements with that change and/or post to Phab, that would be great.

I just looked at some of the instcombine regression test diffs, and we may need
to accelerate the canonicalization to the min/max intrinsics to avoid problems.
But we should check codegen for various targets to see if those are really
regressions that we must address.

Quuxplusone commented 3 years ago

(In reply to Sanjay Patel from comment #12)
> (In reply to Congzhe Cao from comment #7)
> > (In reply to Sanjay Patel from comment #5)
> > > Yes, we've done some work in codegen to make it more robust with select in
> > > IR. We've also tried to patch instcombine to make it more flexible about
> > > recognizing patterns with selects.
> > >
> > > If I'm seeing it correctly, this is one of the transforms noted here:
> > > https://reviews.llvm.org/D72396
> > >
> > > So we need to remove this block:
> > > https://github.com/llvm/llvm-project/blob/
> > > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > > InstCombineSelect.cpp#L2627
> > >
> > > And see what regresses?
> >
> > Thanks for the suggestion! (In reply to Sanjay Patel from comment #5)
> > > Yes, we've done some work in codegen to make it more robust with select in
> > > IR. We've also tried to patch instcombine to make it more flexible about
> > > recognizing patterns with selects.
> > >
> > > If I'm seeing it correctly, this is one of the transforms noted here:
> > > https://reviews.llvm.org/D72396
> > >
> > > So we need to remove this block:
> > > https://github.com/llvm/llvm-project/blob/
> > > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > > InstCombineSelect.cpp#L2627
> > >
> > > And see what regresses?
> >
> > Thanks for the suggestion! Are you going to upstream it? We are going to
> > internally implement a patch as suggested and measure the performance
> > anyways, so we could just post the bug/solution/result to Phabricator if
> > that makes things easier. I'd appreciate your comment on it.
>
> Sorry, I missed that comment initially. If you can collect some perf
> measurements with that change and/or post to Phab, that would be great.
>
> I just looked at some of the instcombine regression test diffs, and we may
> need to accelerate the canonicalization to the min/max intrinsics to avoid
> problems. But we should check codegen for various targets to see if those
> are really regressions that we must address.

Thanks for the reply! I followed the suggestion that removes this block:
https://github.com/llvm/llvm-project/blob/f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L2627.

But then the built clang crashes when building some tests. The cause is that,
since that block has been removed, when we now process instructions like
"select i1 Condition, i1 TrueValue, i1 FalseValue", we'll enter the following
code block to optimize it:
https://github.com/llvm/llvm-project/blob/f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L2669

And this results in "zext i1 to i1", causing the compiler to crash.

My fix to the crash is that, in the condition checking (line 2669), adding the
following checks to make sure TrueVal and FalseVal are not i1 type, which now
looks like:

 if (SelType->isIntOrIntVectorTy() &&
      CondVal->getType()->isVectorTy() == SelType->isVectorTy() &&
      (TrueVal->getType()->isIntegerTy() &&
       TrueVal->getType()->getIntegerBitWidth() != 1) &&
      (FalseVal->getType()->isIntegerTy() &&
       FalseVal->getType()->getIntegerBitWidth() != 1))

I'm wondering if you have comments on this fix? If it looks fine I'll go ahead
measuring the performance. Thanks!

Quuxplusone commented 3 years ago

Posted the fix suggested to Phabricator as work-in-progress, will update the progress and performance numbers there. Any review is welcome, please just let me know if I happen to miss someone.

https://reviews.llvm.org/D93065

Quuxplusone commented 3 years ago

_Bug 48435 has been marked as a duplicate of this bug._

Quuxplusone / LLVMBugzillaTest

[InstCombine] Incorrect folding of LShr into select instruction #47322