Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[X86] X86ISelLowering combineShiftLeft contains a transform that isn't undef safe. #49437

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50468
Status NEW
Importance P enhancement
Reported by Craig Topper (craig.topper@gmail.com)
Reported on 2021-05-25 11:55:07 -0700
Last modified on 2021-08-24 08:16:32 -0700
Version trunk
Hardware PC All
CC craig.topper@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
This function contains this transform which increases the uses of V. This isn't
correct if V is undef or could become undef later.

  // Hardware support for vector shifts is sparse which makes us scalarize the
  // vector operations in many cases. Also, on sandybridge ADD is faster than
  // shl.
  // (shl V, 1) -> add V,V
  if (auto *N1BV = dyn_cast<BuildVectorSDNode>(N1))
    if (auto *N1SplatC = N1BV->getConstantSplatNode()) {
      assert(N0.getValueType().isVector() && "Invalid vector shift type");
      // We shift all of the values by one. In many cases we do not have
      // hardware support for this operation. This is better expressed as an ADD
      // of two values.
      if (N1SplatC->isOne())
        return DAG.getNode(ISD::ADD, SDLoc(N), VT, N0, N0);
    }

For i16/i32/i64 we can probably just use an isel pattern instead like we do for
scalar shift left. Not sure how to preserve i8.
Quuxplusone commented 3 years ago
(In reply to Craig Topper from comment #0)
> For i16/i32/i64 we can probably just use an isel pattern instead like we do
> for scalar shift left. Not sure how to preserve i8.

The i8 case is repeated in LowerScalarImmediateShift, so we can address that
one separately.
Quuxplusone commented 3 years ago
Any ideas about how to make this bug visible?
If we fix this in SDAG, does that guarantee that we're safe to asm?
Quuxplusone commented 3 years ago
What about just inserting a ISD::FREEZE node? That would work the vXi8 cases as
well.

  // Hardware support for vector shifts is sparse which makes us scalarize the
  // vector operations in many cases. Also, on sandybridge ADD is faster than
  // shl.
  // (shl V, 1) -> add V,V
  if (auto *N1BV = dyn_cast<BuildVectorSDNode>(N1))
    if (auto *N1SplatC = N1BV->getConstantSplatNode()) {
      assert(N0.getValueType().isVector() && "Invalid vector shift type");
      // We shift all of the values by one. In many cases we do not have
      // hardware support for this operation. This is better expressed as an ADD
      // of two values.
      if (N1SplatC->isOne()) {
        N0 = DAG.getNode(ISD::FREEZE, SDLoc(N), VT, N0);
        return DAG.getNode(ISD::ADD, SDLoc(N), VT, N0, N0);
      }
    }
Quuxplusone commented 3 years ago

Freeze Patch: https://reviews.llvm.org/D106675

Quuxplusone commented 3 years ago

ISEL Patch: https://reviews.llvm.org/D106679 - this only works for vXi16/32/64 vectors - the vXi8 case would still need ISD::FREEZE

Quuxplusone commented 3 years ago

@lebedev.ri also noticed that we have isel patterns doing the same for scalars:

https://github.com/llvm/llvm-project/blob/6f7f5b54c81be59ec7876649d1f9aa6b104658ec/llvm/lib/Target/X86/X86InstrCompiler.td#L1761-L1770

Quuxplusone commented 3 years ago

The vXi8 case has been fixed with FREEZE: https://reviews.llvm.org/D108139