Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

x86/GlobalISel broken because TableGen does not understand special case getSubClassWithSubReg #49004

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50035
Status NEW
Importance P enhancement
Reported by Matt Arsenault (Matthew.Arsenault@amd.com)
Reported on 2021-04-19 17:59:15 -0700
Last modified on 2021-04-19 19:12:17 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
This testcase fails the verifier because tablegen does not expect the target to
modify the behavior of getSubClassWithSubReg:

; RUN: llc -global-isel -mtriple=i386-linux-gnu -verify-machineinstrs < %s

target triple = "i386-linux-gnu"

define i32 @foo(i32* %ptr) {
  %load = load volatile i32, i32* %ptr
  %mask = and i32 %load, 255
  ret i32 %mask
}

This is selected by this pattern:

// r & (2^8-1) ==> movz
def : Pat<(and GR32:$src1, 0xff),
          (MOVZX32rr8 (EXTRACT_SUBREG GR32:$src1, sub_8bit))>;

and subsequently fails the verifier:

bb.1 (%ir-block.0):
  %0:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (invariant load 4 from %fixed-stack.0, align 16)
  %2:gr32 = MOV32rm %0:gr32, 1, $noreg, 0, $noreg :: (volatile load 4 from %ir.ptr)
  %5:gr8 = COPY %2.sub_8bit:gr32
  %4:gr32 = MOVZX32rr8 %5:gr8
  $eax = COPY %4:gr32
  RET 0, implicit $eax

# End machine code for function foo.

*** Bad machine code: Invalid register class for subregister index ***
- function:    foo
- basic block: %bb.1  (0x9d02698)
- instruction: %5:gr8 = COPY %2.sub_8bit:gr32
- operand 1:   %2.sub_8bit:gr32

The equivalent of getSubClassWithSubReg called here ends up returning the
"wrong" answer, so the wrong register class for the subregister index is used:
https://github.com/llvm/llvm-project/blob/fbb9132e71a200c12f416d30f4528b58c6c283f2/llvm/utils/TableGen/GlobalISelEmitter.cpp#L4730

This is because x86 added a hacky case to bypass the ordinary register logic

const TargetRegisterClass *
X86RegisterInfo::getSubClassWithSubReg(const TargetRegisterClass *RC,
                                       unsigned Idx) const {
  // The sub_8bit sub-register index is more constrained in 32-bit mode.
  // It behaves just like the sub_8bit_hi index.
  if (!Is64Bit && Idx == X86::sub_8bit)
    Idx = X86::sub_8bit_hi;

  // Forward to TableGen's default version.
  return X86GenRegisterInfo::getSubClassWithSubReg(RC, Idx);
}

I think the solution should be to avoid this special case, but I'm not sure
what that should look like. We currently don't have a way to indicate any
special treatment for register, register classes, or subregister index per
subtarget. Is there a way to restructure the x86 register definitions to avoid
this without one?
Quuxplusone commented 3 years ago
Is this related to the ConstrainForSubReg class in the EXTRACT_SUBREG handling
in InstrEmitter.cpp?

I recall we have these patterns split between 32-bit and 64-bit. I tried to
remove the COPY_TO_REGCLASS once, but it broke GlobalISel.

def : Pat<(i8 (trunc GR32:$src)),
          (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
                          sub_8bit)>,
      Requires<[Not64BitMode]>;
def : Pat<(i8 (trunc GR16:$src)),
          (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
                          sub_8bit)>,
      Requires<[Not64BitMode]>;

Do we need to add more COPY_TO_REGCLASS to more patterns?
Quuxplusone commented 3 years ago
(In reply to Craig Topper from comment #1)
> Is this related to the ConstrainForSubReg class in the EXTRACT_SUBREG
> handling in InstrEmitter.cpp?

This is essentially the same thing, yes. However GlobalISelEmitter just uses
the register constraints it knows and requests directly emitting a COPY with
the subreg index in the matcher table.

>
> I recall we have these patterns split between 32-bit and 64-bit. I tried to
> remove the COPY_TO_REGCLASS once, but it broke GlobalISel.
>
> def : Pat<(i8 (trunc GR32:$src)),
>
>           (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
>
>                           sub_8bit)>,
>
>       Requires<[Not64BitMode]>;
>
> def : Pat<(i8 (trunc GR16:$src)),
>
>           (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
>
>                           sub_8bit)>,
>
>       Requires<[Not64BitMode]>;
>
>
> Do we need to add more COPY_TO_REGCLASS to more patterns?

This is a bit ugly and I don't think should be needed. I think it would make
sense for tablegen to be smart enough to know to constrain the source register
class to be one compatible with the index, but that again would require
understanding this x86 specific hack in this case.

Are there only a small number of cases that need to worry about sub_8bit?
Quuxplusone commented 3 years ago
Are these patterns broken too?

    def : Pat<(sra GR32:$src1, GR8:$src2),
              (SARX32rr GR32:$src1,
                        (INSERT_SUBREG
                          (i32 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
    def : Pat<(sra GR64:$src1, GR8:$src2),
              (SARX64rr GR64:$src1,
                        (INSERT_SUBREG
                          (i64 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;

    def : Pat<(srl GR32:$src1, GR8:$src2),
              (SHRX32rr GR32:$src1,
                        (INSERT_SUBREG
                          (i32 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
    def : Pat<(srl GR64:$src1, GR8:$src2),
              (SHRX64rr GR64:$src1,
                        (INSERT_SUBREG
                          (i64 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;

    def : Pat<(shl GR32:$src1, GR8:$src2),
              (SHLX32rr GR32:$src1,
                        (INSERT_SUBREG
                          (i32 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
    def : Pat<(shl GR64:$src1, GR8:$src2),
              (SHLX64rr GR64:$src1,
                        (INSERT_SUBREG
                          (i64 (IMPLICIT_DEF)), GR8:$src2, sub_8bit))>;
Quuxplusone commented 3 years ago
(In reply to Craig Topper from comment #3)
> Are these patterns broken too?

I think these happen to work because they don't have a virtual register that
needs constraining, and instead copy to a specific physical register
Quuxplusone commented 3 years ago

Those shouldn't be going to a specific physical register. Those are the newer shift instructions that don't update flags and take shift amount from any register. Not the old ones that always used CL as the shift amount.