Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[X86] any_extend is pessimized by selection #41114

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42144
Status NEW
Importance P enhancement
Reported by Roman Lebedev (lebedev.ri@gmail.com)
Reported on 2019-06-05 12:14:52 -0700
Last modified on 2019-06-05 12:53:04 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
define dso_local zeroext i1 @_Z2t0jj(i32, i32) local_unnamed_addr #0 {
  %3 = add i32 %1, -1
  %4 = shl i32 1, %3
  %5 = and i32 %4, %0
  %6 = icmp ne i32 %5, 0
  ret i1 %6
}

results in:

# %bb.0:
        decb    %sil
        movzbl  %sil, %eax
        btl     %eax, %edi
        setb    %al
        retq

That movzbl shouldn't be there.

Combining: t0: ch = EntryToken
Optimized legalized selection DAG: %bb.0 '_Z2t0jj:'
SelectionDAG has 16 nodes:
  t0: ch = EntryToken
        t2: i32,ch = CopyFromReg t0, Register:i32 %0
              t4: i32,ch = CopyFromReg t0, Register:i32 %1
            t21: i8 = truncate t4
          t23: i8 = add t21, Constant:i8<-1>
        t27: i32 = any_extend t23
      t29: i32 = X86ISD::BT t2, t27
    t30: i8 = X86ISD::SETCC Constant:i8<2>, t29
  t17: ch,glue = CopyToReg t0, Register:i8 $al, t30
  t18: ch = X86ISD::RET_FLAG t17, TargetConstant:i32<0>, Register:i8 $al, t17:1

ISEL: Starting selection on root node: t27: i32 = any_extend t23
ISEL: Starting pattern match
  Initial Opcode index to 126894
  Match failed at index 126898
  Continuing at 127264
  TypeSwitch[i32] from 127267 to 127270
  Skipped scope entry (due to false predicate) at index 127276, continuing at 127285
  Morphed node: t27: i32 = MOVZX32rr8 t23
ISEL: Match complete!

===== Instruction selection ends:
Selected selection DAG: %bb.0 '_Z2t0jj:'
SelectionDAG has 18 nodes:
  t0: ch = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
                t4: i32,ch = CopyFromReg t0, Register:i32 %1
              t21: i8 = EXTRACT_SUBREG t4, TargetConstant:i32<1>
            t23: i8,i32 = DEC8r t21
          t27: i32 = MOVZX32rr8 t23
        t29: i32 = BT32rr t2, t27
      t33: ch,glue = CopyToReg t0, Register:i32 $eflags, t29
    t30: i8 = SETCCr TargetConstant:i8<2>, t33:1
  t17: ch,glue = CopyToReg t0, Register:i8 $al, t30
  t18: ch = RET TargetConstant:i32<0>, Register:i8 $al, t17, t17:1

I'm not quite sure where to look though
Quuxplusone commented 5 years ago
Not sure if it's the same, but I was looking at a similar problem that led me
to this bit of X86InstrCompiler.td:

// anyext. Define these to do an explicit zero-extend to
// avoid partial-register updates.

To avoid this, the big change that I think we want -- but probably requires
many intermediate fixes -- is to promote all 8-bit ops to 32-bit.
Quuxplusone commented 5 years ago
SelectionDAGBuilder inserted a trunc before the shl during DAG construction.
This caused DAGCombine to shrink the add -1 to 8-bits because
sTypeDesirableForOp didn't tell it not to. Lowering for the icmp created the BT
from the and/shl, but needed to extend the register type back to 32-bits to
match the register size required for the BT count input.

I'm pretty sure that even though BT only needs 5 or 6 bits for the count, the
partial register handling in the frontend and register renaming does track it
as an access to the full register 16/32/64-bit register. This means on Sandy
Bridge and later, it will trigger an H-register merge if needed. I think the
shift amounts for SHLX/SARX/SHRX are the same. The old shift instructions that
always use CL will not trigger an H-register merge.

Promote all the things is probably the best way to fix this.