Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

instcombine creates i96 icmp #19799

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR19800
Status NEW
Importance P normal
Reported by Dan Gohman (llvm@sunfishcode.online)
Reported on 2014-05-19 18:35:25 -0700
Last modified on 2014-05-20 13:06:51 -0700
Version trunk
Hardware All All
CC alonzakai@gmail.com, llvm-bugs@lists.llvm.org, rnk@google.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
For this C++ code:

struct T
{
    unsigned a :4;
    unsigned b :1;
    unsigned c :1;
    unsigned d :10;
    unsigned e :16;
    unsigned f :15;
    unsigned g :1;
    unsigned h :15;
    unsigned i :1;
    unsigned j :15;
    unsigned k :1;
    unsigned l :15;
    unsigned m :1;
};

bool foo(T *p)
{
    return p->a >= 3;
}

clang -O2 -emit-llvm emits LLVM IR that does this:

  %bf.load = load i96* %0, align 4
  %bf.cast = and i96 %bf.load, 15
  %cmp = icmp ugt i96 %bf.cast, 2

even on targets (such as x86-64) which have explicit native integer types in
their datalayout (such as -n8:16:32:64). The i96 icmp is semantically correct,
but surprising, given the general convention that instcombine isn't supposed to
propagate non-legal types through expression trees.

In practice, SelectionDAG codegen optimizes this away by noticing that the
input to the icmp is an and with lots of leading zero bits. However, it seems
awkward to create such illegal comparisons and require SelectionDAG to optimize
them away. It's also problematic for Emscripten, which does not currently use
SelectionDAG, so doesn't currently have code to optimize away the i96
comparison.

Is it intended that instcombine promote comparisons up to illegal types like
this?
Quuxplusone commented 10 years ago
Clang does this, not instcombine:

$ clang -cc1 t.cpp -emit-llvm -o - -triple x86_64-linux | grep i96
  %1 = bitcast %struct.T* %0 to i96*
  %bf.load = load i96* %1, align 4
  %bf.clear = and i96 %bf.load, 15
  %bf.cast = trunc i96 %bf.clear to i32

The logic I've heard is, the wider the load, the more freedom we have to
optimize.  It's always legal to slice up a load, but it's hard to prove we can
widen one.
Quuxplusone commented 10 years ago
(In reply to comment #1)
> Clang does this, not instcombine:
>
> $ clang -cc1 t.cpp -emit-llvm -o - -triple x86_64-linux | grep i96
>   %1 = bitcast %struct.T* %0 to i96*
>   %bf.load = load i96* %1, align 4
>   %bf.clear = and i96 %bf.load, 15
>   %bf.cast = trunc i96 %bf.clear to i32

Clang doesn't emit the icmp with i96 though. It emits this:

  %bf.load = load i96* %1, align 4
  %bf.clear = and i96 %bf.load, 15
  %bf.cast = trunc i96 %bf.clear to i32
  %cmp = icmp sge i32 %bf.cast, 3

It's instcombine that's creating the i96 icmp.
Quuxplusone commented 10 years ago

Woops. I agree, it's probably more useful to form the smallest (or at least most efficient) ALU ops possible, while keep thing the memory operations as wide as possible.