Open Quuxplusone opened 10 years ago
Bugzilla Link | PR19800 |
Status | NEW |
Importance | P normal |
Reported by | Dan Gohman (llvm@sunfishcode.online) |
Reported on | 2014-05-19 18:35:25 -0700 |
Last modified on | 2014-05-20 13:06:51 -0700 |
Version | trunk |
Hardware | All All |
CC | alonzakai@gmail.com, llvm-bugs@lists.llvm.org, rnk@google.com |
Clang does this, not instcombine:
$ clang -cc1 t.cpp -emit-llvm -o - -triple x86_64-linux | grep i96
%1 = bitcast %struct.T* %0 to i96*
%bf.load = load i96* %1, align 4
%bf.clear = and i96 %bf.load, 15
%bf.cast = trunc i96 %bf.clear to i32
The logic I've heard is that the wider the load, the more freedom we have to
optimize: it's always legal to slice up a load, but it's hard to prove we can
widen one.
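To make the "slicing is always legal" point concrete, here is a minimal C++ sketch. Everything in it is an assumption for illustration: `Object` is a hypothetical 12-byte stand-in for `struct T`, `Wide96` models the i96 value as two pieces, and the bytes are assembled in little-endian order to match the x86_64 triple used above. It only shows that reading the low 4 bits from the full 12-byte load gives the same result as reading them from a one-byte slice.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 12-byte object, standing in for struct T from the report.
using Object = std::array<std::uint8_t, 12>;

// Model of a little-endian i96 value: low 64 bits plus high 32 bits.
struct Wide96 {
    std::uint64_t lo;
    std::uint32_t hi;
};

// Wide load: read all 12 bytes in little-endian order.
Wide96 load_wide(const Object& obj) {
    Wide96 w{0, 0};
    for (std::size_t i = 0; i < 8; ++i)
        w.lo |= static_cast<std::uint64_t>(obj[i]) << (8 * i);
    for (std::size_t i = 0; i < 4; ++i)
        w.hi |= static_cast<std::uint32_t>(obj[8 + i]) << (8 * i);
    return w;
}

// Field taken from the wide load: "and" with 15, then truncate to 32 bits.
std::uint32_t field_from_wide_load(const Object& obj) {
    Wide96 w = load_wide(obj);
    return static_cast<std::uint32_t>(w.lo & 15u);
}

// Field taken from a sliced load: only the first byte is read.
std::uint32_t field_from_narrow_load(const Object& obj) {
    return obj[0] & 15u;
}
```

Going the other way (replacing the one-byte read with a 12-byte read) would require knowing that the other 11 bytes are readable, which is the part that is hard to prove.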
(In reply to comment #1)
> Clang does this, not instcombine:
>
> $ clang -cc1 t.cpp -emit-llvm -o - -triple x86_64-linux | grep i96
> %1 = bitcast %struct.T* %0 to i96*
> %bf.load = load i96* %1, align 4
> %bf.clear = and i96 %bf.load, 15
> %bf.cast = trunc i96 %bf.clear to i32
Clang doesn't emit the icmp with i96 though. It emits this:
%bf.load = load i96* %1, align 4
%bf.clear = and i96 %bf.load, 15
%bf.cast = trunc i96 %bf.clear to i32
%cmp = icmp sge i32 %bf.cast, 3
It's instcombine that's creating the i96 icmp.
Whoops. I agree, it's probably more useful to form the smallest (or at least most efficient) ALU ops possible, while keeping the memory operations as wide as possible.
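As a rough illustration of why the narrow ALU form is safe here, the following C++ sketch compares the two shapes of the computation. It is not what instcombine does internally; `cmp_wide` and `cmp_narrow` are hypothetical names, and `std::uint64_t` is used as an assumed stand-in for i96, since standard C++ has no 96-bit integer type. Because the mask is 15, only the low 4 bits can be set, so truncating before the mask and compare gives the same answer as doing both in the wide type.

```cpp
#include <cstdint>

// Wide form: mask and compare in the wide type
// (std::uint64_t stands in for i96 in this sketch).
bool cmp_wide(std::uint64_t loaded) {
    return (loaded & 15u) >= 3u;
}

// Narrow form: truncate to 32 bits first, then mask and do the
// signed i32 compare, as in the IR Clang emits.
bool cmp_narrow(std::uint64_t loaded) {
    std::uint32_t low = static_cast<std::uint32_t>(loaded);
    return static_cast<std::int32_t>(low & 15u) >= 3;
}
```

The masked value is always in the range 0 to 15, so the signed and unsigned comparisons agree and the high bits of the load never influence the result.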