Open Quuxplusone opened 5 years ago
Bugzilla Link | PR41110 |
Status | NEW |
Importance | P enhancement |
Reported by | David Bolvansky (david.bolvansky@gmail.com) |
Reported on | 2019-03-17 03:46:18 -0700 |
Last modified on | 2019-08-05 14:44:32 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | andrea.dibiagio@gmail.com, craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
Yeah, we shouldn't be repeating code like that, although whether we end up with a branch or a cmov probably affects how/when we merge.
We have been finding cases where cmov isn't great if we have access to profile metadata hinting that the branch isn't 50:50, so GCC's choice of a branch can be the better one.
I think we had decent IR at one point that should have gotten us close to the GCC output, but then canEvaluateSExtd got a little overexcited and turned everything into 64-bit, inserting the shl-32/sar-32 pair to do so.
I think canEvaluateSExtd decided that, because it found a truncate from i64 to i32 earlier in the computation, it was a good idea to promote everything to i64, even though the truncate had other users and we still had to use shifts to perform the sign extension in the i64 value.
We might also need to teach InstCombine's visitTrunc to simplify (trunc (phi (sext X), Y)) to (phi X, (trunc Y)).
Probably doesn't do anything for the motivating problem, but this 64-bit constant op:
movabs rcx, -4294967296
shl rax, 32
add rax, rcx
...is bogus. We should be moving 'add' before 'shl' late in combining to get a
32-bit constant:
leal -1(%rdi), %eax
shlq $32, %rax
That likely requires carving out an exception using the
isDesirableToCommuteWithShift() hook. Otherwise, we'll have opposing transforms.
https://rise4fun.com/Alive/f9a
// (add (shl X, C2), C1) -> (shl (add X, C1 >> C2), C2)
// iff (C1 & ((1 << C2) - 1)) == 0
ConstantSDNode *N1C = isConstOrConstSplat(N1);
if (N0.getOpcode() == ISD::SHL && N1C &&
    TLI.isDesirableToCommuteWithShift(N0.getNode(), Level)) {
  ConstantSDNode *ShiftAmt = isConstOrConstSplat(N0.getOperand(1));
  unsigned BitWidth = N1C->getAPIntValue().getBitWidth();
  // The low C2 bits of C1 must all be zero so that C1 >> C2 is exact.
  if (ShiftAmt &&
      (N1C->getAPIntValue() &
       APInt::getLowBitsSet(BitWidth, ShiftAmt->getZExtValue())) == 0) {
    SDValue Shift = DAG.FoldConstantArithmetic(ISD::SRA, DL, VT, N1.getNode(),
                                               N0.getOperand(1).getNode());
    SDValue Add = DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Shift);
    AddToWorklist(Add.getNode());
    return DAG.getNode(ISD::SHL, DL, VT, Add, N0.getOperand(1));
  }
}
With
define i64 @ok1(i64 %x) {
  %shl = shl i64 %x, 32
  %add = add i64 %shl, -4294967296
  ret i64 %add
}
Clang produces:
leal -1(%rdi), %eax
shlq $32, %rax
Sadly, with:
define i64 @ok2(i64 %x) {
  %shl = shl i64 %x, 5
  %add = add i64 %shl, 32
  ret i64 %add
}
Clang/llc goes into an infinite loop (each pattern keeps reversing the other).
Any advice on how to avoid or break the infinite loop?
Correction to the snippet above: the worklist line should be AddToWorklist(Add.getNode());
(In reply to David Bolvansky from comment #4)
> Any advice how to avoid / break infinite loop?
I don't see a way to do this in general (DAGCombiner or X86ISelLowering)
because reversing the order would break a lot of existing folds. It might be
possible to add a late and limited match for this pattern to
X86ISelDAGToDAG.cpp (X86DAGToDAGISel class).