Bugzilla Link | PR49147 |
Status | RESOLVED FIXED |
Importance | P enhancement |
Reported by | Simon Pilgrim (llvm-dev@redking.me.uk) |
Reported on | 2021-02-11 06:30:21 -0800 |
Last modified on | 2021-02-27 02:54:47 -0800 |
Version | trunk |
Hardware | PC Windows NT |
CC | llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com |
It should be easy enough to get these in instcombine.
There's an open question about exactly which FMF should be matched. Candidates
are any/all of:
reassoc
arcp
afn
https://llvm.org/docs/LangRef.html#fast-math-flags
In practical terms, once the programmer allows "reassoc", they should be
prepared for anything, but we try to be safer on most fdiv transforms and also
check "arcp".
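As a rough illustration of the conservative gating described above, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not LLVM's FastMathFlags API; they only model the three candidate flags and the "reassoc plus arcp" check. Whether "afn" should also be required is the open question, so it is carried along but deliberately not tested.

```cpp
#include <cassert>

// Hypothetical mirror of the three candidate fast-math flags (not LLVM's class).
struct FastMathFlagsSketch {
  bool reassoc = false; // reassociation allowed
  bool arcp = false;    // reciprocal allowed
  bool afn = false;     // approximate math functions allowed
};

// Conservative gate for turning a division into a multiply:
// require both reassoc and arcp, as the comment above suggests for fdiv folds.
bool allowsDivToMulFold(const FastMathFlagsSketch &F) {
  return F.reassoc && F.arcp;
}
```

With this gate, an fdiv that carries only "reassoc" would be left alone, which is the "safer" behavior mentioned above.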
https://reviews.llvm.org/D96648
I left "powi" off of that because I'm not sure how we should deal with the case where the exponent is 0x80000000 (signed min i32). gcc doesn't seem to care about that possibility?
(In reply to Sanjay Patel from comment #2)
Looking at the code where these snippets came from, technically we could use
ValueTracking to check the min/max bounds of the int, although I think more
likely we'll end up having to convert to a float and negate that, which loses
us any perf benefit.
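To make the corner case concrete, here is a small sketch of the two options mentioned above: guarding the integer negation with a bounds check, or widening to float before negating. The helper names are hypothetical, and the wrapping negation is modeled with unsigned arithmetic (the final conversion back to a signed type is modular in C++20 and on common compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Negate an i32 with two's-complement wraparound, modeled via unsigned math.
// Assumption: the unsigned-to-signed conversion is modular (true in C++20).
std::int32_t wrappingNegate(std::int32_t N) {
  return static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(N));
}

// Hypothetical bounds check: every i32 except the minimum has an exact negation.
bool canNegateExactly(std::int32_t N) { return N != INT32_MIN; }

// Fallback: convert to float first, then negate. 2^31 is representable in
// float, so the signed-min exponent is no longer a special case here.
float negateAsFloat(std::int32_t N) { return -static_cast<float>(N); }
```

The first helper shows why the plain integer negation is a problem: the signed-min value maps back to itself, so the exponent would not actually change sign.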
(In reply to Simon Pilgrim from comment #3)
Let me make sure I'm seeing it correctly: in the general case, we need to decide
which of these has better perf:
callq __powisf2@PLT
divss %xmm0, %xmm1
Or:
pxor %xmm1, %xmm1
cvtsi2ssl %edi, %xmm1
xorps .LC0(%rip), %xmm1
jmp powf
So we have to know if the divss costs more than the presumed savings of the
"powi" call vs. the "powf" call.
In IR (instcombine), we could say that getting rid of the fdiv is worth adding
2 extra instructions (fneg + sitofp).
I don't know where in the gcc pipeline this is implemented, but that's what
they seem to have done:
https://simd.godbolt.org/z/WjWrv1
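For reference, the two shapes being compared can be written out in source form. This is only a sketch: powiSketch is a hypothetical square-and-multiply stand-in for a powi-style routine (it is not the compiler-rt code), and the second form mirrors the convert, negate, and call-powf sequence shown above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical powi-style routine: square-and-multiply on an integer exponent.
// The exponent is widened to 64 bits so taking its magnitude cannot overflow.
float powiSketch(float X, int N) {
  long long E = N;
  const bool Recip = E < 0;
  if (Recip)
    E = -E;
  float R = 1.0f;
  while (E != 0) {
    if (E & 1)
      R *= X;
    X *= X;
    E >>= 1;
  }
  return Recip ? 1.0f / R : R;
}

// Shape 1: call the powi-style routine, then divide.
float divByPowi(float Z, float X, int N) { return Z / powiSketch(X, N); }

// Shape 2: convert the exponent to float, negate it, call pow, then multiply.
float mulByPowf(float Z, float X, int N) {
  return Z * std::pow(X, -static_cast<float>(N));
}
```

Both shapes agree up to rounding for ordinary inputs, so the choice between them is a cost question (the divide versus the conversion plus the more general pow call), not a correctness one.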
Interestingly, I'm not sure compiler-rt accounts for INT_MIN in powi:
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/powisf2.c
powi() isn't part of the standard math lib, so anything goes?
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Other-Builtins.html#Other-Builtins
— Built-in Function: double __builtin_powi (double, int)
Returns the first argument raised to the power of the second. Unlike the pow function no guarantees about precision and rounding are made.
We probably want to add at least these 2 related folds for exp() and exp2()?
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index 9bc566ed3523..702f76da2774 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -1326,8 +1326,9 @@ Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
     if (match(Op1, m_FDiv(m_SpecificFP(1.0), m_Value(Y))))
       return BinaryOperator::CreateFMulFMF(Y, Op0, &I);
 
-    // Negate the exponent of pow to fold division-by-pow() into multiply:
+    // Negate the exponent of pow/exp to fold division-by-pow() into multiply:
     // Z / pow(X, Y) --> Z * pow(X, -Y)
+    // Z / exp{2}(Y) --> Z * exp{2}(-Y)
     // In the general case, this creates an extra instruction, but fmul allows
     // for better canonicalization and optimization than fdiv.
     if (match(Op1,
@@ -1336,6 +1337,16 @@ Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
       Value *Pow = Builder.CreateBinaryIntrinsic(Intrinsic::pow, X, NegY, &I);
       return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
     }
+    if (match(Op1, m_OneUse(m_Intrinsic<Intrinsic::exp>(m_Value(Y))))) {
+      Value *NegY = Builder.CreateFNegFMF(Y, &I);
+      Value *Pow = Builder.CreateUnaryIntrinsic(Intrinsic::exp, NegY, &I);
+      return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
+    }
+    if (match(Op1, m_OneUse(m_Intrinsic<Intrinsic::exp2>(m_Value(Y))))) {
+      Value *NegY = Builder.CreateFNegFMF(Y, &I);
+      Value *Pow = Builder.CreateUnaryIntrinsic(Intrinsic::exp2, NegY, &I);
+      return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
+    }
   }
 
   if (I.hasAllowReassoc() && Op0->hasOneUse() && Op1->hasOneUse()) {
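The identity behind the two proposed folds can be spot-checked outside the compiler. This is just a numeric sketch using the standard math library (the function names are made up for illustration); it is not a substitute for the instcombine tests, and the two forms are only expected to agree up to rounding, which is why the fold is gated on fast-math flags.

```cpp
#include <cassert>
#include <cmath>

// Original forms: Z / exp(Y) and Z / exp2(Y).
double divByExp(double Z, double Y) { return Z / std::exp(Y); }
double divByExp2(double Z, double Y) { return Z / std::exp2(Y); }

// Folded forms: Z * exp(-Y) and Z * exp2(-Y).
double mulByNegExp(double Z, double Y) { return Z * std::exp(-Y); }
double mulByNegExp2(double Z, double Y) { return Z * std::exp2(-Y); }

// Relative difference, used to compare the two forms up to rounding.
double relDiff(double A, double B) {
  return std::fabs(A - B) / std::fmax(std::fabs(A), std::fabs(B));
}
```

For example, relDiff(divByExp(3.0, 1.5), mulByNegExp(3.0, 1.5)) should be within a few ulps of zero rather than exactly zero.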
(In reply to Sanjay Patel from comment #7)
> We probably want to add at least these 2 related folds for exp() and exp2()?
gcc has those, so I'll add some tests and push that:
https://simd.godbolt.org/z/8KMGTf
Adding the exp/exp2 cases would be awesome - thanks.
It would be useful to support powi if at all possible, given the vagueness of the builtin. Maybe we just add something to the LangRef ('powi(x, INT_MIN) is undefined') and always do the negation?
(In reply to Simon Pilgrim from comment #9)
> Adding the exp/exp2 cases would be awesome - thanks.
Added with:
https://reviews.llvm.org/rGe772618f1ee2
> It would be useful to support powi if at all possible, given the vagueness
> of the builtin - maybe if we just add something to the langref 'powi(x,
> INT_MIN) is undefined' and we always do the negation?
I'm still not clear on when it is optimal to convert powi to powf. I haven't
tried to benchmark it yet, but I thought powi is (much) faster given that it's
just a loop of fmul. Is there a larger example where we see powf as the winner?
(In reply to Sanjay Patel from comment #10)
What's still stopping us from just mapping 1/powi(x,i) -> powi(x,-i)?
(In reply to Simon Pilgrim from comment #11)
I think that case is fine except for the potential INT_MIN corner case. Note
that gcc is not getting that transform in the example here. So I was confused
in my earlier comments - we're never going to convert powi to powf; gcc is
missing an opportunity to convert powf into powi.
I don't know how to prove this, but I wonder if we could say that for any
supported LLVM FP types powi(X, 0x80000000) is either 0.0, 1.0, or Inf (and so
the reciprocal is Inf, 1.0, or 0.0).
If that holds, it's safe to do the transform with -ffast-math because that
implies 'ninf'?
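A quick way to probe that conjecture for single-precision values is to raise a float to the magnitude of the signed-min exponent (2^31) by squaring it 31 times. The sketch below does only that; the function names are hypothetical, it spot-checks float alone, and it makes no claim about double or other types, nor about NaN inputs.

```cpp
#include <cassert>
#include <cmath>

// Compute X^(2^31) in float by repeated squaring (31 squarings).
// 2^31 is even, so the sign of X does not survive.
float powTwoTo31(float X) {
  for (int I = 0; I < 31; ++I)
    X *= X;
  return X;
}

// The reciprocal form, i.e. X raised to the signed-min i32 exponent.
float powIntMin(float X) { return 1.0f / powTwoTo31(X); }
```

For the float samples tried here, including the values immediately above and below 1.0, the result saturates to 0.0, 1.0, or Inf, and the reciprocal swaps 0.0 and Inf accordingly.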
I think the combined disclaimers on powi and FMF allow this:
https://reviews.llvm.org/rGa7cee55762c6
So that's the last item for this bug on my list.
Awesome - thanks @spatel!