Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Misc incorrect folds with fabs #30675

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR31702
Status NEW
Importance P normal
Reported by Eli Friedman (efriedma@quicinc.com)
Reported on 2017-01-19 19:46:09 -0800
Last modified on 2017-01-20 12:01:22 -0800
Version trunk
Hardware PC Windows NT
CC justin.lebar@gmail.com, jyknight@google.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, Matthew.Arsenault@amd.com, scanon@apple.com, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Testcase (opt -instcombine):

define double @f1(double %x) local_unnamed_addr #0 {
entry:
  %call = tail call double @llvm.fabs.f64(double %x)
  %add = fadd double %call, 0x7FF8000000000000
  %abs = tail call double @llvm.fabs.f64(double %add)
  ret double %abs
}

define double @f2(double %x) local_unnamed_addr #1 {
entry:
  %call1 = tail call double @llvm.maxnum.f64(double 0x7FF8000000000000, double %x)
  %call2 = tail call double @llvm.fabs.f64(double %call1)
  ret double %call1
}

define double @f3(double %x, i32 %n) local_unnamed_addr #1 {
entry:
  %call1 = tail call double @llvm.powi.f64(double -0.0, i32 %n)
  %call2 = tail call double @llvm.fabs.f64(double %call1)
  ret double %call1
}

declare double @llvm.maxnum.f64(double, double)
declare double @llvm.fabs.f64(double)
declare double @llvm.powi.f64(double, i32)

instcombine eliminates fabs calls from all of these, so the sign bit could be
wrong.  I think a few others are wrong too.

See https://reviews.llvm.org/D28927.
Quuxplusone commented 7 years ago

Thanks for filing this bug -- I understand this better now, I think.

IEEE 754 section 6.3 says:

When either an input or result [of an operation] is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings—copy, negate, abs, copySign—specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation.

So because the sign bit of the NaN returned by most ops is unspecified, but because abs operates on the bit-level representation, essentially any time a NaN is produced, we can't make any assumptions about the sign bit, and we must keep the fabs around.

Is that right?

Quuxplusone commented 7 years ago
Wow, thank you, Bugzilla.  Properly line-wrapped citation:

> When either an input or result [of an operation] is NaN, this standard does
> not interpret the sign of a NaN. Note, however, that operations on bit
> strings—copy, negate, abs, copySign—specify the sign bit of a NaN result,
> sometimes based upon the sign bit of a NaN operand. The logical predicate
> totalOrder is also affected by the sign bit of a NaN operand. For all other
> operations, this standard does not specify the sign bit of a NaN result, even
> when there is only one input NaN, or when the NaN is produced from an invalid
> operation.
Quuxplusone commented 7 years ago
(In reply to comment #0)
> Testcase (opt -instcombine):
>

> define double @f2(double %x) local_unnamed_addr #1 {
> entry:
>   %call1 = tail call double @llvm.maxnum.f64(double 0x7FF8000000000000,
> double %x)
>   %call2 = tail call double @llvm.fabs.f64(double %call1)
>   ret double %call1
> }

For #2 I would expect the maxnum to be eliminated to just x
Quuxplusone commented 7 years ago
(In reply to comment #3)
> For #2 I would expect the maxnum to be eliminated to just x

Yeah, but instead the whole function gets eliminated to "ret %x", which
is...wow.
Quuxplusone commented 7 years ago

Typo: in f2, should be "ret %call2".

Quuxplusone commented 7 years ago
> instcombine eliminates fabs calls from all of these

With the following corrected testcase, I only see us eliminating fabs from @f1.

define double @f1(double %x) local_unnamed_addr #0 {
entry:
  %call = tail call double @llvm.fabs.f64(double %x)
  %add = fadd double %call, 0x7FF8000000000000
  %abs = tail call double @llvm.fabs.f64(double %add)
  ret double %abs
}

define double @f2(double %x) local_unnamed_addr #1 {
entry:
  %call1 = tail call double @llvm.maxnum.f64(double 0x7FF8000000000000, double %x)
  %call2 = tail call double @llvm.fabs.f64(double %call1)
  ret double %call2
}

define double @f3(double %x, i32 %n) local_unnamed_addr #1 {
entry:
  %call1 = tail call double @llvm.powi.f64(double -0.0, i32 %n)
  %call2 = tail call double @llvm.fabs.f64(double %call1)
  ret double %call2
}

declare double @llvm.maxnum.f64(double, double)
declare double @llvm.fabs.f64(double)
declare double @llvm.powi.f64(double, i32)
Quuxplusone commented 7 years ago

In #1, I don't see why it's incorrect to eliminate the first fabs. We know that the result of the fadd is the NaN with some sign bit. So the value of %call doesn't actually matter.

Since the #2 and #3 also operate correctly once we fix the typos, I am not sure that this testcase illustrates any bugs.

Quuxplusone commented 7 years ago
To summarize an argument I made to my coworkers about why we have to
conservatively leave fabs in a lot more places than we currently do:

Suppose you have llvm.fabs(x*x), and suppose you lower the x*x to "mul x, x",
where "mul" is a fp multiply instruction on some IEEE 754-compliant processor.
When x is NaN, our compliant processor could choose to return a NaN with either
a positive or negative sign bit, irrespective of the sign bit of x.  But the
result of fabs is required to have the sign bit set to 0, and it is legal for
user code to observe this property, even of a NaN.

Therefore unless we know that x is not NaN, or unless we know something about
the processor we are targeting beyond simply that it's IEEE 754-compliant, we
cannot omit the fabs in fabs(x*x).

Some transformations we make that I think may do the wrong thing on a
hypothetical compliant-but-evil processor+libm:

* exp/exp2/exp10(x) are assumed to have a sign bit of 0.  But if they return
NaN, the spec says this NaN may have any sign bit.

* FAdd, FDiv, FMul, and FRem currently assume that if both inputs are positive
NaNs, the output is a positive NaN.  This is incorrect; if the operation
returns NaN, the resulting NaN can have any sign bit according to the spec.

* minnum/maxnum currently assumes that if both/either input is positive NaN,
the output is a positive NaN.  I'm not sure if this is correct; the spec says
the result is a "canonicalized" value, but as far as I can tell the sign bit on
a canonical NaN can be either positive or negative.

* FMA and FMULADD assume that in x*x+y, if x is not NaN and y is positive NaN,
the result has a 0 sign bit.  I didn't read the FMA portion of the spec
carefully, but I suspect this is wrong for the same reasons as above.

* pow(x, y) assumes that, if y is an integer, the result has a positive sign
bit if x has a positive sign bit.  Again, we have the same problem of pow(NaN)
possibly changing the sign bit of the NaN.

In addition, there's one transformation that I think *will* do the wrong thing
on any compliant processor+libm:

* pow(x, y) assumes that, if y is an integer, the result is either NaN or >= -0
if x is either NaN or >= -0.  This is incorrect, because pow(-0, y) = -inf if y
is negative odd.

My guess (which may be wrong; I have only looked at the x86 ISA) is that most
processors+libms are not evil, and will not switch the sign of NaNs on us.  If
this is true, most of the above transformations are correct for most
processors.  But unless they're correct for all processors we would ever want
to target, it seems to me we should have a (TTI?) hook to check whether or not
the target is evil.

I suspect we won't want to "fix" these issues until we have such a TTI check,
since doing so would be a performance regression on non-evil platforms.
Quuxplusone commented 7 years ago
(In reply to comment #8)

> * minnum/maxnum currently assumes that if both/either input is positive NaN,
> the output is a positive NaN.  I'm not sure if this is correct; the spec
> says the result is a "canonicalized" value, but as far as I can tell the
> sign bit on a canonical NaN can be either positive or negative.
>

minnum/maxnum says "Otherwise it is either x or y, canonicalized (this means
results might differ among implementations"
Quuxplusone commented 7 years ago
> (In reply to comment #9)

Right, and in section 3.5.2, it says

> A NaN is in its preferred (canonical) representation if the bits G6
> through Gw+4 are zero and the encoding of the payload is canonical.

As I read it (and I'm new to all this), the sign bit is not in the range G6-
Gw+4, nor is it a payload bit.  Therefore ISTM that a canonical NaN can have
either sign.
Quuxplusone commented 7 years ago
Oops, I really messed up testcases 2 and 3.

Fixed version of testcase 2 (I don't think anyone's mentioned this issue yet):

define double @f2(double %x, i1 %b) local_unnamed_addr #1 {
entry:
  %imm = select i1 %b, double 0x7FF8000000000000, double 1.0
  %call1 = tail call double @llvm.maxnum.f64(double %imm, double %x)
  %call2 = tail call double @llvm.fabs.f64(double %call1)
  ret double %call2
}

Fixed version of f3 (the powi problem you pointed out):

define i1 @f3(double %x, i32 %n) local_unnamed_addr #1 {
entry:
  %call1 = tail call double @llvm.powi.f64(double -0.0, i32 %n)
  %c = fcmp uge double %call1, 0.0
  ret i1 %c
}

Both of these transforms are pretty clearly wrong regardless of how the target
handles the sign bit of nans.

It's also worth noting that we can be a lot more aggressive with eliminating
fabs operations if we can prove all the users are floating-point operations:
they don't preserve the sign bit on the NaN anyway.