Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

"[CodeGen] convert math libcalls/builtins to equivalent LLVM intrinsics" change overriding "__asm__ specifier #34645

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR35672
Status NEW
Importance P normal
Reported by Chris Chrulski (chris.chrulski@yahoo.com)
Reported on 2017-12-15 07:53:49 -0800
Last modified on 2018-08-13 13:05:35 -0700
Version trunk
Hardware PC Linux
CC denis.bakhvalov@intel.com, efriedma@quicinc.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments convert_math_calls_email.txt (3865 bytes, text/plain)
Blocks
Blocked by
See also PR36879, PR37082, PR38547
Created attachment 19558
email conversion with change author

There was a behavior change in how the math headers are processed following the
checkin of "[CodeGen] convert math libcalls/builtins to equivalent LLVM
intrinsics" on Dec 1, 2017 which looks like it may not have been intended. It
appears the conversion to intrinsics is taking precedence over the processing
of __asm__ specifiers.

#include <stdio.h>
#include<math.h>
// With -ffast-math, header defines the following line:
//extern double exp (double) __asm__ ("" "__exp_finite") __attribute__
((__nothrow__ ));

int main()
{
    double i = 0.0;
    double r;
    for (i = 0.0; i < 1.0; i += 0.1) {
        r = exp(i);
        printf("%g: exp = %g\n", i, r);
    }
    return 0;
}

With this code, when building with -ffast-math on Linux, the following used to
create this for the 'exp' call:
%9 = call fast double @__exp_finite(double %8) #3

After the commit, it is generating:
%2 = call fast double @llvm.exp.f64(double %1)

More detailed discussion is in the attachment.
Quuxplusone commented 6 years ago

Attached convert_math_calls_email.txt (3865 bytes, text/plain): email conversion with change author

Quuxplusone commented 6 years ago
Hrmmm. Yes. Also, we should be generating calls to __exp_finite with a *-gnu
triple for call fast double @llvm.exp.f64. Both are true (i.e., Clang should
check the asm name, but it should still generate the intrinsic in this case
because LLVM should do the right thing in this case too).
Quuxplusone commented 6 years ago
For reference, the commit was:
https://reviews.llvm.org/rL319593

And that was part of a sequence of commits listed in:
https://reviews.llvm.org/D28335
...which describes the larger problem that I was trying to fix.

I thought we had some LibCallSimplifier support for this, so let me look at
that first.
Quuxplusone commented 6 years ago

We do manage to constant fold the whole thing at -O2...so at least that works...I hope. :)

Quuxplusone commented 6 years ago
I've been looking at how the math intrinsics are handled in the backend, and
it's not clear to me where we should do the translation to the 'finite'
variants. This may also relate to the proposed fix for saving/restoring errno
from D28335.

Potential options:
1. In IR as part of a pre-DAG IR codegen pass. A candidate for
extension/renaming is "partially-inline-libcalls".

2. At DAG creation time in SelectionDAGBuilder::visitIntrinsicCall(). See for
example the transforms that occur for a limited precision target such as in
expandExp().

3. At DAG legalization time. See SelectionDAGLegalize::ConvertNodeToLibcall().
Note that we've lost fast-math-flags on the DAG nodes at this point, so we
should try to make progress on https://reviews.llvm.org/D37686 (not sure if
anything is actually holding that up at this point).
Quuxplusone commented 6 years ago
(In reply to Sanjay Patel from comment #4)
> I've been looking at how the math intrinsics are handled in the backend, and
> it's not clear to me where we should do the translation to the 'finite'
> variants. This may also relate to the proposed fix for saving/restoring
> errno from D28335.
>
> Potential options:
> 1. In IR as part of a pre-DAG IR codegen pass. A candidate for
> extension/renaming is "partially-inline-libcalls".
>
> 2. At DAG creation time in SelectionDAGBuilder::visitIntrinsicCall(). See
> for example the transforms that occur for a limited precision target such as
> in expandExp().
>
> 3. At DAG legalization time. See
> SelectionDAGLegalize::ConvertNodeToLibcall(). Note that we've lost
> fast-math-flags on the DAG nodes at this point, so we should try to make
> progress on https://reviews.llvm.org/D37686 (not sure if anything is
> actually holding that up at this point).

I think (3) is right. I agree, we should make progress on the SDAG patch.
Quuxplusone commented 6 years ago
(In reply to Hal Finkel from comment #5)
> > 3. At DAG legalization time. See
> > SelectionDAGLegalize::ConvertNodeToLibcall(). Note that we've lost
> > fast-math-flags on the DAG nodes at this point, so we should try to make
> > progress on https://reviews.llvm.org/D37686 (not sure if anything is
> > actually holding that up at this point).
>
> I think (3) is right. I agree, we should make progress on the SDAG patch.

Here's a start on that fix by using the function attributes for now:
https://reviews.llvm.org/D41338

Note: I won't be able to iterate on that patch for about a week at least. If we
need this fixed immediately, please commandeer.
Quuxplusone commented 6 years ago
Chris - can you verify that you get the expected end result in your real code
after:
https://reviews.llvm.org/rL322087 ?

There's still a clang bug here from what I can see:

#include <stdio.h>
// This is not the exp() you were looking for...
extern double exp (double) __asm__ ("" "explode") __attribute__ ((__nothrow__
));

int main()
{
    double i = 0.0;
    double r;
    for (i = 0.0; i < 1.0; i += 0.1) {
        r = exp(i);
        printf("%g: exp = %g\n", i, r);
    }
    return 0;
}
Quuxplusone commented 6 years ago
For the record, Chris replied via email that we are getting the expected
*_finite output now. But let's leave this open to track the clang bug that's
still out there.
Quuxplusone commented 6 years ago
I found that it only works for non-LTO builds. For LTO build issue still exist.

$ cat a.c
#include <math.h>
int main()
{
  return exp(1.0);
}

1) Non-LTO build (notice call to '__exp_finite'):
$ clang -O0 -ffast-math a.c –lm
$ objdump -d a.out | grep "call.*exp"
  400567:   e8 c4 fe ff ff             callq  400430 <__exp_finite@plt>

2) LTO build (notice call to 'exp'):
$ clang -O0 -flto -ffast-math a.c -lm
$ objdump -d a.out | grep "call.*exp"
  400547:   e8 b4 fe ff ff          callq  400400 <exp@plt>

We can see that if we add ‘-flto’ we stop generating the call to finite version
of ‘exp’ function, although we could, because ‘-ffast-math’ is provided.

I debugged this issue and found the root cause. I can share the details if
someone is interested, however I don’t know how it can be fixed immediately.
Quuxplusone commented 6 years ago
(In reply to Denis Bakhvalov from comment #9)
> I found that it only works for non-LTO builds. For LTO build issue still
> exist.
...
> I debugged this issue and found the root cause. I can share the details if
> someone is interested, however I don’t know how it can be fixed immediately.

I haven't looked at LTO, but please do post any analysis that you've done.
Preferably, open a new bug report for that, so we can track it separately from
the remaining known issue here.
Quuxplusone commented 6 years ago
(In reply to Sanjay Patel from comment #10)
>
> I haven't looked at LTO, but please do post any analysis that you've done.
> Preferably, open a new bug report for that, so we can track it separately
> from the remaining known issue here.

I opened new issue: https://bugs.llvm.org/show_bug.cgi?id=36879
I will post my observations in this tracker.