Quuxplusone / LLVMBugzillaTest


IR for isfinite() could be improved #27144

Open Quuxplusone opened 8 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR27145
Status NEW
Importance P normal
Reported by Sanjay Patel (spatel+llvm@rotateright.com)
Reported on 2016-03-30 17:10:43 -0700
Last modified on 2017-11-19 07:36:01 -0800
Version trunk
Hardware PC All
CC hfinkel@anl.gov, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR27164
#include <math.h>
#include <stdbool.h>

bool foo(float x) { return isfinite(x); }

$ ./clang -O1 isnan.c -S -o -  -emit-llvm
...
define zeroext i1 @foo(float %x) #0 {
entry:
  %cmp.i = fcmp ord float %x, 0.000000e+00
  %0 = tail call float @llvm.fabs.f32(float %x) #2
  %cmp1.i = fcmp une float %0, 0x7FF0000000000000
  %1 = and i1 %cmp.i, %cmp1.i
  ret i1 %1
}

Can't we use 'fcmp one' here to eliminate the first compare?
"one: ordered and not equal"
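The key observation is that `fcmp one` is false whenever either operand is NaN, and `fabsf(x)` is NaN exactly when `x` is, so the separate `ord` check adds nothing. A minimal C sketch of that reasoning (the `fcmp_*` helper names are hypothetical, written only to mirror the IR predicates):

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helpers mirroring the IR predicates:
 *   fcmp ord x, 0.0  ->  x is not NaN
 *   fcmp une a, b    ->  unordered (either is NaN) or not equal
 *   fcmp one a, b    ->  ordered (neither is NaN) and not equal
 */
static bool fcmp_ord_zero(float x)        { return !isnan(x); }
static bool fcmp_une(float a, float b)    { return isnan(a) || isnan(b) || a != b; }
static bool fcmp_one(float a, float b)    { return !isnan(a) && !isnan(b) && a != b; }

/* The two-compare form currently emitted for isfinite()... */
static bool isfinite_two_cmps(float x) {
  return fcmp_ord_zero(x) && fcmp_une(fabsf(x), INFINITY);
}

/* ...and the proposed single-compare form. Since fabsf(x) is NaN
 * iff x is NaN, 'one' subsumes the 'ord' test. */
static bool isfinite_one_cmp(float x) {
  return fcmp_one(fabsf(x), INFINITY);
}
```

Both forms agree on finite values, infinities, and NaN, matching `isfinite()` itself.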
Quuxplusone commented 8 years ago

What system is this on? How does this compare to what Clang generates from __builtin_isfinite?

Quuxplusone commented 8 years ago
(In reply to comment #1)
> What system is this on?

Sorry, forgot this can vary with OS:
target triple = "x86_64-apple-macosx10.11.0"

> How does this compare to what Clang generates from __builtin_isfinite?

From what I can tell it's identical, except that the builtin version has better instruction names. :)

bool try_builtin_isfinite(float x) { return __builtin_isfinite(x); }

define zeroext i1 @try_builtin_isfinite(float %x) #0 {
entry:
  %iseq = fcmp ord float %x, 0.000000e+00
  %0 = tail call float @llvm.fabs.f32(float %x) #3
  %isinf = fcmp une float %0, 0x7FF0000000000000
  %and = and i1 %iseq, %isinf
  ret i1 %and
}
Quuxplusone commented 8 years ago
I think I can answer my own question with a 'yes' by looking at the IR produced
for isinf():

define zeroext i1 @compiler_isinf(float %x) #0 {
  %1 = tail call float @llvm.fabs.f32(float %x) #1
  %2 = fcmp oeq float %1, 0x7FF0000000000000
  ret i1 %2
}

define zeroext i1 @try_builtin_isinf(float %x) #0 {
  %1 = tail call float @llvm.fabs.f32(float %x) #2
  %2 = fcmp oeq float %1, 0x7FF0000000000000
  ret i1 %2
}
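In C terms, that isinf() lowering is just one equality test against +Inf after stripping the sign; a sketch of the same shape (the function name is hypothetical):

```c
#include <math.h>
#include <stdbool.h>

/* Mirrors the IR above: fabsf() folds away the sign, and C's ==
 * has 'fcmp oeq' semantics (false for NaN), so no separate
 * ordered check is needed. */
static bool isinf_one_cmp(float x) {
  return fabsf(x) == INFINITY;
}
```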
Quuxplusone commented 8 years ago

The fix for the builtin seems easy enough. I'll post a patch for review.

Quuxplusone commented 8 years ago
For completeness, here's an overkill yet lame checker program I used on OSX x86
to show the equivalence. Note that the x86 asm is still not optimal. Backend
bug for that coming up...

isfinite_tester.c:

// compile with:
// ./clang isfinite_tester.c isfinite.s

#include <stdio.h>
#include <string.h>

extern unsigned isfinite_slow(float);
extern unsigned isfinite_fast(float);

int main() {
  unsigned i;  // unsigned: incrementing an int past INT_MAX would be UB
  for (i = 0; i < 0xffffffff; i++) {  // misses the last value...close enough
    float f;
    memcpy(&f, &i, 4);
    unsigned slow = isfinite_slow(f);
    unsigned fast = isfinite_fast(f);
    if (slow != fast)
      printf("ERROR: %x %u %u\n", i, slow, fast);
  }
  return 0;
}

isfinite.s:

  .section  __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 11
  .section  __TEXT,__literal16,16byte_literals
  .p2align  4
LCPI0_0:
  .long 2147483647              ## 0x7fffffff
  .long 2147483647              ## 0x7fffffff
  .long 2147483647              ## 0x7fffffff
  .long 2147483647              ## 0x7fffffff
  .section  __TEXT,__literal4,4byte_literals
  .p2align  2
LCPI0_1:
  .long 2139095040              ## float +Inf
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _isfinite_slow
  .p2align  4, 0x90
_isfinite_slow:
  .cfi_startproc
## BB#0:
  ucomiss %xmm0, %xmm0
  setnp %cl
  andps LCPI0_0(%rip), %xmm0
  ucomiss LCPI0_1(%rip), %xmm0
  sbbb  %al, %al
  andb  %cl, %al
  retq
  .cfi_endproc

  .section  __TEXT,__literal16,16byte_literals
  .p2align  4
LCPI1_0:
  .long 2147483647              ## 0x7fffffff
  .long 2147483647              ## 0x7fffffff
  .long 2147483647              ## 0x7fffffff
  .long 2147483647              ## 0x7fffffff
  .section  __TEXT,__literal4,4byte_literals
  .p2align  2
LCPI1_1:
  .long 2139095040              ## float +Inf
  .section  __TEXT,__text,regular,pure_instructions
  .globl _isfinite_fast
  .p2align  4, 0x90
_isfinite_fast:
  .cfi_startproc
## BB#0:
  andps LCPI1_0(%rip), %xmm0
  ucomiss LCPI1_1(%rip), %xmm0
  setne %al
  retq
  .cfi_endproc

.subsections_via_symbols
Quuxplusone commented 8 years ago

http://reviews.llvm.org/D18648

Quuxplusone commented 8 years ago
The builtin IR should be fixed with:
http://reviews.llvm.org/rL265675

I think InstCombine has to simplify the OS macro implementation to that same IR.
Quuxplusone commented 8 years ago
(In reply to comment #7)
> I think InstCombine has to simplify the OS macro implementation to that same
> IR.

Or stub out the problem at the source:
http://reviews.llvm.org/D18639
Quuxplusone commented 7 years ago
Regardless of what we do in the front-end, I think we need to solve a more
general problem in IR that should also fold this particular case (if it is not
solved in the front-end). So step 1:
https://reviews.llvm.org/D37427

Assuming that's reasonable, I think the follow-up to solve this bug will be
some small enhancements to foldLogicOfFCmps().
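The exact shape of those foldLogicOfFCmps() enhancements isn't spelled out here, but the identity the fold relies on is that fabs() preserves NaN-ness, so the 'ord' test on x can be absorbed into the predicate on fabs(x) for any non-NaN constant C. A C sketch of that identity (function names are hypothetical):

```c
#include <math.h>
#include <stdbool.h>

/* Identity behind the fold, for non-NaN constant c:
 *   (fcmp ord x, 0) & (fcmp une fabs(x), c)  ==  fcmp one fabs(x), c
 * It holds because fabsf(x) is NaN exactly when x is NaN. */
static bool unfolded(float x, float c) {
  float a = fabsf(x);
  return !isnan(x) && (isnan(a) || a != c);  /* ord & une */
}

static bool folded(float x, float c) {
  float a = fabsf(x);
  return !isnan(a) && a != c;                /* one */
}
```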
Quuxplusone commented 6 years ago
Another step towards optimizing this pattern generally in IR:
https://reviews.llvm.org/D40130
https://reviews.llvm.org/rL318627