Quuxplusone / LLVMBugzillaTest


Constant folding for trivial loop does not occur for float and double #30903

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link: PR31930
Status: NEW
Importance: P normal
Reported by: drraph@gmail.com
Reported on: 2017-02-10 10:56:20 -0800
Last modified on: 2017-02-13 01:55:44 -0800
Version: trunk
Hardware: PC Linux
CC: drraph@gmail.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org

Consider:

float f(float x[]) {
  float p = 1.0;
  for (int i = 0; i < 960; i++)
    p += 1;
  return p;
}

When compiled with -march=core-avx2 -O3 -ffast-math, the generated assembly keeps
the loop and adds 1 repeatedly until the sum reaches 961.

However:

int f(int x[]) {
  int p = 1;
  for (int i = 0; i < 960; i++)
    p += 1;
  return p;
}

gives:

f:                                      # @f
        mov     eax, 961
        ret
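As a side check (not part of the original report), folding the float loop to the same constant would be value-preserving even under strict IEEE-754 semantics: every partial sum is a small integer, and float represents all integers up to 2^24 exactly, so each `p += 1` is exact and the loop yields exactly 961.0f. A minimal sketch, with illustrative function names and the unused parameter dropped:

```c
/* The float loop from the report, without the unused parameter.
   Built without -ffast-math, this follows strict IEEE-754 rounding. */
float float_loop(void) {
    float p = 1.0f;
    for (int i = 0; i < 960; i++)
        p += 1;  /* every partial sum is an integer <= 961, so each add is exact */
    return p;
}

/* The closed form that the int version is folded to (1 + 960). */
float float_closed_form(void) {
    return 1.0f + 960.0f;
}
```

Calling both functions and comparing the results is enough to confirm that a compiler emitting the constant 961.0f here would not change the program's output.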

I don't know how hard it would be to add the same optimization for float and
double.
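One reason a general float version of this fold needs more care than the int one (an illustration, not something stated in the report): rewriting n repetitions of `p += 1` as `p + n` is only exact while every partial sum is representable. For float that holds up to 2^24; starting at 2^24, `p + 1` lies exactly halfway between 2^24 and 2^24 + 2 and rounds to even, back to 2^24. So a compiler would have to prove the bound (possible here, since the start value and trip count are constants) or be permitted to reassociate, as -ffast-math allows. The helper names below are hypothetical, and the sketch assumes float arithmetic is evaluated in single precision (FLT_EVAL_METHOD == 0, as on x86-64 with SSE):

```c
/* Hypothetical helper: the report's loop, generalized to any start value and trip count. */
float float_loop_from(float start, int n) {
    float p = start;
    for (int i = 0; i < n; i++)
        p += 1;  /* rounded to float after every step */
    return p;
}

/* Hypothetical helper: the closed form a naive fold would produce. */
float float_closed_form_from(float start, int n) {
    return start + (float)n;
}
```

With `start = 1.0f, n = 960` both helpers return 961.0f, but with `start = 16777216.0f` (2^24) and `n = 2` the loop stays at 16777216.0f while the closed form gives 16777218.0f.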

As a side note, the first (float) loop has some interesting details. First, if
we reduce the limit from i < 960 to i < 959, the loop is optimized out. Second,
if we change the type to double, this limit drops to i < 479. My guess is that
this corresponds to a loop-unrolling cost model built into the compiler.
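To reproduce the double-precision observation, a file like the following could be compiled with the same flags (-march=core-avx2 -O3 -ffast-math, plus -S to emit assembly) and the two functions compared. The function names are hypothetical, and the 479/480 boundary is the reporter's observation on 2017 trunk, not independently verified; it may differ on other LLVM versions. At run time both functions behave identically apart from the trip count; only the generated code is expected to differ.

```c
/* Limit at which, per the report, the double loop is still folded to a constant. */
double double_loop_479(void) {
    double p = 1.0;
    for (int i = 0; i < 479; i++)
        p += 1;
    return p;  /* 1 + 479 = 480 */
}

/* One more iteration: presumably past the folding limit reported above. */
double double_loop_480(void) {
    double p = 1.0;
    for (int i = 0; i < 480; i++)
        p += 1;
    return p;  /* 1 + 480 = 481 */
}
```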
Quuxplusone commented 7 years ago
If we take:

float f(int x[]) {
  float p = 1;
  #pragma unroll
  for (int i = 0; i < 960; i++)
    p += 1;
  return p;
}

and compile simply with -O and no other flags, we get:

.LCPI0_0:
        .long   1148207104              # float 961
f:                                      # @f
        movss   xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
        ret
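The `.long 1148207104` in this listing is the raw bit pattern (0x44704000) of the folded constant. A small sketch to confirm it decodes to 961.0f, assuming `float` is IEEE-754 binary32 (as on x86-64); the helper name is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float.
   memcpy avoids the strict-aliasing problems of a pointer cast. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Here `bits_to_float(1148207104u)` yields 961.0f, matching the `# float 961` annotation that the compiler emitted next to the constant.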