Missed optimization in math expression: sqrt(sqrt(a)) == pow(a, 1/4)

Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Missed optimization in math expression: sqrt(sqrt(a)) == pow(a, 1/4) #34573

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago


Bugzilla Link	PR35600
Status	NEW
Importance	P enhancement
Reported by	Alexander Zaitsev (zamazan4ik@tut.by)
Reported on	2017-12-10 06:52:19 -0800
Last modified on	2018-07-31 09:27:58 -0700
Version	trunk
Hardware	PC All
CC	hfinkel@anl.gov, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks	PR35611
Blocked by
See also

clang(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:

#include <cmath>

double test(double a)
{
    return sqrt(sqrt(a));
}

generates this assembly:

test(double): # @test(double)
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  ret

But there is formula: sqrt(sqrt(a)) == pow(a, 1/4). And it can be compiled in
faster way.

Quuxplusone commented 6 years ago

Is calling pow actually faster? I'd think that the optimization here might actually go the other way?

Quuxplusone commented 6 years ago

if you will check this code:

#include <cmath>

double test(double a)
{
    return sqrt(sqrt(sqrt(sqrt(sqrt(sqrt(sqrt(a)))))));
}

clang(trunk) with '--std=c++17 -O3 -march=native -ffast-math' generates here:

test(double): # @test(double)
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  ret

but gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' generates:

test(double):
        vandpd  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        vmovsd  xmm1, QWORD PTR .LC0[rip]
        jmp     __pow_finite
.LC0:
        .long   0
        .long   1065353216
.LC1:
        .long   4294967295
        .long   2147483647
        .long   0
        .long   0

And here i am sure that pow is faster than chain of sqrt functions.

Quuxplusone commented 6 years ago

Yes, many sqrts might be slower - but the 2*sqrt case is very likely to be
quicker.

Some GPUs might be able to do this with a dedicated pow instruction, and a soft
pow implementation might avoid a bottleneck on a sqrt unit but trying to
compare costs of hw instructions AND libm implementations is likely to be a
nightmare.

Quuxplusone commented 6 years ago

See a lot of optimizations here: https://github.com/gcc-mirror/gcc/blob/07b69d3f1cd3dd8ebb0af1fbff95914daee477d2/gcc/match.pd

Quuxplusone commented 6 years ago

There's a proposal to convert pow(x, 0.25) to sqrt(sqrt(x)) here:
https://reviews.llvm.org/D49306