Open Quuxplusone opened 6 years ago
Bugzilla Link | PR35600 |
Status | NEW |
Importance | P enhancement |
Reported by | Alexander Zaitsev (zamazan4ik@tut.by) |
Reported on | 2017-12-10 06:52:19 -0800 |
Last modified on | 2018-07-31 09:27:58 -0700 |
Version | trunk |
Hardware | PC All |
CC | hfinkel@anl.gov, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com |
Fixed by commit(s) | |
Attachments | |
Blocks | PR35611 |
Blocked by | |
See also |
Is calling pow actually faster? I'd think that the optimization here might actually go the other way?
if you will check this code:
#include <cmath>
double test(double a)
{
return sqrt(sqrt(sqrt(sqrt(sqrt(sqrt(sqrt(a)))))));
}
clang(trunk) with '--std=c++17 -O3 -march=native -ffast-math' generates here:
test(double): # @test(double)
vsqrtsd xmm0, xmm0, xmm0
vsqrtsd xmm0, xmm0, xmm0
vsqrtsd xmm0, xmm0, xmm0
vsqrtsd xmm0, xmm0, xmm0
vsqrtsd xmm0, xmm0, xmm0
vsqrtsd xmm0, xmm0, xmm0
vsqrtsd xmm0, xmm0, xmm0
ret
but gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' generates:
test(double):
vandpd xmm0, xmm0, XMMWORD PTR .LC1[rip]
vmovsd xmm1, QWORD PTR .LC0[rip]
jmp __pow_finite
.LC0:
.long 0
.long 1065353216
.LC1:
.long 4294967295
.long 2147483647
.long 0
.long 0
And here i am sure that pow is faster than chain of sqrt functions.
Yes, many sqrts might be slower - but the 2*sqrt case is very likely to be
quicker.
Some GPUs might be able to do this with a dedicated pow instruction, and a soft
pow implementation might avoid a bottleneck on a sqrt unit but trying to
compare costs of hw instructions AND libm implementations is likely to be a
nightmare.
See a lot of optimizations here: https://github.com/gcc-mirror/gcc/blob/07b69d3f1cd3dd8ebb0af1fbff95914daee477d2/gcc/match.pd
There's a proposal to convert pow(x, 0.25) to sqrt(sqrt(x)) here:
https://reviews.llvm.org/D49306