Closed Leonerd-04 closed 3 years ago
When making claims about "optimizations", please always provide the benchmark (hopefully JMH) that validates that claim.
The fact is that Math.sqrt() cannot be improved upon on today's CPUs, which all provide dedicated circuitry for computing the square root, and the HotSpot JVM will use an intrinsic that maps to the respective CPU instruction.
Chances are that your benchmark was ill-formed and that the JVM optimized your square root function away because it was statically given the same input every time.
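To illustrate that pitfall with a minimal sketch of my own (class and method names are made up, not from your code): when the input is a compile-time constant and the result is never consumed, the JIT is free to eliminate the whole loop, so the timing no longer measures sqrt() at all. Varying the input and consuming the result keeps the work alive:

```java
// Hypothetical illustration of the dead-code-elimination pitfall.
public class DeadCodeDemo {
    // Flawed pattern: constant input, unused result -- the JIT may
    // constant-fold the sqrt and delete the loop entirely.
    static void flawed(long n) {
        for (long i = 0; i < n; i++) {
            double unused = Math.sqrt(0.0001);
        }
    }

    // Safer pattern: varying input, result consumed, so the work must happen.
    static double safer(long n) {
        double sink = 0;
        java.util.Random rnd = new java.util.Random(42);
        for (long i = 0; i < n; i++) {
            sink += Math.sqrt(rnd.nextDouble());
        }
        return sink;
    }

    public static void main(String[] args) {
        flawed(1_000);
        System.out.println(safer(1_000) > 0); // prints true
    }
}
```

In a proper benchmark, JMH's Blackhole serves the same result-consuming purpose as the `sink` accumulator here.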
Here is a solid JMH-based benchmark which shows other timings:
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.ThreadLocalRandom;

import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.NANOSECONDS;
import static org.openjdk.jmh.annotations.Mode.AverageTime;

@OutputTimeUnit(NANOSECONDS)
@Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
@Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
@BenchmarkMode(AverageTime)
@Fork(value = 1)
public class Bench {

    @State(Scope.Benchmark)
    public static class BenchState {
        public double value;

        @Setup(Level.Trial)
        public void setUp() {
            value = ThreadLocalRandom.current().nextDouble();
        }
    }

    static double sqrtApprox(double square) {
        // Initial approximation
        double approx = 3.5;
        // 10 iterations of Newton's method to improve the approximation
        for (int i = 0; i < 10; i++) {
            approx = (approx + square / approx) / 2;
        }
        return approx;
    }

    @Benchmark
    public double javaSqrt(BenchState state) {
        return java.lang.Math.sqrt(state.value);
    }

    @Benchmark
    public double approxSqrt(BenchState state) {
        return sqrtApprox(state.value);
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(Bench.class.getName())
                .build()
        ).run();
    }
}
Results:
Benchmark         Mode  Cnt   Score   Error  Units
Bench.approxSqrt  avgt    5  17,355 ± 0,342  ns/op
Bench.javaSqrt    avgt    5   2,404 ± 0,767  ns/op
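For context on why sqrtApprox above needs so many iterations: Newton's method for f(x) = x² − s converges quadratically once the estimate is near the root, but the fixed seed of 3.5 wastes the first few iterations getting there. A quick sketch of my own (hypothetical class name) printing the per-iteration error for s = 2:

```java
public class NewtonConvergence {
    // One Newton step for f(x) = x^2 - s:  x' = (x + s/x) / 2
    static double step(double approx, double square) {
        return (approx + square / approx) / 2;
    }

    public static void main(String[] args) {
        double square = 2.0;
        double approx = 3.5; // same fixed seed as sqrtApprox above
        for (int i = 1; i <= 6; i++) {
            approx = step(approx, square);
            System.out.printf("iter %d: %.12f (err %.2e)%n",
                    i, approx, Math.abs(approx - Math.sqrt(square)));
        }
    }
}
```

Running it shows the error roughly squaring on each step once the estimate is close, which is why two steps from a good seed can beat ten steps from a fixed 3.5.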
Here's the test I was originally using to test the algorithm:
import static java.lang.Math.*;

public class FastSqrt {
    static double sqrtApprox(double square) {
        // Initial approximation
        double approx = 3.5;
        // 10 iterations of Newton's method to improve the approximation
        for (int i = 0; i < 10; i++) {
            approx = (approx + square / approx) / 2;
        }
        return approx;
    }

    public static void main(String[] args) {
        long startTime;
        long endTime;
        double numberDouble = 1.0 / 10000;
        long tests = 1000000000;
        tests *= 3;

        startTime = System.nanoTime();
        for (long i = 0; i < tests; i++) {
            double test = sqrt(numberDouble);
        }
        endTime = System.nanoTime();
        long diffOriginal = endTime - startTime;
        System.out.print(sqrt(numberDouble));
        System.out.println("\t Took " + diffOriginal / 1000000.0 + " ms");

        startTime = System.nanoTime();
        for (long i = 0; i < tests; i++) {
            double test = sqrtApprox(numberDouble);
        }
        endTime = System.nanoTime();
        long diffMine = endTime - startTime;
        System.out.print(sqrtApprox(numberDouble));
        System.out.println("\t Took " + diffMine / 1000000.0 + " ms");

        System.out.println("\nMy algorithm is about " + ((double) diffOriginal / diffMine) + " times faster");
        System.out.println("And off by about " + (int) (100 * (1 - sqrt(numberDouble) / sqrtApprox(numberDouble))) + "%");
    }
}
Indeed, I'd used a static input and didn't know or understand that the JIT compiler would optimize it away. I also had no idea about JMH before this conversation, and didn't know in general how compiler optimizations could throw a wrench into benchmark results. So, I used the benchmark you provided to improve the approximation instead:
static double sqrtApprox(double square) {
    // An okay-ish approximation for sqrt(x) that takes little computation is log2(x + 2),
    // i.e. roughly the exponent of the double x + 2
    double approx = java.lang.Math.getExponent(square + 2);
    // 2 iterations of Newton's method to improve the original approximation
    approx = (approx + square / approx) * 0.5;
    approx = (approx + square / approx) * 0.5;
    return approx;
}
The benchmark results (with 30 iterations rather than 5) now look like this:
Benchmark         Mode  Cnt  Score   Error  Units
Bench.approxSqrt  avgt   30  6.030 ± 0.302  ns/op
Bench.javaSqrt    avgt   30  6.710 ± 0.645  ns/op
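The accuracy loss is worth quantifying. Since x + 2 lies in (2, 4) for any x in (0, 1), getExponent always seeds the iteration at 1.0, so two Newton steps recover well near x = 1 but cannot for small x. A quick check of my own (hypothetical class name, reusing the sqrtApprox body from above):

```java
public class ApproxAccuracy {
    // Same improved approximation as above: exponent-based seed + 2 Newton steps
    static double sqrtApprox(double square) {
        double approx = Math.getExponent(square + 2);
        approx = (approx + square / approx) * 0.5;
        approx = (approx + square / approx) * 0.5;
        return approx;
    }

    public static void main(String[] args) {
        double worst = 0, worstAt = 0;
        // Scan (0, 1) for the worst relative error against Math.sqrt
        for (double x = 0.001; x < 1.0; x += 0.001) {
            double rel = Math.abs(sqrtApprox(x) - Math.sqrt(x)) / Math.sqrt(x);
            if (rel > worst) { worst = rel; worstAt = x; }
        }
        System.out.printf("worst relative error on (0,1): %.3f at x = %.3f%n", worst, worstAt);
    }
}
```

Near x = 1 the result is within a couple of percent, but the relative error exceeds 100% for small inputs, which is consistent with the accuracy regression noted below.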
I've pushed the new approximation to the fork I used for this PR, along with changes to the if() statements that control whether to use it or Java's sqrt() depending on the input, since the new version is unfortunately much less accurate.
You likely did not enable the Options.FASTMATH flag (which your implementation now checks) in the benchmark, so both benchmark methods were effectively testing the same java.lang.Math.sqrt() method (as can be seen from the overlapping ± errors in the JMH output).
When actually benchmarking your implementation, I get the following timings on JDK 17:
Benchmark         Mode  Cnt  Score   Error  Units
Bench.approxSqrt  avgt    5  4,188 ± 0,043  ns/op
Bench.javaSqrt    avgt    5  2,044 ± 0,048  ns/op
Like I said: it is not possible to speed up sqrt(), because there is already hardware support for it! And don't you think that if a faster approximation of something as simple as sqrt() were possible, people would have found it some 40 years ago, and googling for it would turn it up?
I am going to close this PR now.
I benchmarked this completely separately from the actual implementation I made; it has nothing to do with Options.FastMath.
I understand you closing my PR because my algorithm isn't faster than Java's sqrt() function in your testing. However, if you don't want anyone so much as attempting this because hardware is already faster, please make that clear. What prompted me to even try was this line:
/* Other math functions not yet approximated */
The actual compute times were somewhat inconsistent, but the approximation was generally 5-15% faster than Java's sqrt() function in my testing.