JOML-CI / JOML

A Java math library for OpenGL rendering calculations
MIT License
726 stars 104 forks source link

Added an approximation for sqrt(), and had it apply to invsqrt() as well #299

Closed Leonerd-04 closed 3 years ago

Leonerd-04 commented 3 years ago

The actual algorithm compute times were somewhat inconsistent, but the approximation was generally 5-15% faster than Java's sqrt() function in my testing.

httpdigest commented 3 years ago

When making claims about "optimizations", please always provide the benchmark (hopefully JMH) that validates that claim.

The fact is that Math.sqrt() cannot be improved on when using today's CPUs which all provide dedicated circuitry for computing the square root. And the HotSpot JVM will use an intrinsic that makes use of the respective CPU instruction. Chances are that your benchmark was ill-formed that the JVM optimized your square root function away because it was statically given the same input every time.

Here is a solid JMH-based benchmark which shows other timings:

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.concurrent.ThreadLocalRandom;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.NANOSECONDS;
import static org.openjdk.jmh.annotations.Mode.AverageTime;
@OutputTimeUnit(NANOSECONDS)
@Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
@Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
@BenchmarkMode(AverageTime)
@Fork(value = 1)
public class Bench {
    @State(Scope.Benchmark)
    public static class BenchState {
        public double value;
        @Setup(Level.Trial)
        public void setUp() {
            value = ThreadLocalRandom.current().nextDouble();
        }
    }

    static double sqrtApprox(double square){
        //Initial approximation
        double approx = 3.5;
        //10 iterations of Newton's method to improve the approximation
        for(int i = 0; i < 10; i++) {
            approx = (approx + square / approx) / 2;
        }
        return approx;
    }

    @Benchmark
    public double javaSqrt(BenchState state) {
        return java.lang.Math.sqrt(state.value);
    }
    @Benchmark
    public double approxSqrt(BenchState state) {
        return sqrtApprox(state.value);
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(Bench.class.getName())
                .build()
        ).run();
    }
}

Results:

Benchmark         Mode  Cnt   Score   Error  Units
Bench.approxSqrt  avgt    5  17,355 ± 0,342  ns/op
Bench.javaSqrt    avgt    5   2,404 ± 0,767  ns/op
Leonerd-04 commented 3 years ago

Here's the test I was originally using to test the algorithm:

import static java.lang.Math.*;

public class FastSqrt {

    static double sqrtApprox(double square){
        //Initial approximation
        double approx = 3.5;
        //10 iterations of Newton's method to improve the approximation
        for(int i = 0; i < 10; i++) {
            approx = (approx + square / approx) / 2;
        }
        return approx;
    }

    public static void main(String[] args) {

        long startTime;
        long endTime;
        double numberDouble = 1.0/10000;
        long tests = 1000000000;
        tests *= 3;

        startTime = System.nanoTime();
        for(long i = 0; i < tests; i++){
            double test = sqrt(numberDouble);
        }
        endTime = System.nanoTime();
        long diffOriginal = endTime - startTime;
        System.out.print(sqrt(numberDouble));
        System.out.println("\t Took " + diffOriginal/1000000.0 + " ms");

        startTime = System.nanoTime();
        for(long i = 0; i < tests; i++){
            double test = sqrtApprox(numberDouble);
        }
        endTime = System.nanoTime();
        long diffMine = endTime - startTime;
        System.out.print(sqrtApprox(numberDouble));
        System.out.println("\t Took " + diffMine/1000000.0 + " ms");
        System.out.println("\nMy algorithm is about " + ((double)diffOriginal/diffMine) + " times faster" );
        System.out.println("And off by about " + (int)(100 * (1 - sqrt(numberDouble)/sqrtApprox(numberDouble))) + "%");
    }
}

Indeed, I'd used a static input and didn't know or understand how that would be optimized away by the java compiler. I also had no idea about JMH before this conversation, and didn't know in general how compiler optimizations could throw a wrench in optimization results. So, I used the benchmark you provided to improve the approximation instead:

static double sqrtApprox(double square){
        //okay-ish approximation for sqrt(x) that takes little computation is log base 2(x + 2)
        //aka roughly the exponent of the double x + 2
        double approx = java.lang.Math.getExponent(square + 2);
        //2 iterations of Newton's method to improve the original approximation
        approx = (approx + square / approx) * 0.5;
        approx = (approx + square / approx) * 0.5;
        return approx;
    }

The benchmark results (with 30 iterations rather than 5) now look like this:

Benchmark         Mode  Cnt  Score   Error  Units
Bench.approxSqrt  avgt   30  6.030 ± 0.302  ns/op
Bench.javaSqrt    avgt   30  6.710 ± 0.645  ns/op

I've pushed the new approximation to the fork I used for this PR, as well as the changes to the if() statements that control whether to use it or Java's sqrt() depending on the input, as unfortunately it is now much less accurate.

httpdigest commented 3 years ago

You likely did not enable the Options.FASTMATH flag (which your implementation is now using) in the benchmark, so both benchmark methods are obviously testing the same java.lang.Math.sqrt() method (as can be seen when looking at the +/- error of JMH). When actually benchmarking your implementation, I get the following timings on a JDK 17:

Benchmark         Mode  Cnt  Score   Error  Units
Bench.approxSqrt  avgt    5  4,188 ± 0,043  ns/op
Bench.javaSqrt    avgt    5  2,044 ± 0,048  ns/op

Like I said: It is not possible to speed up sqrt() because there is already hardware support for that! And don't you think that if there was a possible faster approximation of something as simple as sqrt() then people would have found that like 40 years ago already and googling for it would show up that already?

I am going to close this PR now.

Leonerd-04 commented 3 years ago

I benchmarked this completely separately from the actual implementation I made; it would have nothing to do with Options.FastMath.

I understand you closing my PR because my algorithm isn't faster than java's sqrt() function in your testing. However, if you don't want anyone so much as attempting because hardware is faster, please make that clear. What prompted me to even try was this line:

/* Other math functions not yet approximated */