Optimize fastAtan2

Replace each x * y + z in fastAtan2 with fma(x, y, z). This should be about 35-45% faster on an Intel processor. Proof of concept, as a JMH benchmark:

package org.example;

import org.openjdk.jmh.annotations.*;

import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 2500, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public class Main {
    public static void main(String[] args) throws IOException {
        org.openjdk.jmh.Main.main(args);
    }

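    @State(Scope.Benchmark)
    public static class BenchState {
        // Input for the benchmarked calls: one random double in [0, 1).
        public double value;

        // Runs once per trial, so the value stays fixed while measuring.
        @Setup(Level.Trial)
        public void setUp() {
            value = ThreadLocalRandom.current().nextDouble();
        }
    }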

    @Benchmark
    public double builtinAtan2(BenchState state) {
        return java.lang.Math.atan2(state.value, state.value);
    }

    @Benchmark
    public double newAtan2(BenchState state) {
        return org.example.Math.fastAtan2new(state.value, state.value);
    }

    @Benchmark
    public double oldAtan2(BenchState state) {
        return org.example.Math.fastAtan2old(state.value, state.value);
    }
}

Results:

Benchmark          Mode  Cnt   Score   Error  Units
Main.builtinAtan2  avgt    4  75.509 ± 3.748  ns/op
Main.newAtan2      avgt    4   1.768 ± 0.179  ns/op
Main.oldAtan2      avgt    4   2.415 ± 0.038  ns/op

Lower is better. In this run the ratio old/new is 2.415 / 1.768 ≈ 1.37, so the fma variant is roughly 37% faster than the old one (about 27% less time per call).

Additional information:

# JMH version: 1.36
# VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224
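
The benchmark calls org.example.Math.fastAtan2old and fastAtan2new, which are not shown above. As a rough illustration of the proposed change, here is a minimal sketch of what the two variants might look like. The class name FastAtan2Sketch, the method names, the helper fixQuadrant, and the polynomial coefficients are all assumptions made for this example and are not taken from the issue. The only point is that each multiply-then-add step of the polynomial becomes one Math.fma call (available since Java 9).

```java
// Hypothetical stand-in for the org.example.Math class used by the benchmark.
// The polynomial and its coefficients are assumptions, not taken from the issue.
class FastAtan2Sketch {

    // "Old" style: every Horner step is a separate multiply followed by an add.
    static double fastAtan2Old(double y, double x) {
        double ax = Math.abs(x), ay = Math.abs(y);
        double a = Math.min(ax, ay) / Math.max(ax, ay); // in [0, 1]; NaN if x == y == 0
        double s = a * a;
        double r = ((-0.0464964749 * s + 0.15931422) * s - 0.327622764) * s * a + a;
        return fixQuadrant(r, y, x, ay, ax);
    }

    // "New" style: each x * y + z is replaced by Math.fma(x, y, z).
    static double fastAtan2Fma(double y, double x) {
        double ax = Math.abs(x), ay = Math.abs(y);
        double a = Math.min(ax, ay) / Math.max(ax, ay);
        double s = a * a;
        double p = Math.fma(Math.fma(-0.0464964749, s, 0.15931422), s, -0.327622764);
        double r = Math.fma(p, s * a, a);
        return fixQuadrant(r, y, x, ay, ax);
    }

    // Maps the first-octant result r back to the full (-pi, pi] range.
    private static double fixQuadrant(double r, double y, double x, double ay, double ax) {
        if (ay > ax) r = 1.57079637 - r; // pi/2 - r
        if (x < 0.0) r = 3.14159274 - r; // pi - r
        return y >= 0.0 ? r : -r;
    }

    public static void main(String[] args) {
        double[][] inputs = { {1.0, 1.0}, {1.0, -2.0}, {-3.0, 0.5} };
        for (double[] in : inputs) {
            System.out.println("atan2=" + Math.atan2(in[0], in[1])
                    + " old=" + fastAtan2Old(in[0], in[1])
                    + " fma=" + fastAtan2Fma(in[0], in[1]));
        }
    }
}
```

Because fma rounds once per step instead of twice, the two variants can differ in the last bits of the result. Whether Math.fma is actually faster depends on the JIT compiling it to a hardware FMA instruction; on CPUs without one it may fall back to a much slower software path.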

JOML-CI / JOML

Optimize fastAtan2 #333