Closed 0x000006 closed 1 year ago
Basically, this replaces x * y + z with fma(x, y, z). It should be about 35-45% faster on an Intel processor. Proof of concept, as a JMH benchmark:
    package org.example;

    import org.openjdk.jmh.annotations.*;

    import java.io.IOException;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Warmup(iterations = 2, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 4, time = 2500, timeUnit = TimeUnit.MILLISECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public class Main {
        public static void main(String[] args) throws IOException {
            org.openjdk.jmh.Main.main(args);
        }

        @State(Scope.Benchmark)
        public static class BenchState {
            public double value;

            @Setup(Level.Trial)
            public void setUp() {
                value = ThreadLocalRandom.current().nextDouble();
            }
        }

        @Benchmark
        public double builtinAtan2(BenchState state) {
            return java.lang.Math.atan2(state.value, state.value);
        }

        @Benchmark
        public double newAtan2(BenchState state) {
            return org.example.Math.fastAtan2new(state.value, state.value);
        }

        @Benchmark
        public double oldAtan2(BenchState state) {
            return org.example.Math.fastAtan2old(state.value, state.value);
        }
    }
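For readers who want to see the shape of the rewrite itself, here is a minimal sketch, assuming a Horner-style polynomial of the kind a fast atan2 approximation typically evaluates. The class name FmaSketch, the method names, and the coefficients are made up for illustration; they are not the code or constants from this change.

```java
public class FmaSketch {
    // Hypothetical coefficients, chosen only for illustration.
    static final double C0 = 1.0;
    static final double C1 = -0.5;
    static final double C2 = 0.25;

    // Before: each step is a separate multiply followed by an add.
    static double hornerPlain(double x) {
        return (C2 * x + C1) * x + C0;
    }

    // After: each x * y + z step becomes one Math.fma call (Java 9+).
    static double hornerFma(double x) {
        return Math.fma(Math.fma(C2, x, C1), x, C0);
    }

    public static void main(String[] args) {
        double x = 0.5;
        System.out.println(hornerPlain(x));
        System.out.println(hornerFma(x));
    }
}
```

Note that fma rounds once instead of twice, so in general the two versions can differ in the last bit of the result.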
Results:
    Benchmark          Mode  Cnt   Score    Error  Units
    Main.builtinAtan2  avgt    4  75,509 ±  3,748  ns/op
    Main.newAtan2      avgt    4   1,768 ±  0,179  ns/op
    Main.oldAtan2      avgt    4   2,415 ±  0,038  ns/op
Additional information:
    # JMH version: 1.36
    # VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224
Thanks!