colgreen / sharpneat

SharpNEAT - Evolution of Neural Networks. A C# .NET Framework.
https://sharpneat.sourceforge.io/

Revise use of Math.FusedMultiplyAdd() #54

Closed colgreen closed 2 years ago

colgreen commented 2 years ago

Previously I arrived at the conclusion that this method was slow because it did not get substituted with an FMA CPU instruction by the JITter. This appears to be wrong (or maybe things have improved since I checked, or perhaps my old Intel Core i7-6700T lacked the required CPU instruction(s)).

Anyway, this definitely does compile down to an FMA CPU instruction when the CPU supports it. E.g. see:

https://github.com/dotnet/runtime/issues/34450

sharplab

 public static double Foo(double a, double b, double c)
 {
     return Math.FusedMultiplyAdd(a, b, c);   
 }

C.Foo(Double, Double, Double)
    L0000: vzeroupper
    L0003: vfmadd213sd xmm0, xmm1, xmm2
    L0008: ret
colgreen commented 2 years ago

Math.FusedMultiplyAdd() is being used in the neural net implementations.

I also benchmarked using it in EuclideanDistanceMetric.CalculateDistance(), but it resulted in slower code (.NET 6, Ryzen 7 PRO 5750GE). Possibly this is because the FMA instruction is a vector instruction (operating on vector registers) being used for a scalar operation, mixed in with non-vector instructions (which I believe can incur transition overhead on some CPUs/scenarios).

Closing.