dpiparo / vdt

adding tanh #5

Closed pseyfert closed 7 years ago

pseyfert commented 7 years ago

As suggested in root-project/root#1044, this is a copy&paste inclusion of a Padé tanh approximation into vdt.

Please check: I mostly grepped around to get an idea of where changes are necessary and then copy&pasted.
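For context, here is a minimal sketch of what a Padé approximation of tanh can look like. It is NOT the code from this PR: the order of the approximant ([3/2], i.e. tanh(x) ≈ x(15 + x²)/(15 + 6x²)), the clamping to [-1, 1], and the names `pade_tanh_sketch` and `max_abs_error` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper, not the implementation added by this PR.
// [3/2] Pade approximant of tanh: x (15 + x^2) / (15 + 6 x^2).
// The rational function grows without bound for large |x|, so the
// result is clamped to the range of tanh, [-1, 1].
inline double pade_tanh_sketch(double x) {
    const double x2 = x * x;
    const double r = x * (15.0 + x2) / (15.0 + 6.0 * x2);
    return std::max(-1.0, std::min(1.0, r));
}

// Largest absolute deviation from std::tanh on a uniform grid over [lo, hi].
inline double max_abs_error(double lo, double hi, int steps) {
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double x = lo + (hi - lo) * i / steps;
        worst = std::max(worst, std::fabs(pade_tanh_sketch(x) - std::tanh(x)));
    }
    return worst;
}
```

Such a low-order approximant is only accurate close to zero; a production implementation would presumably use a higher order or range reduction to reach the accuracy expected of a math library.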

dpiparo commented 7 years ago

Hi @pseyfert, great job, and so quickly done! Since you also updated the vdt benchmarks and tests, could you verify that the code vectorises with a modern gcc version, say gcc6 or gcc7?

dpiparo commented 7 years ago

Now that both tanh and atanh are present, the quality of the functions could be checked by applying f and f^{-1} in series and looking at how much the argument changes after the round trip. Maybe this is something we'd need to do with gtest. These are ideas triggered by this PR, but NOT directly related to it :-)
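The round-trip idea could be sketched roughly as follows. This is an illustration, not an existing vdt test: `max_round_trip_drift`, `ref_tanh` and `ref_atanh` are hypothetical names, and the standard library functions stand in for the vdt ones, which would be swapped in for a real check.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: apply f and then f_inv on a uniform grid over
// [lo, hi] and return the largest |f_inv(f(x)) - x| that was observed.
template <typename F, typename FInv>
double max_round_trip_drift(F f, FInv f_inv, double lo, double hi, int steps) {
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double x = lo + (hi - lo) * i / steps;
        worst = std::max(worst, std::fabs(f_inv(f(x)) - x));
    }
    return worst;
}

// Stand-ins for the functions under test (assumption: the vdt
// implementations would replace these two wrappers).
inline double ref_tanh(double x) { return std::tanh(x); }
inline double ref_atanh(double x) { return std::atanh(x); }
```

The interval has to be chosen with care: tanh saturates towards ±1, so for large |x| the inverse amplifies rounding errors and the drift says little about the quality of either function. In a gtest-based test, the returned drift would simply be compared against a tolerance.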

dpiparo commented 7 years ago

Hi @pseyfert, preliminary results, to be confirmed. I ran on a Mac with FMA and AVX2 (the platform is an important detail: Apple's libm implementation is faster than the Linux one, which is the one HEP data processing applications rely on):

Function Tanh            : 3.18 +- 0.26 ns
Function Fast_Tanh       : 3.00 +- 0.25 ns
Function Tanhv           : 3.35 +- 0.27 ns
Function Fast_Tanhv      : 2.32 +- 0.19 ns
Function Tanhf           : 3.01 +- 0.25 ns
Function Fast_Tanhf      : 2.57 +- 0.21 ns
Function Tanhfv          : 2.64 +- 0.22 ns
Function Fast_Tanhfv     : 1.36 +- 0.11 ns

The Fast_* symbols are the vdt implementations, f stands for float (as opposed to double), and v marks the autovectorised array signatures. The new implementation beats the old one in all cases. This is remarkable.
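To illustrate the difference between a scalar signature and an array ("v") signature, here is a minimal sketch. The names `sketch_tanh` and `sketch_tanhv` and the low-order rational kernel are assumptions for illustration and do not reproduce the vdt code; the point is only that a simple inline kernel called in a plain loop gives the compiler a chance to autovectorise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical scalar kernel: a clamped rational approximation of tanh.
// It is inline and has no explicit branches, which helps autovectorisation.
inline double sketch_tanh(double x) {
    const double x2 = x * x;
    const double r = x * (15.0 + x2) / (15.0 + 6.0 * x2);
    return std::max(-1.0, std::min(1.0, r));
}

// Hypothetical array signature: applies the scalar kernel element by
// element. Whether the loop is actually vectorised depends on the
// compiler and its flags.
inline void sketch_tanhv(std::size_t n, const double* in, double* out) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = sketch_tanh(in[i]);
    }
}
```

With GCC, compiling with something like `-O3 -fopt-info-vec` should report which loops were vectorised, which is one way to check the array version on a given compiler.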