Ka-zam closed 8 months ago
I ran all kernels for special values:
magnus@r7950x3d:~/src/kazam/scratch$ ./a.out
x:
-0.0000e+00 0.0000e+00 inf -inf nan -nan 1.0000e-30 1.0000e+30
generic:
-inf inf 0.0000e+00 -0.0000e+00 nan -nan 1.0000e+30 1.0000e-30
a_sse:
-inf inf 0.0000e+00 -0.0000e+00 nan -nan 1.0000e+30 1.0000e-30
a_avx:
-inf inf 0.0000e+00 -0.0000e+00 nan -nan 1.0000e+30 1.0000e-30
a_avx512:
-inf inf 0.0000e+00 -0.0000e+00 nan -nan 9.9999e+29 9.9999e-31
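Signed NaN and inf are handled properly by all kernels. Only a_avx512 differs on the finite inputs (9.9999e+29 and 9.9999e-31 instead of 1.0000e+30 and 1.0000e-30), which is within the approximation tolerance.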
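For anyone who wants to repeat the special-value check without the scratch program, here is a minimal sketch in plain C. The names reciprocal_generic, same_special and special_values_ok are hypothetical and are not taken from the PR; the kernel is a stand-in that simply divides. It compares the zero and infinity cases against the "generic" row above and only requires the NaN inputs to stay NaN, since the sign of a NaN result is not something I would rely on portably.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic kernel: out[i] = 1 / in[i]. */
static void reciprocal_generic(float* out, const float* in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = 1.0f / in[i];
    }
}

/* Returns 1 if a and b are equal including the sign bit (so -0 != +0),
 * or if both are NaN. */
static int same_special(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return isnan(a) && isnan(b);
    }
    return a == b && (signbit(a) != 0) == (signbit(b) != 0);
}

/* Returns 1 if the kernel maps -0, +0, +inf, -inf to -inf, +inf, +0, -0
 * and keeps both NaN inputs as NaN. */
static int special_values_ok(void)
{
    const float in[6] = { -0.0f, 0.0f, INFINITY, -INFINITY, NAN, -NAN };
    const float expected[4] = { -INFINITY, INFINITY, 0.0f, -0.0f };
    float out[6];

    reciprocal_generic(out, in, 6);

    for (size_t i = 0; i < 4; i++) {
        if (!same_special(out[i], expected[i])) {
            return 0;
        }
    }
    /* Only NaN-ness is checked here, not the NaN sign. */
    return isnan(out[4]) && isnan(out[5]);
}
```

Calling special_values_ok() from a test and asserting that it returns 1 is enough; swapping reciprocal_generic for one of the SIMD kernels would exercise that kernel the same way, assuming it has the same argument order.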
No objections. The broken build should be fixed now with #761. Merging...
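I think this is a better implementation of the reciprocal kernel, as it uses the newer _mm512_rcp14_ps intrinsic, which handles exceptional inputs (signed zero, inf, NaN) correctly. It is accurate to a relative tolerance below 6.2e-5; the intrinsic's documented bound on relative error is 2^-14, about 6.1e-5. On a 7950X3D there is a 30% speedup.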
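As a rough illustration of the idea, not the code in this PR, a kernel built on _mm512_rcp14_ps could look like the sketch below. The names reciprocal_rcp14 and max_relative_error are hypothetical. The AVX-512 path is only compiled when the compiler defines __AVX512F__ (for example with -mavx512f); otherwise the function falls back to plain division, so the sketch builds anywhere. The helper measures the largest relative error against 1.0f / x and assumes finite, non-zero inputs.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical kernel name and signature, chosen for illustration only. */
static void reciprocal_rcp14(float* out, const float* in, size_t n)
{
    assert(out != NULL && in != NULL);
    size_t i = 0;
#ifdef __AVX512F__
    /* 16 floats per iteration; rcp14 is an approximation with relative
     * error below 2^-14, so no refinement step is applied here. */
    for (; i + 16 <= n; i += 16) {
        __m512 x = _mm512_loadu_ps(in + i);
        _mm512_storeu_ps(out + i, _mm512_rcp14_ps(x));
    }
#endif
    /* Scalar tail; also the whole array when AVX-512F is not enabled. */
    for (; i < n; i++) {
        out[i] = 1.0f / in[i];
    }
}

/* Largest relative error of approx[i] against 1 / in[i].
 * Assumes every in[i] is finite and non-zero. */
static float max_relative_error(const float* approx, const float* in, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float exact = 1.0f / in[i];
        const float err = fabsf(approx[i] - exact) / fabsf(exact);
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

A test can fill a buffer with finite, non-zero values, run reciprocal_rcp14, and assert that max_relative_error stays below 6.2e-5. Using the unaligned load and store keeps the sketch free of alignment requirements; an aligned variant would be a separate kernel.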