gnuradio / volk

The Vector Optimized Library of Kernels
http://libvolk.org
GNU Lesser General Public License v3.0

New AVX512F implementation #760

Closed: Ka-zam closed 8 months ago

Ka-zam commented 9 months ago

I think this is a better implementation of the reciprocal kernel: it uses the _mm512_rcp14_ps intrinsic, which handles special values (signed zero, infinity, NaN) correctly. The result is accurate to a relative tolerance below 6.2e-5. On a 7950X3D the u_avx512 kernel is about 30% faster than the generic one.

magnus@r7950x3d:~/src/kazam/volk/build$ volk_profile -R reci
RUN_VOLK_TESTS: volk_32f_reciprocal_32f(131071,1987)
generic completed in 20.7839 ms
a_sse completed in 41.2548 ms
a_avx completed in 20.6385 ms
a_avx512 completed in 16.861 ms
u_sse completed in 41.301 ms
u_avx completed in 20.7819 ms
u_avx512 completed in 15.9916 ms
Best aligned arch: u_avx512
Best unaligned arch: u_avx512
Writing /home/magnus/.volk/volk_config...
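
For readers unfamiliar with the intrinsic, the idea can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the function names reciprocal_32f_sketch and max_relative_error are hypothetical, only the unaligned variant is shown, and the AVX-512 path is compiled only when the compiler defines __AVX512F__ (for example with -mavx512f); otherwise the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper, not the VOLK kernel itself: out[i] ~= 1 / in[i]. */
void reciprocal_32f_sketch(float* out, const float* in, size_t n)
{
    assert(out != NULL && in != NULL);
    size_t i = 0;
#if defined(__AVX512F__)
    /* One 512-bit register holds 16 floats; unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m512 x = _mm512_loadu_ps(in + i);
        /* Approximate reciprocal, relative error below 2^-14. */
        _mm512_storeu_ps(out + i, _mm512_rcp14_ps(x));
    }
#endif
    /* Scalar tail, and the full fallback when AVX-512F is not enabled. */
    for (; i < n; i++) {
        out[i] = 1.0f / in[i];
    }
}

/* Largest |out[i] * in[i] - 1| over the buffer (finite, nonzero inputs only). */
float max_relative_error(const float* out, const float* in, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float err = out[i] * in[i] - 1.0f;
        if (err < 0.0f) {
            err = -err;
        }
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Calling reciprocal_32f_sketch on a buffer whose length is not a multiple of 16 exercises both the vector loop and the scalar tail; max_relative_error can then be compared against the 6.2e-5 bound quoted above.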

Ka-zam commented 8 months ago

I ran all kernels for special values:

magnus@r7950x3d:~/src/kazam/scratch$ ./a.out 
x:
 -0.0000e+00   0.0000e+00          inf         -inf          nan         -nan   1.0000e-30   1.0000e+30 
generic:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   1.0000e+30   1.0000e-30 
a_sse:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   1.0000e+30   1.0000e-30 
a_avx:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   1.0000e+30   1.0000e-30 
a_avx512:
        -inf          inf   0.0000e+00  -0.0000e+00          nan         -nan   9.9999e+29   9.9999e-31 

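Signed zeros, signed infinities, and NaN are handled properly by all kernels. The a_avx512 results for 1e-30 and 1e+30 differ from the exact values only within the 6.2e-5 tolerance.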
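
A check along these lines could look like the sketch below. It is not the scratch program used for the output above, which is not shown in this thread: reciprocal_generic_sketch is a hypothetical stand-in for the generic kernel (plain IEEE 754 division), and reciprocal_matches is a hypothetical comparison helper that accepts any NaN for NaN, requires matching sign for zeros and infinities, and applies a relative tolerance to finite values.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_SPECIAL 8

/* The same inputs as the "x:" row in the output above. */
const float special_inputs[NUM_SPECIAL] = { -0.0f, 0.0f,  INFINITY, -INFINITY,
                                            NAN,   -NAN,  1e-30f,   1e30f };

/* Hypothetical stand-in for the generic kernel: plain IEEE 754 division. */
void reciprocal_generic_sketch(float* out, const float* in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = 1.0f / in[i];
    }
}

/* Returns 1 if `got` agrees with the reference value `ref`, else 0. */
int reciprocal_matches(float got, float ref, float rel_tol)
{
    if (isnan(ref)) {
        return isnan(got) ? 1 : 0; /* NaN sign and payload are not compared */
    }
    if (isinf(ref) || ref == 0.0f) {
        /* Zeros and infinities must match exactly, including the sign. */
        return (got == ref && !signbit(got) == !signbit(ref)) ? 1 : 0;
    }
    float diff = got - ref;
    if (diff < 0.0f) {
        diff = -diff;
    }
    float scale = ref < 0.0f ? -ref : ref;
    return diff <= rel_tol * scale ? 1 : 0;
}

/* Prints one labelled row in a layout similar to the output above. */
void print_row(const char* label, const float* v, size_t n)
{
    printf("%s:\n", label);
    for (size_t i = 0; i < n; i++) {
        printf("%12.4e ", v[i]);
    }
    printf("\n");
}
```

To compare another kernel, its output row can be passed element by element to reciprocal_matches together with the generic result and the 6.2e-5 tolerance.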

jdemel commented 8 months ago

No objections. The broken build should be fixed now with #761. Merging...