gnuradio / volk

The Vector Optimized Library of Kernels
http://libvolk.org
GNU Lesser General Public License v3.0

Arctan avx512 #759

Open Ka-zam opened 9 months ago

Ka-zam commented 9 months ago

Added AVX512 kernels and some minor cleanup.

Using AVX512F yields roughly a 40% speedup over the AVX2_FMA implementation of volk_32f_atan_32f on my 7950X3D (about 55% for volk_32fc_s32f_atan2_32f). Compared to the generic atan2 implementation, this is a 65x speedup.

                      magnus@r7950x3d:~/src/kazam/volk/build$ volk_profile -R atan
                      RUN_VOLK_TESTS: volk_32f_atan_32f(131071,1987)
                      generic completed in 2010.97 ms
                      polynomial completed in 62.8946 ms
                      a_avx512 completed in 40.8423 ms
                      a_avx2_fma completed in 56.9026 ms
                      a_avx2 completed in 55.9691 ms
                      a_sse4_1 completed in 110.292 ms
                      u_avx512 completed in 39.5152 ms
                      u_avx2_fma completed in 55.4739 ms
                      u_avx2 completed in 55.6364 ms
                      u_sse4_1 completed in 110.009 ms
                      Best aligned arch: u_avx512
                      Best unaligned arch: u_avx512
                      RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
 ------>              generic completed in 4199.28 ms
                      polynomial completed in 95.92 ms
 ------>              a_avx512 completed in 64.0566 ms
                      a_avx2_fma completed in 99.0502 ms
                      a_avx2 completed in 98.3313 ms
 ------>              u_avx512 completed in 63.4753 ms
                      u_avx2_fma completed in 98.3834 ms
                      u_avx2 completed in 98.6633 ms
                      Best aligned arch: u_avx512
                      Best unaligned arch: u_avx512
                      Writing /home/magnus/.volk/volk_config...
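
The speedup figures can be re-derived from the timings in the log above. The following is a minimal sketch of that arithmetic only; the helper names `speedup` and `print_speedups` are made up for illustration and are not part of VOLK or `volk_profile`.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of a candidate over a baseline, as baseline time / candidate time.
 * A value of 1.40 means the candidate is about 40% faster. */
static double speedup(double baseline_ms, double candidate_ms)
{
    assert(baseline_ms > 0.0 && candidate_ms > 0.0);
    return baseline_ms / candidate_ms;
}

/* Print the ratios for the aligned kernels, using the timings copied
 * verbatim from the volk_profile output above. */
static void print_speedups(void)
{
    printf("atan  a_avx512 vs a_avx2_fma: %.2fx\n", speedup(56.9026, 40.8423));
    printf("atan2 a_avx512 vs a_avx2_fma: %.2fx\n", speedup(99.0502, 64.0566));
    printf("atan2 a_avx512 vs generic:    %.2fx\n", speedup(4199.28, 64.0566));
}
```

Calling `print_speedups()` from a `main` gives ratios of roughly 1.39x, 1.55x and 65.6x, so the 40% figure corresponds to the atan kernel and the 65x figure to atan2.
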
Ka-zam commented 9 months ago

I just noticed there's a NaN test as well...

https://github.com/gnuradio/volk/pull/731

I need to update this PR to cover that for AVX512 as well!

jj1bdx commented 9 months ago

@Ka-zam #731 is essential for airspy-fmradion, and I've spent a few weeks solving the NaN issue. Please add the NaN test before completing your implementation.

Ka-zam commented 9 months ago

> @Ka-zam #731 is essential for airspy-fmradion, and I've spent a few weeks solving the NaN issue. Please add the NaN test before completing your implementation.

I think it's already in there and should work fine! I wrote a test program and got

atan2(0.f, 0.f) == 0.f

for all kernels.

Ka-zam commented 8 months ago

Here are some special values and a sanity check:

magnus@r7950x3d:~/src/kazam/scratch$ ./a.out 
          y :  1.00  -1.00   1.00  -1.00    nan    nan   0.00  -0.00   1.00  -1.00 
          x :  1.00   1.00  -1.00  -1.00   1.00    nan   0.00   0.00   0.00   0.00 
 atan2(y, x):
    generic :  0.79  -0.79   2.36  -2.36    nan    nan   0.00  -0.00   1.57  -1.57 
 polynomial :  0.79  -0.79   2.36  -2.36    nan    nan   0.00  -0.00   1.57  -1.57 
 a_avx512dq :  0.79  -0.79   2.36  -2.36   0.00   0.00   0.00   0.00   1.57  -1.57 

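The a_avx512dq row differs from generic in two places: it returns 0.00 instead of nan when an input is NaN, and 0.00 instead of -0.00 for atan2(-0, 0). Do we care about the sign of atan2(-0, 0)? What about propagating NaN?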