gnuradio / volk

The Vector Optimized Library of Kernels
http://libvolk.org
GNU Lesser General Public License v3.0

Re-enable AVX2 convolutional decoder #741

Closed argilo closed 10 months ago

argilo commented 10 months ago

Reverts #458.

Now that the volk_8u_conv_k7_r2puppet_8u kernel has a working test (fixed in #736), we can safely make changes and be confident that the various protokernels produce identical output. Here I've re-enabled the broken AVX2 convolutional decoder, which was commented out in #458. To get identical output to the other protokernels, I made the following changes (each in a separate commit, for easier review):

I tested every vector length from 0 to 2048 (for v in {0..2048}; do echo $v; apps/volk_profile -n -R k7_r2puppet -v $v -i 1 2>&1 | grep fail; done) and did not observe any test failures.

Performance of the AVX2 protokernel is slightly worse than the spiral protokernel at the default 131071 vector length, but better at shorter vector lengths (e.g. 16384). I suspect that with some further tweaks, AVX2 performance could be improved, but I'll leave that for a future PR. In particular, increasing the metric shift (as was done in #475 but reverted in #736) may reduce the number of expensive re-normalizations that need to be performed. And perhaps the minimum calculation (which is not SIMD-friendly) could be removed from the re-normalization as well.
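For readers unfamiliar with the re-normalization step mentioned above, here is a minimal scalar sketch in C. It is not the VOLK source: the function names `renormalize_min` and `renormalize_fixed`, the `NUM_STATES` macro, and the threshold and offset parameters are all hypothetical, chosen only for illustration. The first function shows the usual pattern (find the minimum path metric, then subtract it from every state), and the second shows one possible way the minimum search could be dropped, by subtracting a fixed offset with saturation at zero.

```c
#include <stddef.h>
#include <stdint.h>

/* A K = 7 code has 2^(K-1) = 64 trellis states. */
#define NUM_STATES 64

/* Hypothetical sketch: re-normalize when the first path metric exceeds
 * `threshold`. Finds the minimum metric across all states (the part that
 * is awkward to vectorize) and subtracts it from every state.
 * Returns 1 if a re-normalization was performed, 0 otherwise. */
static int renormalize_min(uint8_t metrics[NUM_STATES], uint8_t threshold)
{
    if (metrics[0] <= threshold) {
        return 0;
    }

    /* Horizontal minimum over all states. */
    uint8_t min = metrics[0];
    for (size_t i = 1; i < NUM_STATES; i++) {
        if (metrics[i] < min) {
            min = metrics[i];
        }
    }

    /* Subtracting the minimum preserves the differences between metrics. */
    for (size_t i = 0; i < NUM_STATES; i++) {
        metrics[i] = (uint8_t)(metrics[i] - min);
    }
    return 1;
}

/* Hypothetical variant without the minimum search: subtract a fixed
 * `offset`, clamping at zero. This is an assumption about what removing
 * the minimum calculation could look like, not something taken from the PR. */
static int renormalize_fixed(uint8_t metrics[NUM_STATES],
                             uint8_t threshold,
                             uint8_t offset)
{
    if (metrics[0] <= threshold) {
        return 0;
    }

    for (size_t i = 0; i < NUM_STATES; i++) {
        metrics[i] = (metrics[i] > offset) ? (uint8_t)(metrics[i] - offset) : 0;
    }
    return 1;
}
```

The fixed-offset variant only preserves metric differences when no metric is clamped to zero, so whether it would keep output identical to the other protokernels would need to be checked with the k7_r2puppet test.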

/cc @Aang23