Don't forget to run volk_profile beforehand to actually reap the benefits of volk's optimized functions! It does yield a substantial speedup; I just forgot to run volk_profile last time.
LGTM. I observed a slight improvement (and much more consistent results than with the current implementation), but there's a bottleneck somewhere else on my machine, and I bet it's the GUI. Merging now!