Consider this optional - it's a nice code cleanup, but maybe the perf hit is unacceptable?
There is no major change in performance. The new code is slightly slower, but the difference is very small: about 1% at most in the runs below.
Tests were run with blocksize 8 and 16.
When looking at the benchmarks, the key is to look at the comparisons between the old and new AVX code. I can provide specific FPS numbers, but I went with the summary below to keep things succinct.
Here are some numbers for 540p and 1080p. Note the mvversion arg for oldavx vs newavx:
Summary
'vspipe -p -e 3000 -o 2 --arg mvversion=oldavx --arg src=test-540p.dgi tester.vpy /dev/null' ran
1.00 ± 0.00 times faster than 'vspipe -p -e 3000 -o 2 --arg mvversion=newavx --arg src=test-540p.dgi tester.vpy /dev/null'
2.69 ± 0.01 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=oldavx --arg src=test-540p.dgi tester.vpy /dev/null'
2.70 ± 0.00 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=newavx --arg src=test-540p.dgi tester.vpy /dev/null'
3.14 ± 0.01 times faster than 'vspipe -p -e 3000 -o 2 --arg mvversion=oldavx --arg src=test-1080p.dgi tester.vpy /dev/null'
3.15 ± 0.01 times faster than 'vspipe -p -e 3000 -o 2 --arg mvversion=newavx --arg src=test-1080p.dgi tester.vpy /dev/null'
8.88 ± 0.04 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=oldavx --arg src=test-1080p.dgi tester.vpy /dev/null'
8.89 ± 0.02 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=newavx --arg src=test-1080p.dgi tester.vpy /dev/null'
Same thing for 4k:
Summary
'vspipe -p -e 500 -o 2 --arg mvversion=oldavx --arg src=test-4k.dgi tester.vpy /dev/null' ran
1.01 ± 0.00 times faster than 'vspipe -p -e 500 -o 2 --arg mvversion=newavx --arg src=test-4k.dgi tester.vpy /dev/null'
2.69 ± 0.00 times faster than 'vspipe -p -e 500 -o 1 --arg mvversion=oldavx --arg src=test-4k.dgi tester.vpy /dev/null'
2.69 ± 0.00 times faster than 'vspipe -p -e 500 -o 1 --arg mvversion=newavx --arg src=test-4k.dgi tester.vpy /dev/null'
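To make the "~1%" figure concrete, here is a small sketch that turns the relative-speed factors from the two summaries above into a per-configuration slowdown of newavx against oldavx. The factors are copied by hand from the summaries; the dictionary layout and the `slowdown_percent` helper are illustrative names, not part of this PR. Because the summaries only report two decimal places, the results are coarse.

```python
# Relative-speed factors copied from the summaries above.
# Each value is how many times slower a run was than the fastest run
# in its own summary. Pairs are (oldavx, newavx) for one configuration.
ratios = {
    ("540p", "-o 2"): (1.00, 1.00),
    ("540p", "-o 1"): (2.69, 2.70),
    ("1080p", "-o 2"): (3.14, 3.15),
    ("1080p", "-o 1"): (8.88, 8.89),
    ("4k", "-o 2"): (1.00, 1.01),
    ("4k", "-o 1"): (2.69, 2.69),
}


def slowdown_percent(old: float, new: float) -> float:
    """Percent by which the new code is slower than the old code.

    Both arguments must come from the same summary so that they share
    the same baseline run.
    """
    return (new / old - 1.0) * 100.0


slowdowns = {cfg: slowdown_percent(old, new) for cfg, (old, new) in ratios.items()}

for (src, out), pct in slowdowns.items():
    print(f"{src} {out}: newavx is {pct:.2f}% slower than oldavx")
```

With the rounded factors above, the largest gap is the 4k `-o 2` case at about 1%; every other configuration comes out well under half a percent.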