dubhater / vapoursynth-mvtools

Motion compensation and stuff

Cleanup Degrain AVX, similar to SSE2 code. #61

adworacz closed 1 year ago

adworacz commented 1 year ago

Consider this optional - it's a nice code cleanup, but maybe the perf hit is unacceptable?

There is no major change in performance. The new code is slightly slower, but the difference is very small, roughly 1% at most.

Tests were run with block sizes 8 and 16.

When reading the benchmarks, the key comparison is between the old and new AVX code for the same source and the same -o value. I can provide specific FPS numbers, but I went with the summaries below to keep things succinct.

Here are some numbers for 540p and 1080p. Note the mvversion arg for oldavx vs newavx:

Summary
  'vspipe -p -e 3000 -o 2 --arg mvversion=oldavx --arg src=test-540p.dgi tester.vpy /dev/null' ran
    1.00 ± 0.00 times faster than 'vspipe -p -e 3000 -o 2 --arg mvversion=newavx --arg src=test-540p.dgi tester.vpy /dev/null'
    2.69 ± 0.01 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=oldavx --arg src=test-540p.dgi tester.vpy /dev/null'
    2.70 ± 0.00 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=newavx --arg src=test-540p.dgi tester.vpy /dev/null'
    3.14 ± 0.01 times faster than 'vspipe -p -e 3000 -o 2 --arg mvversion=oldavx --arg src=test-1080p.dgi tester.vpy /dev/null'
    3.15 ± 0.01 times faster than 'vspipe -p -e 3000 -o 2 --arg mvversion=newavx --arg src=test-1080p.dgi tester.vpy /dev/null'
    8.88 ± 0.04 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=oldavx --arg src=test-1080p.dgi tester.vpy /dev/null'
    8.89 ± 0.02 times faster than 'vspipe -p -e 3000 -o 1 --arg mvversion=newavx --arg src=test-1080p.dgi tester.vpy /dev/null'

Same thing for 4k:

Summary
  'vspipe -p -e 500 -o 2 --arg mvversion=oldavx --arg src=test-4k.dgi tester.vpy /dev/null' ran
    1.01 ± 0.00 times faster than 'vspipe -p -e 500 -o 2 --arg mvversion=newavx --arg src=test-4k.dgi tester.vpy /dev/null'
    2.69 ± 0.00 times faster than 'vspipe -p -e 500 -o 1 --arg mvversion=oldavx --arg src=test-4k.dgi tester.vpy /dev/null'
    2.69 ± 0.00 times faster than 'vspipe -p -e 500 -o 1 --arg mvversion=newavx --arg src=test-4k.dgi tester.vpy /dev/null'
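
To make the "roughly 1%" figure concrete, here is a minimal sketch that pairs each oldavx run with its newavx counterpart and converts the ratio into a percentage. The numbers are copied from the two summaries above; the dictionary layout and the function names `slowdown_percent` and `compare` are only illustrative. Each value is read as "this run took N times as long as the fastest run in its summary", so only runs from the same summary are compared. The inputs are rounded to two decimals and the ± terms are ignored, so treat the results as approximate.

```python
# Relative run times copied from the two summaries above.
# Keys are (source, vspipe -o value, mvversion); values are the
# "times faster than" factors, i.e. run time relative to the
# fastest run in the same summary.
relative = {
    ("540p", 2, "oldavx"): 1.00,   # reference run of the first summary
    ("540p", 2, "newavx"): 1.00,
    ("540p", 1, "oldavx"): 2.69,
    ("540p", 1, "newavx"): 2.70,
    ("1080p", 2, "oldavx"): 3.14,
    ("1080p", 2, "newavx"): 3.15,
    ("1080p", 1, "oldavx"): 8.88,
    ("1080p", 1, "newavx"): 8.89,
    ("4k", 2, "oldavx"): 1.00,     # reference run of the second summary
    ("4k", 2, "newavx"): 1.01,
    ("4k", 1, "oldavx"): 2.69,
    ("4k", 1, "newavx"): 2.69,
}


def slowdown_percent(old, new):
    """Percentage by which the new run time exceeds the old one."""
    return (new / old - 1.0) * 100.0


def compare(table):
    """Map (source, -o value) to the newavx slowdown in percent."""
    result = {}
    for (src, out, version), old in table.items():
        if version != "oldavx":
            continue
        new = table[(src, out, "newavx")]
        result[(src, out)] = slowdown_percent(old, new)
    return result


for (src, out), pct in sorted(compare(relative).items()):
    print(f"{src:>5} -o {out}: newavx is about {pct:.2f}% slower than oldavx")
```

With these rounded inputs, the largest gap is the 4k run with -o 2, at about 1%; every other pair comes out well under half a percent.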

dubhater commented 1 year ago

Eh, it's fine. If you want more speed, you can try -funroll-loops. :)

Thanks!