stone-d-chen closed this 4 weeks ago
I've started some initial work here: https://github.com/stone-d-chen/ffvvc/pull/6
16bpc seems mostly done; I'll run a few more benchmarks before moving on to 8bpc.
VVC_HFR_UHDTV2_OpenGOP_7680x4320_100fps_SDR.bit
0.89% 0.88% ffmpeg_g ffmpeg_g [.] ff_vvc_sad_16_16bpc_avx2
+ 0.87% 0.86% vf#0:0 ffmpeg_g [.] ff_sad16_sse2
0.30% 0.29% vf#0:0 ffmpeg_g [.] ff_sad8_mmxext
0.21% 0.20% vf#0:0 ffmpeg_g [.] ff_sad16_approx_xy2_sse2
0.20% 0.20% vf#0:0 ffmpeg_g [.] sad_hpel_motion_search
0.10% 0.10% vf#0:0 ffmpeg_g [.] ff_sad16_x2_sse2
0.08% 0.08% vf#0:0 ffmpeg_g [.] ff_sad16_y2_sse2
0.06% 0.06% enc0:0:mpeg4 ffmpeg_g [.] ff_sad16_sse2
0.02% 0.02% enc0:0:mpeg4 ffmpeg_g [.] ff_sad8_mmxext
0.01% 0.01% enc0:0:mpeg4 ffmpeg_g [.] ff_sad16_approx_xy2_sse2
0.01% 0.01% enc0:0:mpeg4 ffmpeg_g [.] sad_hpel_motion_search
0.01% 0.01% enc0:0:mpeg4 ffmpeg_g [.] ff_sad16_x2_sse2
0.01% 0.01% enc0:0:mpeg4 ffmpeg_g [.] ff_sad16_y2_sse2
AVX2:
- vvc_sad.check_vvc_sad_8_16bpc [OK]
- vvc_sad.check_vvc_sad_16_16bpc [OK]
- vvc_sad.check_vvc_sad_32_16bpc [OK]
- vvc_sad.check_vvc_sad_64_16bpc [OK]
- vvc_sad.check_vvc_sad_128_16bpc [OK]
checkasm: all 5 tests passed
vvc_sad_8_16bpc_c: 135.5
vvc_sad_8_16bpc_avx2: 15.5
vvc_sad_16_16bpc_c: 275.5
vvc_sad_16_16bpc_avx2: 25.5
vvc_sad_32_16bpc_c: 1085.5
vvc_sad_32_16bpc_avx2: 85.5
vvc_sad_64_16bpc_c: 4255.5
vvc_sad_64_16bpc_avx2: 375.5
vvc_sad_128_16bpc_c: 17505.5
vvc_sad_128_16bpc_avx2: 1945.5
Hi @nuomi2021 (cc @QSXW), I've made a pull request here: https://github.com/ffvvc/FFmpeg/pull/213
Though I realized that maybe I should've based my code on ffvvc/main and not ffvvc/up?
Thank you for the patch Please use
Hi, I've created a new pull request here https://github.com/ffvvc/FFmpeg/pull/215
I believe I did this correctly, but I'm still relatively unfamiliar with git haha.
Based on "Add AVX2 assembly code for inter predict" (#51)
DMVR (decoder-side motion vector refinement) computes SAD on PUs with the following constraints
There are 8-bit versions of SAD in pixelutils that could be used but, as far as I can tell, no 16-bit versions (from a quick search, all existing SAD implementations appear to use psadbw, which only operates on unsigned bytes).
As such, adding a 16bpc SAD could be beneficial for performance.
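For reference, a 16-bit SAD in plain C could look roughly like the sketch below. This is only an illustration of the operation being discussed, not the ffvvc code: the name `sad_16bpc_ref`, its signature, the use of signed `int16_t` samples, and strides counted in samples rather than bytes are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical scalar reference (not taken from ffvvc): sum of absolute
 * differences over a w x h block of 16-bit samples.
 *
 * Assumptions for this sketch:
 *   - samples are stored as int16_t
 *   - stride0 / stride1 are given in samples, not bytes
 *   - blocks are at most 128x128, so the sum fits in a 32-bit int
 *     (128 * 128 * 65535 < INT32_MAX)
 */
static int sad_16bpc_ref(const int16_t *src0, ptrdiff_t stride0,
                         const int16_t *src1, ptrdiff_t stride1,
                         int w, int h)
{
    int sad = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* Widen to int before subtracting so the difference cannot wrap. */
            sad += abs((int)src0[x] - (int)src1[x]);
        }
        src0 += stride0;
        src1 += stride1;
    }
    return sad;
}
```

Since psadbw sums absolute differences of bytes, it can't be applied to samples like these directly, which is why a SIMD version would need a different instruction sequence than the existing 8-bit paths.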