ffvvc / FFmpeg

VVC Decoder for ffmpeg

AVX2 implementation of DMVR SAD for VVC #213

Closed stone-d-chen closed 2 months ago

stone-d-chen commented 2 months ago

Adds AVX2 assembly for the SAD (sum of absolute differences) used in DMVR (decoder-side motion vector refinement). The main difference from a conventional SAD is that VVC calculates it only on the even rows of the PU to reduce complexity. The absolute difference of 16-bit values is implemented as max(a, b) - min(a, b), i.e. via min/max/sub.
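For reference, the even-row SAD described above can be sketched in scalar C roughly as follows. This is only an illustration, not the code added by this PR: the function name `sad_even_rows`, its parameter list, and the single shared `stride` (in elements) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the function from this PR.
 * Sums |src0 - src1| over rows 0, 2, 4, ... only. `stride` is in elements. */
static int sad_even_rows(const int16_t *src0, const int16_t *src1,
                         ptrdiff_t stride, int width, int height)
{
    int sad = 0;

    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x++) {
            const int a  = src0[x];
            const int b  = src1[x];
            const int hi = a > b ? a : b;   /* max(a, b) */
            const int lo = a > b ? b : a;   /* min(a, b) */
            sad += hi - lo;                 /* |a - b| without an abs op */
        }
        /* Skip the odd row: advance two rows at a time. */
        src0 += 2 * stride;
        src1 += 2 * stride;
    }
    return sad;
}
```

The max/min/sub form mirrors what the description says the assembly does; a SIMD version would apply the same three operations to a whole vector of 16-bit lanes at once.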

DMVR is restricted to PUs with width >= 8, height >= 8, and width * height >= 128 (i.e. 8x8 is not a valid size).
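The size restriction above can be written as a small predicate. This is a minimal sketch covering only the size conditions stated here; the name `dmvr_size_ok` is hypothetical, and any other conditions for enabling DMVR are out of scope.

```c
#include <stdbool.h>

/* Hypothetical helper: checks only the PU size conditions listed above. */
static bool dmvr_size_ok(int width, int height)
{
    return width >= 8 && height >= 8 && width * height >= 128;
}
```

With these conditions the smallest admissible PUs are 8x16 and 16x8, which is why 8x8 is excluded even though it passes the per-dimension checks.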


AVX2:
 - vvc_sad.check_vvc_sad_8_16bpc   [OK]
 - vvc_sad.check_vvc_sad_16_16bpc  [OK]
 - vvc_sad.check_vvc_sad_32_16bpc  [OK]
 - vvc_sad.check_vvc_sad_64_16bpc  [OK]
 - vvc_sad.check_vvc_sad_128_16bpc [OK]
checkasm: all 5 tests passed
vvc_sad_8_16bpc_c: 122.5
vvc_sad_8_16bpc_avx2: 12.5
vvc_sad_16_16bpc_c: 262.5
vvc_sad_16_16bpc_avx2: 22.5
vvc_sad_32_16bpc_c: 1012.5
vvc_sad_32_16bpc_avx2: 92.5
vvc_sad_64_16bpc_c: 3922.5
vvc_sad_64_16bpc_avx2: 372.5
vvc_sad_128_16bpc_c: 16682.5
vvc_sad_128_16bpc_avx2: 1892.5
| File | Before | After |
|---|---|---|
| BQTerrace_1920x1080_60_10_420_22_RA.vvc | 80.3 | 81.3 |
| Chimera_8bit_1080P_1000_frames.vvc | 157.3 | 165.0 |
| NovosobornayaSquare_1920x1080.bin | 160.0 | 164.7 |
| RitualDance_1920x1080_60_10_420_37_RA.266 | 146.7 | 150.0 |

Benchmarks were run on an AMD 7940HS.

stone-d-chen commented 2 months ago

Replaced with https://github.com/ffvvc/FFmpeg/pull/214