Adds AVX2 assembly for the SAD used in DMVR (decoder-side motion vector refinement). The main difference is that in VVC, the SAD is calculated only on the even rows of the PU to reduce complexity. The SAD is implemented via min/max/sub for 16-bit values.
DMVR is restricted to PUs with width >= 8, height >= 8 and width * height >= 128 (i.e. 8x8 is not a valid size).
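For reference, the behaviour described above can be sketched as a scalar C routine. This is only an illustrative sketch, not the code in this patch: the names dmvr_size_ok and sad_even_rows, the argument order, and the stride unit (samples rather than bytes) are all assumptions made for the example. The absolute difference is formed as max minus min, which mirrors the min/max/sub idea used for 16-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the DMVR size restriction as stated in the
 * description (width >= 8, height >= 8, width * height >= 128). */
static int dmvr_size_ok(int w, int h)
{
    return w >= 8 && h >= 8 && w * h >= 128;
}

/* Hypothetical scalar reference: SAD over the even rows only.
 * Strides are assumed to be in int16_t samples, not bytes. */
static int sad_even_rows(const int16_t *a, ptrdiff_t a_stride,
                         const int16_t *b, ptrdiff_t b_stride,
                         int w, int h)
{
    int sad = 0;

    assert(w > 0 && h > 0);

    for (int y = 0; y < h; y += 2) {        /* rows 0, 2, 4, ... */
        for (int x = 0; x < w; x++) {
            int p  = a[x];
            int q  = b[x];
            int mx = p > q ? p : q;         /* max */
            int mn = p < q ? p : q;         /* min */
            sad += mx - mn;                 /* |p - q| without abs */
        }
        a += 2 * a_stride;                  /* skip the odd row */
        b += 2 * b_stride;
    }
    return sad;
}
```

A routine like this could serve as the C fallback to compare the assembly output against; with the size restriction above, 8x16 and 16x8 are the smallest blocks it would be called on.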
Ran on AMD 7940HS