ffvvc / FFmpeg

VVC Decoder for ffmpeg

vvc_deblock.asm: chroma vertical implementation #254

Closed stone-d-chen closed 1 month ago

stone-d-chen commented 2 months ago

Some of the ways I wrote the horizontal asm aren't compatible with vertical. The strong filter currently stores some intermediate results to memory in the middle of the computation, to free up registers for later use. That is a problem for vertical, which needs to transpose the entire set of registers before storing.

Begin moving results into m0, ..., m7 earlier, e.g. movu m3, m12 frees m12 for reuse. This avoids the need to clobber m0.
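To illustrate the constraint, here is a minimal scalar C sketch, not the actual asm. The names toy_filter, toy_h_edge and toy_v_edge are hypothetical, the 8x8 tile size is an assumption chosen to mirror m0, ..., m7, and the smoothing formula is a placeholder, not the VVC chroma filter. Each regs[k] stands in for one SIMD register holding a single tap (p3 ... q3) across 8 lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8 /* assumed tile: 8 lines x 8 taps (p3..p0, q0..q3) */

/* Scalar stand-in for the SIMD transpose used on the vertical path. */
static void transpose8x8(uint16_t dst[N][N], uint16_t src[N][N])
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            dst[c][r] = src[r][c];
}

/* Placeholder smoothing across the edge, NOT the VVC filter.
 * regs[3] holds p0 and regs[4] holds q0 for all N lines. */
static void toy_filter(uint16_t regs[N][N])
{
    for (int i = 0; i < N; i++) {
        int p1 = regs[2][i], p0 = regs[3][i];
        int q0 = regs[4][i], q1 = regs[5][i];
        regs[3][i] = (uint16_t)((p1 + 2 * p0 + q0 + 2) >> 2);
        regs[4][i] = (uint16_t)((p0 + 2 * q0 + q1 + 2) >> 2);
    }
}

/* Horizontal edge: every tap is already a contiguous row in memory,
 * so a filtered row can be written back as soon as it is ready. */
void toy_h_edge(uint16_t *pix, ptrdiff_t stride)
{
    uint16_t regs[N][N];
    assert(pix != NULL);
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            regs[k][i] = pix[(k - 4) * stride + i];
    toy_filter(regs);
    for (int i = 0; i < N; i++) {
        pix[-stride + i] = regs[3][i]; /* p0 row */
        pix[i]           = regs[4][i]; /* q0 row */
    }
}

/* Vertical edge: taps lie along each line, so the tile is transposed
 * on load and again before storing. Every output "register" has to
 * stay live until the second transpose; none can be stored early. */
void toy_v_edge(uint16_t *pix, ptrdiff_t stride)
{
    uint16_t rows[N][N], regs[N][N];
    assert(pix != NULL);
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            rows[r][c] = pix[r * stride + c - 4];
    transpose8x8(regs, rows);   /* regs[k] = tap k for all lines */
    toy_filter(regs);
    transpose8x8(rows, regs);   /* back to memory layout */
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            pix[r * stride + c - 4] = rows[r][c];
}
```

In both functions pix points at the q0 sample of the first line and stride is in samples. The filter body is shared; only the load and store differ, which is why early stores in the middle of the shared filter code get in the way of the vertical version.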

stone-d-chen commented 1 month ago

Hi @nuomi2021, vertical is almost done; I just need to fix some issues with 8-bit.

vvc_v_loop_filter_chroma_10_mix_no-shift_c: 90.2
vvc_v_loop_filter_chroma_10_mix_no-shift_avx: 80.2
vvc_v_loop_filter_chroma_10_mix_shift_c: 140.0
vvc_v_loop_filter_chroma_10_mix_shift_avx: 80.0
vvc_v_loop_filter_chroma_10_one-side_no-shift_c: 150.2
vvc_v_loop_filter_chroma_10_one-side_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_one-side_shift_c: 150.2
vvc_v_loop_filter_chroma_10_one-side_shift_avx: 60.0
vvc_v_loop_filter_chroma_10_strong_no-shift_c: 120.2
vvc_v_loop_filter_chroma_10_strong_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_strong_shift_c: 150.0
vvc_v_loop_filter_chroma_10_strong_shift_avx: 80.0
vvc_v_loop_filter_chroma_10_weak_no-shift_c: 90.0
vvc_v_loop_filter_chroma_10_weak_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_weak_shift_c: 100.0
vvc_v_loop_filter_chroma_10_weak_shift_avx: 60.2
vvc_v_loop_filter_chroma_12_mix_no-shift_c: 90.2
vvc_v_loop_filter_chroma_12_mix_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_mix_shift_c: 130.2
vvc_v_loop_filter_chroma_12_mix_shift_avx: 60.0
vvc_v_loop_filter_chroma_12_one-side_no-shift_c: 130.2
vvc_v_loop_filter_chroma_12_one-side_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_one-side_shift_c: 150.2
vvc_v_loop_filter_chroma_12_one-side_shift_avx: 50.2
vvc_v_loop_filter_chroma_12_strong_no-shift_c: 120.2
vvc_v_loop_filter_chroma_12_strong_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_strong_shift_c: 150.2
vvc_v_loop_filter_chroma_12_strong_shift_avx: 60.2
vvc_v_loop_filter_chroma_12_weak_no-shift_c: 90.2
vvc_v_loop_filter_chroma_12_weak_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_weak_shift_c: 100.2
vvc_v_loop_filter_chroma_12_weak_shift_avx: 60.2

stone-d-chen commented 1 month ago

Hi @nuomi2021, it should be done now!

vvc_v_loop_filter_chroma_8_mix_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_mix_no-shift_avx: 73.6
vvc_v_loop_filter_chroma_8_mix_shift_c: 143.8
vvc_v_loop_filter_chroma_8_mix_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_one-side_no-shift_c: 223.8
vvc_v_loop_filter_chroma_8_one-side_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_one-side_shift_c: 373.8
vvc_v_loop_filter_chroma_8_one-side_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_strong_no-shift_c: 223.8
vvc_v_loop_filter_chroma_8_strong_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_strong_shift_c: 333.8
vvc_v_loop_filter_chroma_8_strong_shift_avx: 63.6
vvc_v_loop_filter_chroma_8_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_weak_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_8_weak_shift_avx: 63.6
vvc_v_loop_filter_chroma_10_mix_no-shift_c: 143.8
vvc_v_loop_filter_chroma_10_mix_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_10_mix_shift_c: 203.8
vvc_v_loop_filter_chroma_10_mix_shift_avx: 73.6
vvc_v_loop_filter_chroma_10_one-side_no-shift_c: 133.8
vvc_v_loop_filter_chroma_10_one-side_no-shift_avx: 73.6
vvc_v_loop_filter_chroma_10_one-side_shift_c: 163.8
vvc_v_loop_filter_chroma_10_one-side_shift_avx: 63.8
vvc_v_loop_filter_chroma_10_strong_no-shift_c: 133.8
vvc_v_loop_filter_chroma_10_strong_no-shift_avx: 93.8
vvc_v_loop_filter_chroma_10_strong_shift_c: 163.8
vvc_v_loop_filter_chroma_10_strong_shift_avx: 83.8
vvc_v_loop_filter_chroma_10_weak_no-shift_c: 103.8
vvc_v_loop_filter_chroma_10_weak_no-shift_avx: 73.8
vvc_v_loop_filter_chroma_10_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_10_weak_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_mix_no-shift_c: 103.8
vvc_v_loop_filter_chroma_12_mix_no-shift_avx: 83.8
vvc_v_loop_filter_chroma_12_mix_shift_c: 143.8
vvc_v_loop_filter_chroma_12_mix_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_one-side_no-shift_c: 143.6
vvc_v_loop_filter_chroma_12_one-side_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_12_one-side_shift_c: 173.8
vvc_v_loop_filter_chroma_12_one-side_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_strong_no-shift_c: 133.8
vvc_v_loop_filter_chroma_12_strong_no-shift_avx: 73.8
vvc_v_loop_filter_chroma_12_strong_shift_c: 173.6
vvc_v_loop_filter_chroma_12_strong_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_12_weak_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_12_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_12_weak_shift_avx: 63.8

stone-d-chen commented 1 month ago

Hi @nuomi2021, should I switch to luma now, or submit chroma to the mailing list first?

nuomi2021 commented 1 month ago

Hi @stone-d-chen, we need to find a way to share code with HEVC for chroma. It's better to send the patch together with luma.

I will fully focus on this and collaborate with you in the following weeks.