ffvvc / FFmpeg

VVC Decoder for ffmpeg

Deblocking skip optimizations / handle max_len_{p,q} = 0 case #242

Closed: stone-d-chen closed this pull request 2 months ago

stone-d-chen commented 2 months ago

Fixes the performance regression introduced by adding the strong, one-sided, and weak filters plus other checks.

It does this by exiting the spatial activity calculation early when all max_len_q = 1 and jumping straight to the weak calculation.
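A minimal scalar sketch of that early exit, assuming one call handles a small group of edge segments (as a SIMD version would across lanes) and that the whole group can take the weak-only path only when every segment has max_len_q == 1. `N_SEG`, `choose_path`, and the path tags are hypothetical names for illustration, not the FFmpeg code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of edge segments processed together (e.g. SIMD lanes). */
#define N_SEG 4

/* Hypothetical tags showing which branch a group of segments takes. */
enum filter_path { PATH_WEAK_ONLY, PATH_FULL_DECISION };

/* Returns 1 when every segment has max_len_q == 1. Per the PR, the spatial
 * activity calculation can then be skipped and the weak filter applied directly. */
static int all_max_len_q_is_1(const uint8_t max_len_q[N_SEG])
{
    for (int i = 0; i < N_SEG; i++)
        if (max_len_q[i] != 1)
            return 0;
    return 1;
}

/* Picks the path for the whole group: early exit to the weak filter when all
 * segments allow it, otherwise compute spatial activity and decide per segment. */
static enum filter_path choose_path(const uint8_t max_len_q[N_SEG])
{
    assert(max_len_q != NULL);
    if (all_max_len_q_is_1(max_len_q))
        return PATH_WEAK_ONLY;     /* skip spatial activity entirely */
    return PATH_FULL_DECISION;     /* spatial activity, then strong / one-sided / weak */
}
```

With `{1, 1, 1, 1}` the group takes the weak-only path, while a single larger length, as in `{1, 3, 1, 1}`, forces the full decision for the whole group. Testing the group as a unit presumably matters for the AVX version, since all lanes in a register have to follow the same branch.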

This also reduces coupling within the spatial activity calculation, so individual checks (e.g. tc25 vs. beta3) can be reordered.
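To illustrate what "decoupled" checks can look like, here is a sketch that writes each term of the HEVC-style strong-filter decision as its own predicate. It assumes that `tc25` refers to the `(5 * tc + 1) >> 1` threshold (about 2.5 * tc) and `beta3` to `beta >> 3`; the function names are hypothetical, and the VVC large-block variants are not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* p[0..3] and q[0..3] are samples on each side of the edge, index 0 nearest it. */

/* Second-derivative activity for one line: |p2 - 2p1 + p0| + |q2 - 2q1 + q0|. */
static int dpq_line(const int *p, const int *q)
{
    return abs(p[2] - 2 * p[1] + p[0]) + abs(q[2] - 2 * q[1] + q[0]);
}

/* Low activity: 2 * dpq < (beta >> 2). */
static int check_dsam(int dpq, int beta)
{
    return 2 * dpq < (beta >> 2);
}

/* Flatness ("beta3"): |p3 - p0| + |q0 - q3| < (beta >> 3). */
static int check_beta3(const int *p, const int *q, int beta)
{
    return abs(p[3] - p[0]) + abs(q[0] - q[3]) < (beta >> 3);
}

/* Small step across the edge ("tc25"): |p0 - q0| < ((5 * tc + 1) >> 1). */
static int check_tc25(const int *p, const int *q, int tc)
{
    return abs(p[0] - q[0]) < ((5 * tc + 1) >> 1);
}

/* Because each check is independent, the AND can be evaluated in any order;
 * putting the cheapest or most-likely-to-fail check first is a tuning choice. */
static int strong_decision(const int *p, const int *q, int beta, int tc)
{
    assert(beta >= 0 && tc >= 0);
    return check_tc25(p, q, tc) && check_beta3(p, q, beta) &&
           check_dsam(dpq_line(p, q), beta);
}
```

A flat line on both sides passes all three checks, while a 20-level step with `tc = 4` fails `tc25` (20 is not below 10) even though it still passes `beta3`, so the strong filter is rejected regardless of the order in which the checks run.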

stone-d-chen commented 2 months ago

Benchmarks (checkasm; lower is better):

vvc_h_loop_filter_chroma_8_mix_no-shift_c: 59.9
vvc_h_loop_filter_chroma_8_mix_no-shift_avx: 50.2
vvc_h_loop_filter_chroma_8_mix_shift_c: 140.2
vvc_h_loop_filter_chroma_8_mix_shift_avx: 40.2
vvc_h_loop_filter_chroma_8_one-side_no-shift_c: 80.2
vvc_h_loop_filter_chroma_8_one-side_no-shift_avx: 50.2
vvc_h_loop_filter_chroma_8_one-side_shift_c: 220.2
vvc_h_loop_filter_chroma_8_one-side_shift_avx: 59.9
vvc_h_loop_filter_chroma_8_strong_no-shift_c: 89.9
vvc_h_loop_filter_chroma_8_strong_no-shift_avx: 50.2
vvc_h_loop_filter_chroma_8_strong_shift_c: 280.2
vvc_h_loop_filter_chroma_8_strong_shift_avx: 80.2
vvc_h_loop_filter_chroma_8_weak_no-shift_c: 60.2
vvc_h_loop_filter_chroma_8_weak_no-shift_avx: 40.2
vvc_h_loop_filter_chroma_8_weak_shift_c: 80.2
vvc_h_loop_filter_chroma_8_weak_shift_avx: 30.2
vvc_h_loop_filter_chroma_10_mix_no-shift_c: 80.2
vvc_h_loop_filter_chroma_10_mix_no-shift_avx: 40.2
vvc_h_loop_filter_chroma_10_mix_shift_c: 110.2
vvc_h_loop_filter_chroma_10_mix_shift_avx: 30.2
vvc_h_loop_filter_chroma_10_one-side_no-shift_c: 90.2
vvc_h_loop_filter_chroma_10_one-side_no-shift_avx: 49.9
vvc_h_loop_filter_chroma_10_one-side_shift_c: 140.2
vvc_h_loop_filter_chroma_10_one-side_shift_avx: 50.2
vvc_h_loop_filter_chroma_10_strong_no-shift_c: 90.2
vvc_h_loop_filter_chroma_10_strong_no-shift_avx: 49.9
vvc_h_loop_filter_chroma_10_strong_shift_c: 140.2
vvc_h_loop_filter_chroma_10_strong_shift_avx: 40.2
vvc_h_loop_filter_chroma_10_weak_no-shift_c: 60.2
vvc_h_loop_filter_chroma_10_weak_no-shift_avx: 30.2
vvc_h_loop_filter_chroma_10_weak_shift_c: 79.9
vvc_h_loop_filter_chroma_10_weak_shift_avx: 29.9
vvc_h_loop_filter_chroma_12_mix_no-shift_c: 80.2
vvc_h_loop_filter_chroma_12_mix_no-shift_avx: 39.9
vvc_h_loop_filter_chroma_12_mix_shift_c: 99.9
vvc_h_loop_filter_chroma_12_mix_shift_avx: 40.2
vvc_h_loop_filter_chroma_12_one-side_no-shift_c: 89.9
vvc_h_loop_filter_chroma_12_one-side_no-shift_avx: 50.2
vvc_h_loop_filter_chroma_12_one-side_shift_c: 130.2
vvc_h_loop_filter_chroma_12_one-side_shift_avx: 40.2
vvc_h_loop_filter_chroma_12_strong_no-shift_c: 90.2
vvc_h_loop_filter_chroma_12_strong_no-shift_avx: 49.9
vvc_h_loop_filter_chroma_12_strong_shift_c: 130.2
vvc_h_loop_filter_chroma_12_strong_shift_avx: 40.2
vvc_h_loop_filter_chroma_12_weak_no-shift_c: 60.2
vvc_h_loop_filter_chroma_12_weak_no-shift_avx: 39.9
vvc_h_loop_filter_chroma_12_weak_shift_c: 80.2
vvc_h_loop_filter_chroma_12_weak_shift_avx: 29.9
vvc_h_loop_filter_luma_8_skip_c: 40.2
vvc_h_loop_filter_luma_8_skip_avx: 10.2
vvc_h_loop_filter_luma_10_skip_c: 40.2
vvc_h_loop_filter_luma_10_skip_avx: 10.2
vvc_h_loop_filter_luma_12_skip_c: 40.2
vvc_h_loop_filter_luma_12_skip_avx: 10.2
vvc_v_loop_filter_luma_8_skip_c: 40.2
vvc_v_loop_filter_luma_8_skip_avx: 20.2
vvc_v_loop_filter_luma_10_skip_c: 39.9
vvc_v_loop_filter_luma_10_skip_avx: 10.2
vvc_v_loop_filter_luma_12_skip_c: 39.9
vvc_v_loop_filter_luma_12_skip_avx: 20.2
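The speedup of each AVX function over its C reference is simply the C time divided by the AVX time. A small sketch computing it for a few pairs copied from the results above (the struct and helper names are only for illustration):

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark pair taken from the checkasm output above. */
struct bench_pair {
    const char *name;
    double c;    /* time of the _c (reference C) version */
    double avx;  /* time of the _avx version */
};

/* Speedup of AVX over C: C time divided by AVX time (higher is better). */
static double speedup(const struct bench_pair *b)
{
    assert(b->avx > 0.0);
    return b->c / b->avx;
}

static const struct bench_pair pairs[] = {
    { "vvc_h_loop_filter_chroma_8_strong_shift",  280.2, 80.2 },
    { "vvc_h_loop_filter_chroma_8_mix_shift",     140.2, 40.2 },
    { "vvc_h_loop_filter_chroma_10_weak_shift",    79.9, 29.9 },
    { "vvc_h_loop_filter_chroma_8_mix_no-shift",   59.9, 50.2 },
    { "vvc_h_loop_filter_luma_8_skip",             40.2, 10.2 },
    { "vvc_v_loop_filter_luma_8_skip",             40.2, 20.2 },
};

/* Prints one "name  speedup" line per pair. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
        printf("%-42s %.2fx\n", pairs[i].name, speedup(&pairs[i]));
}
```

On these pairs the shifted chroma cases gain roughly 2.7x to 3.5x and the horizontal luma skip case about 3.9x, while the vertical 8-bit luma skip case gains about 2x and an unshifted case such as `chroma_8_mix_no-shift` only about 1.2x (59.9 / 50.2).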
stone-d-chen commented 2 months ago

Hi @nuomi2021, the horizontal filter is finally done and can handle max_len_{p,q} = 0. I'll probably do another pass this week just to clean up the formatting.
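The thread does not spell out how a zero length is treated inside the filter. The sketch below assumes that a side whose max_len is 0 must be left unmodified, and that a line with both lengths 0 can be skipped before any arithmetic. It uses the HEVC-style weak filter on p0/q0 only, and `weak_filter_line` and `clip3` are hypothetical helpers, not FFmpeg functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Weak filter on one line across the edge (p[0], q[0] nearest the edge).
 * Assumption: max_len == 0 on a side means that side is not written. */
static void weak_filter_line(int *p, int *q, int tc, int bit_depth,
                             uint8_t max_len_p, uint8_t max_len_q)
{
    assert(bit_depth > 0 && bit_depth <= 16);
    if (max_len_p == 0 && max_len_q == 0)
        return;  /* nothing may change: skip all work for this line */

    /* Note: >> on a negative int is implementation-defined in C
     * (arithmetic shift on mainstream compilers). */
    int delta = (9 * (q[0] - p[0]) - 3 * (q[1] - p[1]) + 8) >> 4;
    if (abs(delta) >= tc * 10)
        return;  /* large step: treated as a real edge, left as is */
    delta = clip3(-tc, tc, delta);

    const int max_val = (1 << bit_depth) - 1;
    if (max_len_p > 0)
        p[0] = clip3(0, max_val, p[0] + delta);
    if (max_len_q > 0)
        q[0] = clip3(0, max_val, q[0] - delta);
}
```

For `p = {100, 100}`, `q = {104, 104}`, and `tc = 4`, the delta is 2, so both samples become 102 when both lengths are non-zero; with `max_len_p = 0` only `q[0]` changes, and with both lengths 0 the line is untouched.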

nuomi2021 commented 2 months ago

👍 Could you remove the WIP tag so I can merge it first?