stone-d-chen closed 1 month ago
Hi @nuomi2021, vertical is almost done; I just need to fix some issues with 8-bit.
vvc_v_loop_filter_chroma_10_mix_no-shift_c: 90.2
vvc_v_loop_filter_chroma_10_mix_no-shift_avx: 80.2
vvc_v_loop_filter_chroma_10_mix_shift_c: 140.0
vvc_v_loop_filter_chroma_10_mix_shift_avx: 80.0
vvc_v_loop_filter_chroma_10_one-side_no-shift_c: 150.2
vvc_v_loop_filter_chroma_10_one-side_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_one-side_shift_c: 150.2
vvc_v_loop_filter_chroma_10_one-side_shift_avx: 60.0
vvc_v_loop_filter_chroma_10_strong_no-shift_c: 120.2
vvc_v_loop_filter_chroma_10_strong_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_strong_shift_c: 150.0
vvc_v_loop_filter_chroma_10_strong_shift_avx: 80.0
vvc_v_loop_filter_chroma_10_weak_no-shift_c: 90.0
vvc_v_loop_filter_chroma_10_weak_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_weak_shift_c: 100.0
vvc_v_loop_filter_chroma_10_weak_shift_avx: 60.2
vvc_v_loop_filter_chroma_12_mix_no-shift_c: 90.2
vvc_v_loop_filter_chroma_12_mix_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_mix_shift_c: 130.2
vvc_v_loop_filter_chroma_12_mix_shift_avx: 60.0
vvc_v_loop_filter_chroma_12_one-side_no-shift_c: 130.2
vvc_v_loop_filter_chroma_12_one-side_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_one-side_shift_c: 150.2
vvc_v_loop_filter_chroma_12_one-side_shift_avx: 50.2
vvc_v_loop_filter_chroma_12_strong_no-shift_c: 120.2
vvc_v_loop_filter_chroma_12_strong_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_strong_shift_c: 150.2
vvc_v_loop_filter_chroma_12_strong_shift_avx: 60.2
vvc_v_loop_filter_chroma_12_weak_no-shift_c: 90.2
vvc_v_loop_filter_chroma_12_weak_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_weak_shift_c: 100.2
vvc_v_loop_filter_chroma_12_weak_shift_avx: 60.2
Hi @nuomi2021, it should be done now!
vvc_v_loop_filter_chroma_8_mix_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_mix_no-shift_avx: 73.6
vvc_v_loop_filter_chroma_8_mix_shift_c: 143.8
vvc_v_loop_filter_chroma_8_mix_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_one-side_no-shift_c: 223.8
vvc_v_loop_filter_chroma_8_one-side_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_one-side_shift_c: 373.8
vvc_v_loop_filter_chroma_8_one-side_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_strong_no-shift_c: 223.8
vvc_v_loop_filter_chroma_8_strong_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_strong_shift_c: 333.8
vvc_v_loop_filter_chroma_8_strong_shift_avx: 63.6
vvc_v_loop_filter_chroma_8_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_weak_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_8_weak_shift_avx: 63.6
vvc_v_loop_filter_chroma_10_mix_no-shift_c: 143.8
vvc_v_loop_filter_chroma_10_mix_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_10_mix_shift_c: 203.8
vvc_v_loop_filter_chroma_10_mix_shift_avx: 73.6
vvc_v_loop_filter_chroma_10_one-side_no-shift_c: 133.8
vvc_v_loop_filter_chroma_10_one-side_no-shift_avx: 73.6
vvc_v_loop_filter_chroma_10_one-side_shift_c: 163.8
vvc_v_loop_filter_chroma_10_one-side_shift_avx: 63.8
vvc_v_loop_filter_chroma_10_strong_no-shift_c: 133.8
vvc_v_loop_filter_chroma_10_strong_no-shift_avx: 93.8
vvc_v_loop_filter_chroma_10_strong_shift_c: 163.8
vvc_v_loop_filter_chroma_10_strong_shift_avx: 83.8
vvc_v_loop_filter_chroma_10_weak_no-shift_c: 103.8
vvc_v_loop_filter_chroma_10_weak_no-shift_avx: 73.8
vvc_v_loop_filter_chroma_10_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_10_weak_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_mix_no-shift_c: 103.8
vvc_v_loop_filter_chroma_12_mix_no-shift_avx: 83.8
vvc_v_loop_filter_chroma_12_mix_shift_c: 143.8
vvc_v_loop_filter_chroma_12_mix_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_one-side_no-shift_c: 143.6
vvc_v_loop_filter_chroma_12_one-side_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_12_one-side_shift_c: 173.8
vvc_v_loop_filter_chroma_12_one-side_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_strong_no-shift_c: 133.8
vvc_v_loop_filter_chroma_12_strong_no-shift_avx: 73.8
vvc_v_loop_filter_chroma_12_strong_shift_c: 173.6
vvc_v_loop_filter_chroma_12_strong_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_12_weak_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_12_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_12_weak_shift_avx: 63.8
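For reading these numbers, a small script can pair each `_c` line with its `_avx` counterpart and print the ratio. This is only a sketch: it assumes every line has the form `name_impl: value` as pasted above, and the helper name `speedups` is made up for illustration. The thread does not state the timing unit, which does not matter for a ratio. The sample lines are copied from the 8-bit results above.

```python
import re

# A few lines copied verbatim from the 8-bit results in this thread.
SAMPLE = """\
vvc_v_loop_filter_chroma_8_one-side_shift_c: 373.8
vvc_v_loop_filter_chroma_8_one-side_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_weak_no-shift_avx: 63.8
"""

# Assumed line shape: "<test name>_<c|avx>: <time>"
LINE_RE = re.compile(r"^(?P<name>\S+)_(?P<impl>c|avx):\s*(?P<time>[0-9.]+)$")


def speedups(text):
    """Return {test name: C time / AVX time} for every test that has both."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip comment text mixed in with the results
        timings.setdefault(match["name"], {})[match["impl"]] = float(match["time"])
    return {
        name: t["c"] / t["avx"]
        for name, t in timings.items()
        if "c" in t and "avx" in t
    }


if __name__ == "__main__":
    for name, ratio in speedups(SAMPLE).items():
        print(f"{name}: {ratio:.2f}x")
```

Pasting the full benchmark output into `SAMPLE` instead of the four lines would give one ratio per bit depth and filter variant.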
Hi @nuomi2021, should I switch to luma now, or submit chroma to the mailing list instead?
Hi @stone-d-chen, we need to find a way to share code with HEVC for chroma. It's better to send the patch together with luma.
I will fully focus on this and collaborate with you in the following weeks.
Some of the ways I wrote the horizontal asm aren't compatible with vertical. The strong calculation currently stores some intermediate results partway through the computation to free up registers for later use. This is a problem because vertical needs to transpose the entire set of registers before storing.
Begin moving register stores to m0, ..., m7 earlier; e.g. movu m3, m12 frees m12 for reuse. This avoids the need to clobber m0.
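To illustrate why the vertical path cannot store results partway through, here is a rough Python model rather than asm. It assumes the vertical version loads one line per register, transposes so that each register holds one sample position (p1, p0, q0, q1) across all lines, runs the same row-wise math as the horizontal version, and transposes back before storing. `toy_filter_rows` is a made-up smoothing filter, not the VVC chroma filter, and all function names are hypothetical.

```python
def transpose(block):
    """Swap rows and columns of a list-of-lists block."""
    return [list(col) for col in zip(*block)]


def toy_filter_rows(regs):
    """Toy stand-in for the edge filter (NOT the VVC chroma filter).

    regs = [p1, p0, q0, q1]; each entry models one SIMD register holding
    that sample position for every lane. Only p0 and q0 are modified.
    """
    p1, p0, q0, q1 = regs
    new_p0 = [(a + 2 * b + c + 2) >> 2 for a, b, c in zip(p1, p0, q0)]
    new_q0 = [(a + 2 * b + c + 2) >> 2 for a, b, c in zip(p0, q0, q1)]
    return [p1, new_p0, new_q0, q1]


def filter_vertical_edge(block):
    """Each row of `block` is one line of samples: [p1, p0, q0, q1]."""
    regs = transpose(block)       # registers now hold p1 / p0 / q0 / q1
    regs = toy_filter_rows(regs)  # same math as the horizontal-edge path
    # Every output register must still be live here: the transpose back
    # needs all of them at once, so none can be stored early.
    return transpose(regs)


def filter_vertical_edge_scalar(block):
    """Per-line reference used to check the transposed version."""
    return [
        [p1, (p1 + 2 * p0 + q0 + 2) >> 2, (p0 + 2 * q0 + q1 + 2) >> 2, q1]
        for p1, p0, q0, q1 in block
    ]


if __name__ == "__main__":
    lines = [[10, 20, 60, 70], [0, 0, 100, 100], [5, 5, 5, 5], [1, 2, 3, 4]]
    print(filter_vertical_edge(lines) == filter_vertical_edge_scalar(lines))
```

In this model the horizontal path could write `new_p0` out as soon as it is computed, while the vertical path has to keep every result in a register until the final transpose, which is where the register pressure comes from.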