ffvvc / FFmpeg

VVC Decoder for ffmpeg
Other
48 stars 12 forks source link

draft - vvc_deblocking - chroma #233

Open stone-d-chen opened 3 weeks ago

stone-d-chen commented 3 weeks ago

Started taking some notes on the differences between the hevc and vvc implementation.

The current plan is to implement the loop_filter_luma_large as a separate asm chunk re-using bits of code for the hevc_deblock.asm

The hevc_deblock.asm hooks hevc_h_loop_filter_chroma which integrates the decision for filter selection and the calculation. This makes it a bit complicated since vvc has new decisions to make (large filter vs original small), the original small filter has slightly different inputs between hevc / vvc, hevc needs to be ported to avx2, and generally my lack of knowledge of deblocking.

Instead the idea is to start with just the loop_filter_luma_large and hook it directly first. I can re-use utility functions from the HEVC implementation (while porting to avx2) and it will be more easily testable. Once that's completed, I should have enough knowledge to work on integrating the original hevc code.

nuomi2021 commented 3 weeks ago

change hevc c code is risky, How about we do things like this

  1. copy or reuse hevc's check asm
  2. make the current hevc test case passed for vvc
  3. add new cases for loop_filter_chroma_strong
  4. implement loop_filter_chroma_strong AVX2
  5. continue steps 3 to 4 for loop_filter_chroma_strong_one_side, loop_filter_luma_large
stone-d-chen commented 3 weeks ago

change hevc c code is risky, How about we do things like this

Oh yeah, I was just moving it around so I could visually compare the differences between the vvc & hevc implementations more easily

  1. copy or reuse hevc's check asm
  2. make the current hevc test case passed for vvc
  3. add new cases for loop_filter_chroma_strong
  4. implement loop_filter_chroma_strong AVX2
  5. continue steps 3 to 4 for loop_filter_chroma_strong_one_side, loop_filter_luma_large

Sounds good! I'll start this

nuomi2021 commented 3 weeks ago

Oh yeah, I was just moving it around so I could visually compare the differences between the vvc & hevc implementations more easily

you can use https://winmerge.org/ or other tools to compare

stone-d-chen commented 3 weeks ago

Oh yeah, I was just moving it around so I could visually compare the differences between the vvc & hevc implementations more easily

you can use https://winmerge.org/ or other tools to compare

Yeah I ended buying a copy of Araxis merge since that's what I use at work haha.

Update: I hacked in the chroma_weak asm and it seems to be working on conformance - just via c&p. I'm going to switch it to AVX2, debug, and then finish adding in the checkasm.

nuomi2021 commented 3 weeks ago

could we change the hevc deblock ASM code directly? we can add code like this, and define VVC to 1 when we use it for vvc

%if VVC
   loop_filter_chroma_strong 
%endif
stone-d-chen commented 3 weeks ago

Yeah I think that works, the end goal is to share a single file for hevc and vvc?

nuomi2021 commented 3 weeks ago

Yes. We can renmae the hevc_deblock to h2645_deblock.asm. most code are there. in the vvc_deblock.asm, we only need two lines

%include h2645_deblock.asm
LOOP_FILTER vvc

in the hevc_deblock.asm, we also only need two lines

%include h2645_deblock.asm
LOOP_FILTER hevc
nuomi2021 commented 3 weeks ago

If possible, check_asm can share one function too. You can think about it :)

nuomi2021 commented 2 weeks ago

Yes. We can renmae the hevc_deblock to h2645_deblock.asm. most code are there. in the vvc_deblock.asm, we only need two lines

%include h2645_deblock.asm
LOOP_FILTER vvc

in the hevc_deblock.asm, we also only need two lines

%include h2645_deblock.asm
LOOP_FILTER hevc

Feel free to schedule a call, if anything is not clear to you

stone-d-chen commented 2 weeks ago

Yes. We can renmae the hevc_deblock to h2645_deblock.asm. most code are there. in the vvc_deblock.asm, we only need two lines

%include h2645_deblock.asm
LOOP_FILTER vvc

in the hevc_deblock.asm, we also only need two lines

%include h2645_deblock.asm
LOOP_FILTER hevc

Yeah that makes sense - I added in the final pieces to adapt chroma_weak for 8 and 12 bits in the current files.

The next piece I am working on is translating the following to asm https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_filter_template.c#L713-L754

So then I can add in the test cases for the strong and one-side strong into the same code path.

I was thinking another approach would be to just start working on the strong/one-sided strong, and work on integrating the cases later.

stone-d-chen commented 2 weeks ago

Also I was looking at how the HEVC luma deblocking works on calculating the weak / strong filter and uses masks to the masks - I think I'll need to do something similar for the chroma strong / one-sided strong.

Feel free to schedule a call, if anything is not clear to you

Yeah it would be good to go over the general approach; I'm free most days 19:00 UTC+8 if that would work.

stone-d-chen commented 1 week ago

Rough implementation of horizontal is done now - it has all the pieces necessary to handle the shift parameter as well

Rough benchmark

AVX:
 - vvc_deblock.chroma [OK]
checkasm: all 1 tests passed
vvc_h_loop_filter_chroma_10_c: 130.0
vvc_h_loop_filter_chroma_10_avx: 40.0

Doesn't seem to be an avx2 benefit which makes sense, not really exploiting the full register.

Next step is to clean-up the implementation then move to vertical / strong strong -- have a few days off from work so hoping to get this wrapped up by next week.

nuomi2021 commented 1 week ago

vvc_h_loop_filter_chroma_10_c: 130.0 vvc_h_loop_filter_chroma_10_avx: 40.0

👍, did you check the hevc performance, is it matched?

thank you

stone-d-chen commented 1 week ago

vvc_h_loop_filter_chroma_10_c: 130.0 vvc_h_loop_filter_chroma_10_avx: 40.0

👍, did you check the hevc performance, is it matched?

thank you

Hi, pulled some of the numbers below -- compared to hevc luma, it's very similar ~3-5x faster compared to the c version -- this is the appropriate comparison since we need to do weak/strong filtering in VVC chroma similar to hevc luma.

hevc_h_loop_filter_luma8_skip_c: 20.0
hevc_h_loop_filter_luma8_skip_avx: 9.7

hevc_h_loop_filter_luma8_strong_c: 160.0
hevc_h_loop_filter_luma8_strong_avx: 30.0

hevc_h_loop_filter_luma8_weak_c: 20.0
hevc_h_loop_filter_luma8_weak_avx: 10.0

hevc_h_loop_filter_luma10_strong_c: 160.0
hevc_h_loop_filter_luma10_strong_avx: 39.7

hevc_h_loop_filter_luma10_weak_c: 80.0
hevc_h_loop_filter_luma10_weak_avx: 30.0

hevc_h_loop_filter_luma12_skip_c: 29.7
hevc_h_loop_filter_luma12_skip_avx: 10.0

hevc_v_loop_filter_luma12_strong_c: 150.0
hevc_v_loop_filter_luma12_strong_avx: 39.7
stone-d-chen commented 5 days ago

Quick update - horizontal is essentially done, I'm currently debugging an edge case failure, I think it's due to how I'm trying to test strong vs weak. I'm porting over the hevc checkasm code to generate more realistic test values. Once that's done, the vertical should be straightforward.

nuomi2021 commented 3 days ago

since the virtual boundaries patch merged. I have more time to check the code. Hope I can review the current patch before next week. thank you

stone-d-chen commented 3 days ago

Sounds good, I fixed the issue (simple mistake forgetting to check max_len_q properly), I'll re-base on main soon. Next step is to clean up the checkasm (I did a messy hack to get the hevc code working).