Open stone-d-chen opened 3 weeks ago
change hevc c code is risky, How about we do things like this
change hevc c code is risky, How about we do things like this
Oh yeah, I was just moving it around so I could visually compare the differences between the vvc & hevc implementations more easily
- copy or reuse hevc's check asm
- make the current hevc test case passed for vvc
- add new cases for loop_filter_chroma_strong
- implement loop_filter_chroma_strong AVX2
- continue steps 3 to 4 for loop_filter_chroma_strong_one_side, loop_filter_luma_large
Sounds good! I'll start this
Oh yeah, I was just moving it around so I could visually compare the differences between the vvc & hevc implementations more easily
you can use https://winmerge.org/ or other tools to compare
Oh yeah, I was just moving it around so I could visually compare the differences between the vvc & hevc implementations more easily
you can use https://winmerge.org/ or other tools to compare
Yeah I ended buying a copy of Araxis merge since that's what I use at work haha.
Update: I hacked in the chroma_weak asm and it seems to be working on conformance - just via c&p. I'm going to switch it to AVX2, debug, and then finish adding in the checkasm.
could we change the hevc deblock ASM code directly? we can add code like this, and define VVC to 1 when we use it for vvc
%if VVC
loop_filter_chroma_strong
%endif
Yeah I think that works, the end goal is to share a single file for hevc and vvc?
Yes. We can renmae the hevc_deblock to h2645_deblock.asm. most code are there. in the vvc_deblock.asm, we only need two lines
%include h2645_deblock.asm
LOOP_FILTER vvc
in the hevc_deblock.asm, we also only need two lines
%include h2645_deblock.asm
LOOP_FILTER hevc
If possible, check_asm can share one function too. You can think about it :)
Yes. We can renmae the hevc_deblock to h2645_deblock.asm. most code are there. in the vvc_deblock.asm, we only need two lines
%include h2645_deblock.asm LOOP_FILTER vvc
in the hevc_deblock.asm, we also only need two lines
%include h2645_deblock.asm LOOP_FILTER hevc
Feel free to schedule a call, if anything is not clear to you
Yes. We can renmae the hevc_deblock to h2645_deblock.asm. most code are there. in the vvc_deblock.asm, we only need two lines
%include h2645_deblock.asm LOOP_FILTER vvc
in the hevc_deblock.asm, we also only need two lines
%include h2645_deblock.asm LOOP_FILTER hevc
Yeah that makes sense - I added in the final pieces to adapt chroma_weak for 8 and 12 bits in the current files.
The next piece I am working on is translating the following to asm https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_filter_template.c#L713-L754
So then I can add in the test cases for the strong and one-side strong into the same code path.
I was thinking another approach would be to just start working on the strong/one-sided strong, and work on integrating the cases later.
Also I was looking at how the HEVC luma deblocking works on calculating the weak / strong filter and uses masks to the masks - I think I'll need to do something similar for the chroma strong / one-sided strong.
Feel free to schedule a call, if anything is not clear to you
Yeah it would be good to go over the general approach; I'm free most days 19:00 UTC+8 if that would work.
Rough implementation of horizontal is done now - it has all the pieces necessary to handle the shift parameter as well
Rough benchmark
AVX:
- vvc_deblock.chroma [OK]
checkasm: all 1 tests passed
vvc_h_loop_filter_chroma_10_c: 130.0
vvc_h_loop_filter_chroma_10_avx: 40.0
Doesn't seem to be an avx2 benefit which makes sense, not really exploiting the full register.
Next step is to clean-up the implementation then move to vertical / strong strong -- have a few days off from work so hoping to get this wrapped up by next week.
vvc_h_loop_filter_chroma_10_c: 130.0 vvc_h_loop_filter_chroma_10_avx: 40.0
👍, did you check the hevc performance, is it matched?
thank you
vvc_h_loop_filter_chroma_10_c: 130.0 vvc_h_loop_filter_chroma_10_avx: 40.0
👍, did you check the hevc performance, is it matched?
thank you
Hi, pulled some of the numbers below -- compared to hevc luma, it's very similar ~3-5x faster compared to the c version -- this is the appropriate comparison since we need to do weak/strong filtering in VVC chroma similar to hevc luma.
hevc_h_loop_filter_luma8_skip_c: 20.0
hevc_h_loop_filter_luma8_skip_avx: 9.7
hevc_h_loop_filter_luma8_strong_c: 160.0
hevc_h_loop_filter_luma8_strong_avx: 30.0
hevc_h_loop_filter_luma8_weak_c: 20.0
hevc_h_loop_filter_luma8_weak_avx: 10.0
hevc_h_loop_filter_luma10_strong_c: 160.0
hevc_h_loop_filter_luma10_strong_avx: 39.7
hevc_h_loop_filter_luma10_weak_c: 80.0
hevc_h_loop_filter_luma10_weak_avx: 30.0
hevc_h_loop_filter_luma12_skip_c: 29.7
hevc_h_loop_filter_luma12_skip_avx: 10.0
hevc_v_loop_filter_luma12_strong_c: 150.0
hevc_v_loop_filter_luma12_strong_avx: 39.7
Quick update - horizontal is essentially done, I'm currently debugging an edge case failure, I think it's due to how I'm trying to test strong vs weak. I'm porting over the hevc checkasm code to generate more realistic test values. Once that's done, the vertical should be straightforward.
since the virtual boundaries patch merged. I have more time to check the code. Hope I can review the current patch before next week. thank you
Sounds good, I fixed the issue (simple mistake forgetting to check max_len_q properly), I'll re-base on main soon. Next step is to clean up the checkasm (I did a messy hack to get the hevc code working).
Started taking some notes on the differences between the hevc and vvc implementation.
The current plan is to implement the loop_filter_luma_large as a separate asm chunk re-using bits of code for the hevc_deblock.asm
The hevc_deblock.asm hooks hevc_h_loop_filter_chroma which integrates the decision for filter selection and the calculation. This makes it a bit complicated since vvc has new decisions to make (large filter vs original small), the original small filter has slightly different inputs between hevc / vvc, hevc needs to be ported to avx2, and generally my lack of knowledge of deblocking.
Instead the idea is to start with just the loop_filter_luma_large and hook it directly first. I can re-use utility functions from the HEVC implementation (while porting to avx2) and it will be more easily testable. Once that's completed, I should have enough knowledge to work on integrating the original hevc code.