Closed stone-d-chen closed 1 month ago
@QSXW please help review. thank you
better split to 3 commits
Hi @nuomi2021 and @QSXW
Thanks for the feedback! I've forced pushed my changes addressing the feedback.
better split to 3 commits: refactor the current code, add avx2, and provide checkasm and benchmark data. thank you
I split into the 3 commits suggested and attached the benchmark data to the checkasm commit message.
@QSXW I've addressed all the formatting changes and commented above with the new lines where I've addressed the macro changes. I've additionally added some %defines, as in vvc_mc.asm, to clarify the "magic" numbers.
I've also made some changes to vvc_sad.c to remove the mixed declaration and code warnings (no warnings anymore).
@stone-d-chen , please also cherry-pick https://github.com/ffvvc/FFmpeg/tree/workflow to make sure all conformance tests pass, like https://github.com/ffvvc/FFmpeg/actions/runs/8766181658/job/24058026459?pr=216
I've cherry-picked and pushed the commit
| Clip | Before | After |
| --- | --- | --- |
| BQTerrace_1920x1080_60_10_420_22_RA.vvc | 80.7 | 82.7 |
| Chimera_8bit_1080P_1000_frames.vvc | 158.0 | 167.0 |
| NovosobornayaSquare_1920x1080.bin | 159.7 | 166.3 |
| RitualDance_1920x1080_60_10_420_37_RA.266 | 146.3 | 154.0 |
👍, please move this to the second commit(the asm code one)
Hi @nuomi2021,
I've pushed the suggested change.
I've also rebased on the latest ffvvc/up and tested conformance on the latest tests and the failing test passes now locally.
dpb_max_num_reorder_pics.266
sintel_120.266
tiles_720p5994_stockholm_ter.266
total = 315, passed = 315, failed = 0, skipped = 0
Hi @QSXW
I received a question on the ML about why I wasn't switching on block width inside the asm. I pushed a recent change that adds vvc_sad, which merges vvc_sad_8, vvc_sad_16 and vvc_sad_32_128 together and jumps based on the block width.
Let me know what you think and I will clean up this branch and then resubmit to the ML.
Sure. That's what dav1d did. It can be changed into one function and jump based on the block width.
I've pushed the new version with a single function and merging 16_128 into one code path within. I'll send the updated version to the ML soon-ish
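As a rough illustration of jumping on block width inside one function, here is a scalar C sketch. It is not the actual asm: the names row_sad and sad_row_dispatch are hypothetical, and the accepted widths (8, or a multiple of 16 up to 128) are an assumption based on the vvc_sad_8 / vvc_sad_16 / vvc_sad_32_128 names mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD of one row of n 16-bit samples (plain scalar reference). */
static int row_sad(const int16_t *a, const int16_t *b, int n)
{
    int sum = 0;
    for (int x = 0; x < n; x++)
        sum += abs(a[x] - b[x]);
    return sum;
}

/* Hypothetical single entry point: choose the code path from the block
 * width, the way a merged asm function would jump on width. */
static int sad_row_dispatch(const int16_t *a, const int16_t *b, int width)
{
    int sum = 0;

    /* Assumed valid widths: 8, or a multiple of 16 up to 128. */
    assert(width == 8 || (width >= 16 && width <= 128 && width % 16 == 0));

    if (width == 8)
        return row_sad(a, b, 8);            /* narrow path: one 8-sample step */

    for (int x = 0; x < width; x += 16)     /* shared 16..128 path */
        sum += row_sad(a + x, b + x, 16);
    return sum;
}
```

Widths 16 through 128 share one loop that advances 16 samples at a time, which is the point of merging them into a single code path.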
Should we also add . to the labels vvc_sad_8 and vvc_sad_16, like .loop_width?
The . is the syntax for a local label, so it feels cleaner for vvc_sad_8, etc. not to have the . in front: they will be regular (non-local) labels, and .loop_width is the local label within. https://www.nasm.us/doc/nasmdoc3.html#section-3.9
Great. Please send the v2 patch to the maillist.
Sounds good, just sent.
I just realized: if we define a function in C like void vvc_sad_8, will it conflict with our label? Can you help confirm that?
Seems okay: the labels seem to be stripped from the symbol table, and only the function entry point is defined.
vvc_sad.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 libavcodec/x86/vvc/vvc_sad.asm
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .note.gnu.property 0000000000000000 .note.gnu.property
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 0000000000000000 .hidden ff_vvc_sad_avx2
Okay. It seems the compiler strips them for us. What about with ./configure --disable-stripping --disable-optimizations --enable-debug?
Same objdump
stone@fedora:~/repos/ffvvc_stone$ objdump --syms libavcodec/x86/vvc/vvc_sad.o
libavcodec/x86/vvc/vvc_sad.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 libavcodec/x86/vvc/vvc_sad.asm
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .note.gnu.property 0000000000000000 .note.gnu.property
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 0000000000000000 .hidden ff_vvc_sad_avx2
Great. There should be no issue here, then. Thanks for the confirmation.
Hmm, I saw that checkasm segfaults on Windows... will need to investigate.
You can use VS to debug checkasm. If the function crashes, it will take you to the assembly so you can see which instruction triggers the exception.
So the segfault only seems to occur when going through call_new; calling the function manually doesn't result in a segfault. Someone on IRC suggested it could be something like using a 64-bit register where the function prototype only has an int32: calling manually may zero-extend or sign-extend the input, whereas call_new() intentionally puts garbage in any registers that aren't guaranteed to be meaningful.
So I'll investigate along these lines.
Turns out this is the case: the offset calculation requires a movsxd, since dx/dy are ints but I was treating them as quadwords to do the pointer calculation.
Pushed the changes, confirmed working on Windows; will send to the ML soon.
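To make the failure mode concrete, here is a small C model of it. This is only a sketch: the helper names are hypothetical, the garbage pattern 0x12345678 is arbitrary, and a uint64_t stands in for the 64-bit argument register. It shows why using the whole register as an offset breaks once the upper 32 bits are not clean, and what sign-extending the low half (the job movsxd does) recovers.

```c
#include <assert.h>
#include <stdint.h>

/* Model a 64-bit register carrying a 32-bit int argument whose upper half
 * holds garbage (arbitrary pattern chosen for this sketch). */
static uint64_t reg_with_garbage(int32_t value)
{
    return 0x1234567800000000ULL | (uint32_t)value;
}

/* Buggy: treat the whole 64-bit register as the offset. */
static int64_t offset_buggy(uint64_t reg)
{
    return (int64_t)reg;
}

/* Fixed: keep only the low 32 bits and sign-extend them to 64 bits. */
static int64_t offset_fixed(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    return low >= 0x80000000u ? (int64_t)low - 0x100000000LL : (int64_t)low;
}

/* Small self-check covering a positive and a negative offset. */
static void check_offsets(void)
{
    assert(offset_fixed(reg_with_garbage(3)) == 3);
    assert(offset_fixed(reg_with_garbage(-2)) == -2);
    assert(offset_buggy(reg_with_garbage(3)) != 3);
}
```

When the caller happens to leave the upper bits clean, both versions agree for non-negative values, which would be consistent with the manual call not crashing.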
@stone-d-chen , we have merged the alf asm. Remember to rebase before you send upstream.
thank you
Hi @nuomi2021, yeah the current version here and on ML is rebased against the latest ffvvc:up so it contains the alf asm.
Hi all,
Pushed another version with Ronald's suggested change here and sent to the ML. I might have messed up the send-email, let me know if I need to re-send it.
Thanks, Stone
No worries. It's very common to resend patches again and again for different comments.
No worries. This is one reason why open source code quality is often better than closed source.
Latest version (v5) with Ronald's additional feedback pushed and sent to ML. One thing of note was changing the function signature to have intptr_t dx, dy which removed the need for movsxd. I confirmed this passes checkasm on windows as well.
@stone-d-chen , don't get frustrated. Every new version helps to polish your skills.
Oh yeah, I don't mind :) definitely learning a lot with each revision! Will send a new version with latest feedback soon-ish
Adds AVX2 assembly for SAD used in DMVR (decoder-side motion vector refinement). The main difference is that in VVC, SAD is only calculated on even rows of the PU to reduce complexity. Implements SAD via min/max/sub for 16-bit values.
DMVR is restricted to PUs whose width >= 8, height >= 8 and width * height >= 128 (i.e. 8x8 is not a valid size).
Based on https://github.com/ffvvc/FFmpeg/pull/213#issue-2246959529 but on top of ffmpeg/master
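
For reference, the description above can be sketched as a scalar C routine. This is an illustration under stated assumptions, not the code in this PR: the names dmvr_size_ok and sad_even_rows are hypothetical, and the stride is assumed to be counted in samples. The absolute difference is formed as max minus min, mirroring the min/max/sub approach for 16-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DMVR size rule from the description: w >= 8, h >= 8, w * h >= 128. */
static int dmvr_size_ok(int w, int h)
{
    return w >= 8 && h >= 8 && w * h >= 128;
}

/* Scalar reference: SAD over even rows only.
 * stride is in samples (an assumption of this sketch). */
static int sad_even_rows(const int16_t *src0, const int16_t *src1,
                         ptrdiff_t stride, int w, int h)
{
    int sad = 0;

    assert(dmvr_size_ok(w, h));

    for (int y = 0; y < h; y += 2) {            /* skip odd rows */
        const int16_t *r0 = src0 + y * stride;
        const int16_t *r1 = src1 + y * stride;
        for (int x = 0; x < w; x++) {
            int hi = r0[x] > r1[x] ? r0[x] : r1[x];   /* max */
            int lo = r0[x] < r1[x] ? r0[x] : r1[x];   /* min */
            sad += hi - lo;                           /* |r0 - r1| */
        }
    }
    return sad;
}
```

Note that dmvr_size_ok(8, 8) is false while 8x16 and 16x8 pass, matching the remark that 8x8 is not a valid size.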