use tb->max_scan_x and tb->max_scan_y to optimize dct8 and dst7 #20

nuomi2021 commented 1 year ago

By the nature of the transform, only a few coefficients are nonzero. Their extent is recorded by tb->max_scan_x and tb->max_scan_y.

We can use this to optimize https://github.com/ffvvc/FFmpeg/blob/3cb136dc5fc70d65f9918453a842439323e81908/libavcodec/vvc_itx_1d.c#L146

See https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/blob/master/source/Lib/CommonLib/TrQuant_EMT.cpp#L886 for reference.
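As a rough illustration of what these two fields denote, here is a minimal sketch that scans a row-major coefficient block and reports the largest column and row index holding a nonzero value. The struct NzExtent and the function nz_extent are hypothetical names made up for this example. The thread does not show how the decoder actually fills tb->max_scan_x and tb->max_scan_y, so this is only a model of their meaning, not FFmpeg code.

```c
#include <assert.h>

/* Hypothetical type for illustration only (not an FFmpeg struct). */
typedef struct NzExtent {
    int max_scan_x; /* largest column index with a nonzero coefficient */
    int max_scan_y; /* largest row index with a nonzero coefficient */
} NzExtent;

/* Scan a w x h row-major block and return the extent of its nonzero
 * coefficients. An all-zero block yields {0, 0}. */
static NzExtent nz_extent(const int *coeffs, int w, int h)
{
    NzExtent e = { 0, 0 };

    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (coeffs[y * w + x]) {
                if (x > e.max_scan_x)
                    e.max_scan_x = x;
                if (y > e.max_scan_y)
                    e.max_scan_y = y;
            }
        }
    }
    return e;
}
```

Everything to the right of column max_scan_x and below row max_scan_y is zero, which is the property the optimization relies on.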

DataCrusade1999 commented 1 year ago

Hi @nuomi2021, my name is Ashutosh Pandey, and I would like to take this issue up as my qualification task for this year's GSoC. Any advice on how I should set up my development environment for this task? I'm on a Windows machine and I also have WSL enabled.

nuomi2021 commented 1 year ago

Hi @DataCrusade1999, glad you are willing to do this. Please go ahead. You can refer to our workflow file https://github.com/ffvvc/FFmpeg/blob/main/.github/workflows/makefile.yml

1. Build ffmpeg.
2. Run a VVC file.
3. Then insert some debug points around the code.
4. Check what happens for dst2 (already optimized), dst8, and dst7.

Thank you.

DataCrusade1999 commented 1 year ago

Thanks for responding. I followed the instructions given in makefile.yml and successfully built ffmpeg with a few warnings. I also ran the tests as specified in makefile.yml, and all tests passed except these 6:

+++++++++ report +++++++++
failed files:
MERGE_I_Qualcomm_2.bit
MERGE_J_Qualcomm_2.bit
MERGE_H_Qualcomm_2.bit
ENTROPY_A_Chipsnmedia_2.bit
TREE_C_HHI_3.bit
CUBEMAP_A_MediaTek_3.bit

I'm using gcc to compile. I also ran gcc libavcodec/vvc_itx_1d.c and got multiple undefined reference errors for dct8, dst7, and ff_vvc_lfnst_8x8.

Did you mean to say dct2 or dst2? I used git grep -e "dst2" and there are a lot of references, but none that I could make sense of regarding this task. If it is indeed dct2, will the implementation for dct8 and dst7 be the same as for dct2, or a bit different? If you could tell me what to look for in the reference, that would be great.

Thanks

nuomi2021 commented 1 year ago

> and all tests passed except these 6

Is it related to WSL? Could you help set up WSL CI (https://github.com/ffvvc/FFmpeg/issues/40)?

> I also ran gcc libavcodec/vvc_itx_1d.c

Please use gdb ./ffmpeg_g to debug. You can break at https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_ctu.c#L1332 to see what happens.

> did you mean to say dct2 or dst2?

There are three transform types in VVC: dct2, dct8, and dst7. For dct8 and dst7 we do a vector-matrix multiply, and currently we multiply all inputs. But due to the nature of the codec, the inputs are zero from position max_scan_x + 1 (horizontally) and max_scan_y + 1 (vertically) onward, so we do not need to multiply those zeros.
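The idea can be sketched as follows. This is a minimal illustration under assumptions, not FFmpeg's matrix_mul: the function names, the int8_t matrix type, and the matrix[j * size + i] layout are all chosen for the example. The first function multiplies every input. The second reads only the first nz inputs, which gives the same result whenever the remaining inputs are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version: out[i] = sum over all j of in[j] * matrix[j * size + i].
 * (Hypothetical layout and names, for illustration only.) */
static void matrix_mul_full(int *out, const int *in,
                            const int8_t *matrix, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        int sum = 0;
        for (size_t j = 0; j < size; j++)
            sum += in[j] * matrix[j * size + i];
        out[i] = sum;
    }
}

/* Same product, but only the first nz inputs are read.
 * Precondition: in[nz] .. in[size - 1] are all zero. */
static void matrix_mul_nz(int *out, const int *in,
                          const int8_t *matrix, size_t size, size_t nz)
{
    assert(nz <= size);
    for (size_t i = 0; i < size; i++) {
        int sum = 0;
        for (size_t j = 0; j < nz; j++) /* skip the known-zero tail */
            sum += in[j] * matrix[j * size + i];
        out[i] = sum;
    }
}
```

Here nz would be max_scan_x + 1 for a horizontal pass and max_scan_y + 1 for a vertical one. That mapping is inferred from the comment above rather than taken from the FFmpeg source. The inner loop drops from size to nz multiplications per output.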

DataCrusade1999 commented 1 year ago

> Is it related to WSL? Could you help setup WSL ci(https://github.com/ffvvc/FFmpeg/issues/40)?

I'm not sure about that. Thanks for the response. I'll use gdb to debug now and look up some more information about the transform types in VVC. Thanks again.

hamzah-mujawar commented 8 months ago

Hi,

Can I work on this issue? Is anyone still working on it?

Thanks, Hamzah Mujawar

nuomi2021 commented 8 months ago

@hamzah-mujawar, yes, you can work on this.

https://github.com/ffvvc/FFmpeg/issues/179 is related to this; please consider it too. Thank you.

hamzah-mujawar commented 8 months ago

Progress Update:

After building ffmpeg, I've been able to debug and set a breakpoint at line 88 in vvc_itx_1d.c. Here are my steps in gdb:

1. Ran gdb ./ffmpeg_g
2. Set a breakpoint in vvc_itx_1d.c at line 88
3. Ran ffmpeg using the run command

I have attached a screenshot of these steps below:

I would like to ask: what command (format: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...) can I use to run the program so I can see what is done for dct2?

Also, am I approaching this the right way?

Thanking you for your time, Hamzah Mujawar

nuomi2021 commented 8 months ago

@hamzah-mujawar, you can check our workflow https://github.com/ffvvc/FFmpeg/blob/main/.github/workflows/makefile.yml. It downloads the test clips and runs the tests on them. Most clips use dct2, so you can choose a small one.

hamzah-mujawar commented 8 months ago

Progress Update:

I am finally able to hit breakpoints in vvc_itx_1d.c :).

I wasn't able to run the workflow; however, I downloaded the tests repository and ran the python3 script. In gdb, I am now running: run -i ~/tests/conformance/passed/v1/8b400_A_Bytedance_2.bit -vsync 0 -f md5 -

I will continue down this rabbit hole and report back.

Thanking you for your time, Hamzah Mujawar

hamzah-mujawar commented 8 months ago

Hi,

Could you please point me to the line of code that was shown here in this commit: https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_ctu.c#L1332

Thanking you, Hamzah Mujawar

frankplow commented 8 months ago

> Could you please point me to the line of code that was shown here in this commit: https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_ctu.c#L1332

This file was moved to libavcodec/vvc/vvc_ctu.c: https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_ctu.c#L1332

hamzah-mujawar commented 8 months ago

Hi,

I've created a pdf document going over the findings and potential changes, and have attached it to this message.

Now that I am able to access tb->max_scan_x, I would appreciate some pointers on how I can use it to optimise matrix_mul.

issue_20.pdf

Thanking you, Hamzah Mujawar

nuomi2021 commented 8 months ago

Hi @hamzah-mujawar, good progress. We can only give you some hints on this. Please:

1. Find which clip uses dct8 or dst7 (hint: you can run our conformance tests locally and pick the smallest one).
2. Break at matrix_mul using gdb.
3. At the same time, break in VTM at the forwardMatrixMult function.

Compare the values in 2 and 3, and see what the counterparts of "reducedLine" and "numInLines" are.

Thank you.

hamzah-mujawar commented 8 months ago

Hi,

I've been able to do 1 and 2, and I have compiled the VTM software. I've tried multiple approaches to setting a breakpoint on the forwardMatrixMult function, and even just printing out the values of reducedLine and numInLines, but I am not able to get any results.

I will try again tomorrow.

Thanking you, Hamzah Mujawar

nuomi2021 commented 8 months ago

> set a breakpoint on forwardMatrixMult function and even just printing out the values of reducedLine and numInLines, but I am not able to get any results.

You can run the same clip as in 2 and add an assert(0) in forwardMatrixMult or its parent function.

hamzah-mujawar commented 8 months ago

Hi,

Sorry to trouble you again.

I'm really confused, as I'm not able to hit the assert either. I have built it and run all the apps shown here:

I'm running this command: ./BitstreamExtractorAppStaticd -b ~/tests/conformance/passed/v1/8b400_A_Bytedance_2.bit.

I've checked the stack trace to confirm dst7 is being used on 8b400_A_Bytedance_2.bit:

I've read the documentation in /docs; however, I haven't found a solution.

Thanking you for your time, Hamzah Mujawar

nuomi2021 commented 8 months ago

Please run it with "DecoderAppStaticd -b 8b400_A_Bytedance_2.bit --SIMD=SCALAR". You may need to define JVET_M0497_MATRIX_MULT to 1 and set a breakpoint at fastInverseDST7_B32.

nuomi2021 commented 7 months ago

Implemented: https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_itx_1d.c#L662

Thank you all for your help on this.
