ffvvc / FFmpeg

VVC Decoder for ffmpeg

Use tb->max_scan_x and tb->max_scan_y to optimize dct8 and dst7 #20

Closed nuomi2021 closed 7 months ago

nuomi2021 commented 1 year ago

By the nature of the transform, only a few coefficients are nonzero. Their extent is recorded by tb->max_scan_x and tb->max_scan_y.

We can use this to optimize https://github.com/ffvvc/FFmpeg/blob/3cb136dc5fc70d65f9918453a842439323e81908/libavcodec/vvc_itx_1d.c#L146

See https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/blob/master/source/Lib/CommonLib/TrQuant_EMT.cpp#L886 for reference.
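To make the meaning of the two fields concrete, here is a minimal sketch in C. It assumes max_scan_x and max_scan_y hold the largest column and row index of any nonzero coefficient, which is how the thread describes them. The `ScanExtent` struct and `find_scan_extent` function are hypothetical names for illustration and are not part of ffvvc; a decoder would more likely track these values while parsing coefficients than scan the block afterwards.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two fields discussed in this issue. */
typedef struct ScanExtent {
    int max_scan_x;  /* largest column index holding a nonzero coefficient */
    int max_scan_y;  /* largest row index holding a nonzero coefficient */
} ScanExtent;

/* Scan a row-major width x height block and return the nonzero extent.
 * For an all-zero block both fields stay at -1. */
static ScanExtent find_scan_extent(const int *coeffs, size_t width, size_t height)
{
    ScanExtent e = { -1, -1 };

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (coeffs[y * width + x]) {
                if ((int)x > e.max_scan_x)
                    e.max_scan_x = (int)x;
                if ((int)y > e.max_scan_y)
                    e.max_scan_y = (int)y;
            }
        }
    }
    return e;
}
```

Everything to the right of max_scan_x and below max_scan_y is zero, so a transform only needs to read max_scan_x + 1 columns and max_scan_y + 1 rows of input.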

DataCrusade1999 commented 1 year ago

Hi @nuomi2021, my name is Ashutosh Pandey, and I would like to take up this issue as my qualification task for this year's GSoC. Any advice on how I should set up my development environment for this task? I'm on a Windows machine and also have WSL enabled.

nuomi2021 commented 1 year ago

Hi @DataCrusade1999, glad you are willing to do this. Please go ahead. You can refer to our workflow file https://github.com/ffvvc/FFmpeg/blob/main/.github/workflows/makefile.yml

  1. Build ffmpeg.
  2. Run a VVC file.
  3. Set some breakpoints around the code.
  4. Check what happens for dst2 (already optimized), dst8, and dst7.

thank you

DataCrusade1999 commented 1 year ago

Thanks for responding. I followed the instructions in makefile.yml and successfully built ffmpeg with a few warnings. I also ran the tests as specified in makefile.yml, and all tests passed except these 6:

+++++++++ report +++++++++
failed files:
    MERGE_I_Qualcomm_2.bit
    MERGE_J_Qualcomm_2.bit
    MERGE_H_Qualcomm_2.bit
    ENTROPY_A_Chipsnmedia_2.bit
    TREE_C_HHI_3.bit
    CUBEMAP_A_MediaTek_3.bit

I'm using gcc to compile. I also ran gcc libavcodec/vvc_itx_1d.c and got multiple undefined reference errors for dct8, dst7, and ff_vvc_lfnst_8x8.

Did you mean to say dct2 or dst2? I ran git grep -e "dst2" and there are a lot of references, but none that I could make sense of for this task. If it is indeed dct2, will the implementation for dct8 and dst7 be the same as for dct2, or a bit different? If you could tell me what to look for in the reference, that would be great.

Thanks

nuomi2021 commented 1 year ago

> and all tests passed except these 6

Is it related to WSL? Could you help setup WSL ci(https://github.com/ffvvc/FFmpeg/issues/40)?

> I also ran gcc libavcodec/vvc_itx_1d.c

Please use gdb ./ffmpeg_g to debug. You can break at https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_ctu.c#L1332 to see what happens.

> did you mean to say dct2 or dst2?

There are three transform types in VVC: dct2, dct8, and dst7. For dct8 and dst7 we do a vector multiply. Currently we multiply all inputs. But due to the nature of the codec, the input is zero beyond the first max_scan_x + 1 and max_scan_y + 1 positions in each direction, so we do not need to multiply those zeros.
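The idea above can be sketched in C as a 1D inverse transform written as a matrix-vector product whose inner loop stops at the number of possibly nonzero inputs. This is an illustration under stated assumptions, not the ffvvc code: the function name `inv_transform_1d`, the `nz` parameter (max_scan_x + 1 or max_scan_y + 1, depending on the pass), and the matrix layout (row j holds the basis function for input coefficient j) are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the ffvvc function.
 * coeffs: size values spaced by stride, transformed in place.
 * nz:     number of leading inputs that may be nonzero (nz <= size);
 *         inputs at index nz and above are assumed to be zero.
 * matrix: size x size, row-major; row j is the basis for input j. */
static void inv_transform_1d(int *coeffs, ptrdiff_t stride, size_t nz,
                             const int8_t *matrix, size_t size)
{
    int tmp[32];  /* assumes nz <= 32, the largest dct8/dst7 size in VVC */

    /* Copy only the inputs that can be nonzero, since we overwrite in place. */
    for (size_t j = 0; j < nz; j++)
        tmp[j] = coeffs[(ptrdiff_t)j * stride];

    for (size_t i = 0; i < size; i++) {
        int sum = 0;

        /* The loop ends at nz instead of size: the skipped terms are all
         * multiplications by zero and cannot change the sum. */
        for (size_t j = 0; j < nz; j++)
            sum += tmp[j] * matrix[j * size + i];
        coeffs[(ptrdiff_t)i * stride] = sum;
    }
}
```

With nz equal to size this is the plain full multiply, so the result is identical whenever the trailing inputs really are zero; the saving is size * (size - nz) multiply-adds per call.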

DataCrusade1999 commented 1 year ago

> Is it related to WSL? Could you help setup WSL ci(https://github.com/ffvvc/FFmpeg/issues/40)?

I'm not sure about that. Thanks for the response. I'll use gdb to debug now and look up more about the transform types in VVC. Thanks again.

hamzah-mujawar commented 8 months ago

Hi,

Can I work on this issue? Is anyone still working on it?

Thanks, Hamzah Mujawar

nuomi2021 commented 8 months ago

@hamzah-mujawar, yes, you can work on this.

https://github.com/ffvvc/FFmpeg/issues/179 is related to this; please consider it too. Thank you.

hamzah-mujawar commented 8 months ago

Progress Update:

After building ffmpeg, I've been able to debug and set a breakpoint on line 88 in vvc_itx_1d.c: [image]

I tried to debug this using gdb. Here are my steps:

  1. Ran gdb ./ffmpeg_g
  2. Set a breakpoint in vvc_itx_1d.c on line 88
  3. Ran ffmpeg using the run command

I have attached a screenshot of these steps below: [image]

I would like to ask: what command (format: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...) can I use to run the program so I can see what is done for dct2?

Also, am I approaching this the right way?

Thanking you for your time, Hamzah Mujawar

nuomi2021 commented 8 months ago

@hamzah-mujawar, you can check our workflow https://github.com/ffvvc/FFmpeg/blob/main/.github/workflows/makefile.yml. It downloads the test clips and runs them. Most clips use dct2, so you can choose a small one.

hamzah-mujawar commented 8 months ago

Progress Update:

I am finally able to hit breakpoints in vvc_itx_1d.c :). [image]

I wasn't able to run the workflow; however, I downloaded the tests repository and ran the python3 script. I am now running this in gdb: run -i ~/tests/conformance/passed/v1/8b400_A_Bytedance_2.bit -vsync 0 -f md5 -

I will continue down this rabbit hole and report back.

Thanking you for your time, Hamzah Mujawar

hamzah-mujawar commented 8 months ago

Hi,

Could you please point me to the line of code that was shown here in this commit: https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_ctu.c#L1332

Thanking you, Hamzah Mujawar

frankplow commented 8 months ago

> Could you please point me to the line of code that was shown here in this commit: https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_ctu.c#L1332

This file was moved to libavcodec/vvc/vvc_ctu.c: https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_ctu.c#L1332

hamzah-mujawar commented 8 months ago

Hi,

I've created a pdf document going over the findings and potential changes, and have attached it to this message.

Now that I am able to access tb->max_scan_x, I would appreciate some pointers on how I can use it to optimise matrix_mul.

issue_20.pdf

Thanking you, Hamzah Mujawar

nuomi2021 commented 8 months ago

Hi @hamzah-mujawar, good progress. We can only give you some hints on this. Please:

  1. Find a clip that uses dct8 or dst7. (Hint: you can run our conformance tests locally and pick the smallest one.)
  2. Break at matrix_mul using gdb.
  3. At the same time, break in VTM at the forwardMatrixMult function. Compare the values from steps 2 and 3, and work out what the counterparts of "reducedLine" and "numInLines" are.

thank you.

hamzah-mujawar commented 8 months ago

Hi,

I've been able to do 1 and 2, and I have compiled the VTM software. I've tried multiple approaches to set a breakpoint on forwardMatrixMult function and even just printing out the values of reducedLine and numInLines, but I am not able to get any results.

I will try again tomorrow.

Thanking you, Hamzah Mujawar

nuomi2021 commented 8 months ago

> set a breakpoint on forwardMatrixMult function and even just printing out the values of reducedLine and numInLines, but I am not able to get any results.

You can run the same clip as in step 2 and add an assert(0) in forwardMatrixMult or its parent function.

hamzah-mujawar commented 8 months ago

Hi,

Sorry to trouble you again.

I'm really confused, as I'm not able to hit the assert either. I have built it and run all the apps shown here: [image]

I'm running this command: ./BitstreamExtractorAppStaticd -b ~/tests/conformance/passed/v1/8b400_A_Bytedance_2.bit.

I've checked the stack trace to confirm dst7 is being used on 8b400_A_Bytedance_2.bit: [image]

I've read the documentation in /docs; however, I haven't found a solution.

Thanking you for your time, Hamzah Mujawar

nuomi2021 commented 8 months ago

Please run it with "DecoderAppStaticd -b 8b400_A_Bytedance_2.bit --SIMD=SCALAR". You may need to define JVET_M0497_MATRIX_MULT to 1 and set a breakpoint at fastInverseDST7_B32.

nuomi2021 commented 7 months ago

Implemented: https://github.com/ffvvc/FFmpeg/blob/e81b6d78fc2ddf8edd53a6a052713354ef8d27c2/libavcodec/vvc/vvc_itx_1d.c#L662

Thank you all for your help on this.