fraunhoferhhi / vvdec

VVdeC, the Fraunhofer Versatile Video Decoder
https://www.hhi.fraunhofer.de/en/departments/vca/technologies-and-solutions/h266-vvc.html
BSD 3-Clause Clear License
454 stars 91 forks

Reuse pred buf to reduce memory #70

Closed: xrayleigh2000 closed this pull request 2 years ago

xrayleigh2000 commented 2 years ago

Signed-off-by: xrayleigh2000 <xrayleigh2000@gmail.com>

adamjw24 commented 2 years ago

Thank you for looking into this.

Did you have a look at the performance impact? The INTER task in particular parallelizes very well, and we used to rely on it heavily (though we haven't reevaluated the architecture in quite some time, and a lot has changed since).

Overall this would save around 60MB for UHD decoding, independent of threading, right? (two full frames of Pel)
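
For reference, the order of magnitude of that estimate can be checked with a back-of-the-envelope calculation. This is only a sketch under assumptions that are not stated in the thread: a 3840x2160 frame, 4:2:0 chroma subsampling, a 16-bit `Pel`, and no padding or alignment margins. The helper name `frame_bytes` is hypothetical.

```python
# Rough size of full-frame Pel buffers.
# Assumptions (not from the thread): 16-bit Pel, 4:2:0 chroma subsampling,
# no padding or alignment margins around the planes.

def frame_bytes(width, height, bytes_per_pel=2, chroma_div=2):
    """Bytes for one luma plane plus two chroma planes.

    Each chroma plane is subsampled by chroma_div horizontally and vertically.
    """
    luma = width * height
    chroma = 2 * (width // chroma_div) * (height // chroma_div)
    return (luma + chroma) * bytes_per_pel

uhd_frame = frame_bytes(3840, 2160)
two_frames = 2 * uhd_frame

print(f"one UHD frame:  {uhd_frame / 1e6:.1f} MB")
print(f"two UHD frames: {two_frames / 1e6:.1f} MB")
```

Under these assumptions the two frames come to roughly 50 MB, the same order of magnitude as the figure quoted above. The actual saving depends on the real buffer layout, which this sketch does not model.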

Because of some upcoming dates (CES/vacation), we will only be able to have a proper look at this towards the end of January, so please be patient. If you could provide some additional data in the meantime, such as memory savings and performance impact, it would be highly appreciated.

Also, the CI seems to be broken; please have a look.

K-os commented 2 years ago

Hi @xrayleigh2000, thanks for the PR.

I had a quick look at it and tested it on my machine. As @adamjw24 expected, it is much slower than the original implementation, because it significantly reduces parallelism. On my machine, running with 10 to 20 threads, decoding FullHD content was 1.3 to 1.4 times slower.
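
To illustrate why reusing a small set of buffers can cost parallelism, here is a deliberately simplified toy model. It is not vvdec's actual scheduler: it assumes equal-cost, independent tasks, each of which needs one thread and exclusive use of one buffer while it runs. All names and numbers are hypothetical.

```python
import math

def makespan(num_tasks, num_threads, num_buffers, task_time=1.0):
    """Time to finish num_tasks equal tasks in this toy model.

    At most min(num_threads, num_buffers) tasks can run at once, because
    every running task occupies one thread and one buffer exclusively.
    """
    concurrency = min(num_threads, num_buffers)
    return math.ceil(num_tasks / concurrency) * task_time

tasks, threads = 120, 10

# One buffer region per task: only the thread count limits concurrency.
unconstrained = makespan(tasks, threads, num_buffers=tasks)

# Only 4 reusable buffers: tasks queue up waiting for a free buffer.
shared = makespan(tasks, threads, num_buffers=4)

print(f"per-task buffers: {unconstrained}")
print(f"shared buffers:   {shared}")
print(f"slowdown factor:  {shared / unconstrained}")
```

The numbers are illustrative only and are not meant to reproduce the slowdown measured above; the point is that once the buffer count drops below the thread count, adding threads no longer helps in this model.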

xrayleigh2000 commented 2 years ago

Hi @adamjw24 @K-os, thanks for your replies. Under the current architecture, it is difficult to reduce the pred buf to CU-level size, so I am going to close this pull request. For subsequent optimizations, I will submit new pull requests and provide more test data.