xrayleigh2000 closed 2 years ago
Thank you for looking into this.
Did you have a look at the performance? The INTER task in particular is great for parallelization, and we used to rely very much on it (we haven't reevaluated the architecture in quite some time though, and a lot has changed since).
Overall this would save around 60MB for UHD decoding, independent of threading, right? (two full frames of Pel)
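For reference, a rough back-of-the-envelope check of that figure can be sketched as below. This is only an estimate under assumptions not stated in the thread: 4:2:0 chroma subsampling, a 16-bit (2-byte) Pel, a 3840x2160 frame, and no padding or margins. The helper name `frame_buffer_bytes` is made up for illustration.

```python
def frame_buffer_bytes(width, height, bytes_per_sample=2, chroma_samples_per_luma=0.5):
    """Bytes needed for one frame: the luma plane plus both chroma planes.

    Assumptions (not from the thread): 2 bytes per sample, and 4:2:0
    subsampling, i.e. 0.5 chroma samples for every luma sample.
    """
    luma = width * height
    chroma = int(luma * chroma_samples_per_luma)
    return (luma + chroma) * bytes_per_sample


# UHD frame size, assumed to be 3840x2160 without padding.
uhd_frame = frame_buffer_bytes(3840, 2160)
two_frames = 2 * uhd_frame

print(f"one frame:  {uhd_frame / 1e6:.1f} MB")
print(f"two frames: {two_frames / 1e6:.1f} MB")
```

Under these assumptions two frames come to roughly 50 MB, the same order of magnitude as the estimate above. Padding, margins, or a different chroma format could account for the difference.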
Because of some upcoming dates (CES/vacation) we will only be able to have a proper look at this towards the end of January, so please be patient. If you could provide some additional data in the meantime, such as memory savings and performance impact, it would be highly appreciated.
Also, the CI seems broken, please have a look.
Hi @xrayleigh2000, thanks for the PR.
I had a quick look into it and tested it on my machine. As @adamjw24 expected, it is much slower than the original implementation, because it significantly reduces parallelism. On my machine, running with 10 to 20 threads, decoding FullHD content was 1.3 to 1.4 times slower.
Hi @adamjw24 @K-os, thanks for your reply. Under the current architecture, it is difficult to reduce the pred buf to CU-level size, so I am going to close this pull request. For subsequent optimizations, I will submit new pull requests and provide more test data.
Signed-off-by: xrayleigh2000 <xrayleigh2000@gmail.com>