Allowing CTU size to be smaller than 64

fraunhoferhhi / vvenc

VVenC, the Fraunhofer Versatile Video Encoder

https://www.hhi.fraunhofer.de/en/departments/vca/technologies-and-solutions/h266-vvc.html

BSD 3-Clause Clear License


Allowing CTU size to be smaller than 64 #196

Closed: ZenKiyoshi closed this issue 2 years ago

ZenKiyoshi commented 2 years ago

x265's maximum CTU size is 64x64; in my experience, that size only pays off at 4K.

A smaller CTU may help preserve fine detail in static scenes, at the cost of slightly lower compression efficiency.

In the early days of x265, comparisons against x264 at SD resolution showed that x265 struggled to preserve detail as well as x264 did. That could be related to x264's macroblock size being only 16x16.

x265 currently allows the CTU size to be set to 64, 32, or 16. I pick the CTU size based on the resolution:

64 or 32 for 1080p
32 for 720p
16 for SD resolution (480p, 576p)

In x265, using a CTU size smaller than 32 costs me some features, but I only use a CTU that small at SD resolution.
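The per-resolution rule above can be written down as a small lookup. This is only a sketch: `pick_ctu_candidates` is a hypothetical helper, the height thresholds are assumptions chosen to match the listed resolutions, and the 4K entry is inferred from the remark that 64x64 pays off at 4K. It describes the x265 practice in this post, not any x265 or VVenC default.

```python
from typing import Tuple


def pick_ctu_candidates(height: int) -> Tuple[int, ...]:
    """Return the CTU sizes to try for a given frame height.

    Hypothetical helper: thresholds are assumptions, not encoder defaults.
    """
    if height >= 2160:   # 4K: assumed to use the largest x265 CTU
        return (64,)
    if height >= 1080:   # 1080p: either 64 or 32
        return (64, 32)
    if height >= 720:    # 720p
        return (32,)
    return (16,)         # SD resolution (480p, 576p)


for h in (2160, 1080, 720, 576, 480):
    print(h, pick_ctu_candidates(h))
```

Returning a tuple rather than a single value keeps the "64 or 32" choice for 1080p open, since the post does not say which of the two is preferred.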

Smaller CTUs also help with parallelism.

VVenC currently only allows CTU sizes of 128 and 64. 128 is the default at the medium preset, which may be too large.

** I don't know exactly how VVenC processes this, so I may be wrong; I'm just putting my notes here.

adamjw24 commented 2 years ago

Our choice of CTU size is mostly motivated by Pareto-set optimization of the option space (see Brandenburg et al. in MMSP'21) for HD and UHD natural content, with the only side condition being that it is not totally broken for SD, SCC, etc. Your use cases might differ, but they should still work fairly well.

VVC and VVenC support CTU sizes of 32x32, 64x64 and 128x128. The larger the CTU size, the larger the search space, which means more potential compression and slower runtimes. That's why 128x128 is only enabled for the presets medium through slower. You can try setting --CTUSize=X in the FFapp or --additional="CTUSize=X" in the simple app to run your own tests.

ZenKiyoshi commented 2 years ago

I ran some tests.

vspipe vs00.vpy --y4m - | vvencapp -i - --y4m --preset medium --qp 22 --qpa 1 --additional="CTUSize=32" -o "VVenC_medium_qpa1_qp22_ctu32.266"
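To repeat the same run for each CTU size, the argument list can be generated rather than edited by hand. The sketch below only builds and prints the commands; it uses the flags from the command above and nothing else. `build_vvencapp_args` is a hypothetical helper, and the output naming simply follows the file name used above. The input still has to be piped in on stdin (for example from vspipe) when a command is actually run.

```python
from typing import List


def build_vvencapp_args(ctu: int, preset: str = "medium",
                        qp: int = 22, qpa: int = 1) -> List[str]:
    """Build a vvencapp argument list for one CTU size (hypothetical helper)."""
    out = f"VVenC_{preset}_qpa{qpa}_qp{qp}_ctu{ctu}.266"
    return [
        "vvencapp", "-i", "-", "--y4m",
        "--preset", preset,
        "--qp", str(qp),
        "--qpa", str(qpa),
        f"--additional=CTUSize={ctu}",  # same option as above, without shell quotes
        "-o", out,
    ]


# Print one command per CTU size; nothing is executed here.
for ctu in (32, 64, 128):
    print(" ".join(build_vvencapp_args(ctu)))
```

Keeping everything except CTUSize fixed means any difference in speed or bitrate between the runs can be attributed to the CTU size alone.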

Result: VVenC-CTU

Images: https://drive.google.com/file/d/1V-qOolPePa04TZSOvTmGJZdhjLirKfaE/view?usp=sharing

CTU 32 is the fastest and has the highest bitrate; CTU 128 is the slowest, and its bitrate is slightly higher than CTU 64's. Image quality is almost the same across CTU sizes in static scenes. In motion scenes where QPA takes effect (bottom right of the image), CTU 32 retains slightly more detail, but the gain is very small.

But they are all blurry compared with x265, as I mentioned at https://github.com/fraunhoferhhi/vvenc/discussions/137#discussioncomment-3533176, even in the static parts.

adamjw24 commented 2 years ago

Interesting. CTU32 is faster because of the smaller search space, but with your command also because of the improved multi-threading, which explains the very large differences. For a proper comparison you'd also need to compare the resulting quality.
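The multi-threading point can be made concrete by counting how many CTUs a frame splits into at each size. This is a rough sketch under stated assumptions: the 1920x1080 frame size is assumed (the source resolution is not given in this thread), `ctu_grid` is a hypothetical helper, and the idea that more CTUs or CTU rows give the thread pool more independent work is a general expectation, not a description of how VVenC schedules its tasks.

```python
from typing import Tuple


def ctu_grid(width: int, height: int, ctu: int) -> Tuple[int, int, int]:
    """Return (columns, rows, total) of CTUs covering a frame.

    Partial CTUs at the right and bottom edges count as full CTUs.
    """
    cols = -(-width // ctu)   # ceiling division
    rows = -(-height // ctu)
    return cols, rows, cols * rows


# Assumed frame size: 1920x1080.
for ctu in (128, 64, 32):
    cols, rows, total = ctu_grid(1920, 1080, ctu)
    print(f"CTU {ctu}: {cols} x {rows} = {total} CTUs")
# CTU 128: 15 x 9 = 135 CTUs
# CTU 64: 30 x 17 = 510 CTUs
# CTU 32: 60 x 34 = 2040 CTUs
```

Halving the CTU size roughly quadruples the number of CTUs and doubles the number of CTU rows, so there are more units of work to distribute across threads.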

To reduce blurriness I'd recommend trying one of --ALF=0 or --MCTF=0. Either will significantly increase the bitrate, so you might also just try decreasing the QP or increasing the target bitrate.