dubhater / vapoursynth-mvtools

Motion compensation and stuff

Support for blksize=24? #56

Open joakimlemb opened 2 years ago

joakimlemb commented 2 years ago

Any chance to get support for blksize 24?

Like the windows fork has: https://github.com/pinterf/mvtools/blob/mvtools-pfmod/Sources/MVAnalyse.cpp#L161

resistorsytrus commented 5 months ago

If AviSynth 2.6 plugins can be loaded, you could use that fork to get this support. 👍

adworacz commented 1 month ago

Can you explain the reason for needing blocksize 24? Or any of the other odd sized blocksizes like 3,6,12, etc?

joakimlemb commented 1 month ago

> Can you explain the reason for needing blocksize 24? Or any of the other odd sized blocksizes like 3,6,12, etc?

Same reason I would use blksize=32 vs blksize=16: speed vs quality. There is a big jump from 16 to 32. Being able to choose any blksize in increments of 2 (4, 6, 8, 10, 12, etc.) rather than only powers of 2 (4, 8, 16, 32, 64) would give users more options for trading quality against speed. Unless there is a valid reason why they need to be powers of 2?
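To put rough numbers on that trade-off, here is a minimal sketch that counts how many blocks a frame splits into for a given blksize. The `block_count` helper and its floor-division formula are illustrative assumptions, not code taken from the plugin, and the actual cost per block also depends on the search settings.

```python
def block_count(width, height, blksize, overlap=0):
    """Estimate how many whole blocks fit in a frame.

    Hypothetical helper: assumes square blocks and simple floor division,
    which may differ from what the plugin does at frame edges.
    """
    step = blksize - overlap
    blocks_x = (width - overlap) // step
    blocks_y = (height - overlap) // step
    return blocks_x * blocks_y


# Compare the existing sizes 16 and 32 with the requested size 24 at 1080p.
for blksize in (16, 24, 32):
    print(blksize, block_count(1920, 1080, blksize))
```

Under these assumptions a 1920x1080 frame has 8040 blocks at blksize=16, 3600 at blksize=24, and 1980 at blksize=32, so 24 lands between the two existing options.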

adworacz commented 1 month ago

Interesting, I can see your point for greater blocksize resolution/granularity. The power of 2 blocksizes are useful as they align with how encoders partition up frames (often with blocksizes of 4 or 8, but sometimes higher), and aligning to those blocksizes can allow for directly addressing artifacts generated by those encoders.

A good example of this is the blocking artifacts generated by MPEG2 encoders, often seen on DVDs.

Modern encoders often use a variety of different blocksizes, even within the same frame, so matching blocksizes becomes less feasible.
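As a rough illustration of the alignment point, the sketch below checks what fraction of a blksize grid's internal edges fall on an encoder's block grid. It assumes an 8-pixel grid (MPEG-2 uses 8x8 DCT blocks) and a 720-pixel-wide DVD frame; the `aligned_fraction` helper is hypothetical and only looks at one dimension.

```python
def aligned_fraction(blksize, grid=8, width=720):
    """Fraction of internal block edges that land on the encoder grid.

    Hypothetical helper: `grid` is the assumed encoder block size and
    `width` the assumed frame width, both chosen for illustration.
    """
    edges = range(blksize, width, blksize)
    aligned = sum(1 for x in edges if x % grid == 0)
    return aligned / len(edges)


for blksize in (6, 8, 12, 16, 24):
    print(blksize, round(aligned_fraction(blksize), 2))
```

Any blksize that is a multiple of 8, including 24, keeps every edge on the 8-pixel grid in this model, while sizes such as 6 or 12 only line up at every multiple of 24 pixels.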

Lastly, power-of-2 blocksizes align better with SIMD / vectorized instructions, as SIMD vector widths are pretty much always powers of 2. Blocksizes that are not a multiple of the vector width leave SIMD lanes unused. This isn't a huge deal, but it can mean that performance doesn't scale perfectly linearly with blocksize, as a partially filled SIMD vector and a fully filled one are processed in the same number of CPU cycles.
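The lane-waste argument can be sketched with a little arithmetic. The example assumes 8-bit samples in a 128-bit register (16 lanes) and that each block row is processed with a single vector width; the `lane_utilization` helper is hypothetical, and a real implementation could mix vector widths to reduce the waste.

```python
import math


def lane_utilization(block_width, lanes_per_vector):
    """Fraction of SIMD lanes doing useful work for one block row.

    Hypothetical helper: assumes every row is covered by whole vectors
    of a single width, with unused lanes in the last vector wasted.
    """
    vectors_needed = math.ceil(block_width / lanes_per_vector)
    return block_width / (vectors_needed * lanes_per_vector)


# 16 lanes corresponds to 8-bit samples in a 128-bit register (assumption).
for block_width in (8, 12, 16, 24, 32):
    print(block_width, lane_utilization(block_width, 16))
```

In this model a 24-wide row needs two 16-lane vectors and uses 75% of their lanes, so it costs the same number of vector operations as a 32-wide row.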

I'll keep this in mind for my future MVTools-related work.

joakimlemb commented 1 month ago

That makes sense, thanks for the detailed answer. I hadn't considered the relationship between SIMD optimization and block sizes, but aligning the data to match those CPU instructions explains how performance scales with block size.