dubhater / vapoursynth-mvtools

Motion compensation and stuff
181 stars 27 forks source link

Refactor vector clips to not be based on super clip - should be a significant memory/cache use reduction #76

Closed adworacz closed 1 month ago

adworacz commented 1 month ago

Leaving this as a note to myself.

While working on the APIv4 upgrade, I noticed that vector clips are actually copies of the incoming super clip with some extra properties (the actual vector data) attached.

This needs a second pass of investigation but my first pass showed that the frame/plane data in the vector clip is ignored by all filters and only the properties are used. This makes sense and is what I'd expect.

Where the problem lies is that the super clip (and thus clips generated from the super clip) is often huge by comparison to the incoming video, as the super clip intentionally stores multiple layers of resized images for each frame.

This means that all of our vector clips are actually copies of huge amounts of frame data, but the actual frame data is ignored.

This is likely polluting/expanding Vapoursynth's frame cache unnecessarily.

Once I finish the APIv4 rewrite (I'm about 50% done), then I'll pivot to look at shrinking these vector clips down to something like a zero-sized clip, or maybe just 1x1 sized clip (not sure if VS allows zero-sized clips).

It will be interesting to test the before and after of this change, at least from a memory usage standpoint, but possibly also from a raw performance standpoint.

Stefan-Olt commented 1 month ago

Good catch, that sounds very interesting!

I'm wondering if I should make a release right now as a last APIv3 release and a another release in a few weeks (?) with APIv4 and this optimization?

adworacz commented 1 month ago

Hmmm, fair question. I'd kind of considered that we'd roll everything up into one big release with the APIv4 and maybe this change, but I'm not opposed to a quicker release now if you'd like to get something out sooner rather than later.

It would mostly be my Degrain 3-6 support, LTO optimization in the Meson build, some minor optimizations and your ARM changes that would go into a new release as of now.

I'd wager that I should be done with APIv4 in 2 weeks (I'm doing it in the few spare minutes I have from day to day, but each filter takes less than 30 minutes each), just for reference. I'm not sure on the vector clip changes, but they look much smaller than the APIv4 changes as of right now.

Stefan-Olt commented 1 month ago

Yes, a current release would include that and your fix. It's only one part timing, but I'm not sure if there are still users relying on the old API for some reason, to have one last release for those. What's putting me off, I have no idea how the Windows binaries are build.

Take your time for the changes, in my opinion it's better to do it properly than quick & dirty.

adworacz commented 1 month ago

Yeah, Windows builds are something I haven't figured out just yet either. We could take a cross-compilation approach, which I know Meson supports but it would take some research.

There's also just a native Meson install on Windows, which I think requires a mingw installation as well.

Last but not least, there's some potential inspiration we can draw from other VS plugins that use automated github builds for their Windows compilation.

Here's an example: https://github.com/sgt0/vs-placebo/blob/dev/.github/workflows/build.yml#L84-L131

Stefan-Olt commented 1 month ago

Using a Github runner to build it is a great idea! Have not done that for Windows yet, but a good example should help with that.

As you know I'm also working on a project to build all plugins for Linux / macOS using GitHub runners and distribute them with vsrepo

adworacz commented 1 month ago

Yup, a Github runner might be the magic here. I still need to set one up for my own plugins, but they're proving more and more useful, especially in the cross-platform space.

dubhater commented 1 month ago

I was cross-compiling them from Arch Linux, using mingw modified with this script:

#!/bin/sh

# Based on the script from https://wiki.videolan.org/Win32Compile/#Static_compilation_of_plugins
# which doesn't quite work for me (it only found the winpthread files).

packages="mingw-w64-gcc mingw-w64-winpthreads"

pacman -Q $packages || exit 1

for i in $(pacman -Qlq $packages | grep \
    -e 'libstdc++.dll.a' \
    -e 'libstdc++-6.dll' \
    -e 'libgcc_s.a' \
    -e 'libgcc_s_sjlj-1.dll' \
    -e 'libpthread.dll.a' \
    -e 'libwinpthread.dll.a' \
    -e 'libwinpthread-1.dll'); do
    rm -i "$i"
done

for i in i686 x86_64; do
    libgcc_eh=$(pacman -Qlq $packages | grep "$i.*libgcc_eh.a")
    ln -sfv "$libgcc_eh" "${libgcc_eh/_eh/_s}"
    pthread_h=$(pacman -Qlq $packages | grep "$i.*include/pthread.h")
    echo "Fixing header '$pthread_h'."
    sed -i "s/DLL_EXPORT/&zzzzzzz/" "$pthread_h"
done

Not sure if it's still needed/it still works. The script is 6 years old.

Stefan-Olt commented 1 month ago

Thanks, but I got the compilation with msys2 on my virtual Windows machine working at the first try using the instructions from the vs-placebo plugin. A GitHub runner should be no problem. I'll now have to check if the binary works or if there are any crashes.

adworacz commented 1 month ago

So I completed the refactor, but in my additional testing, I noticed no difference in memory usage.

After looking a little closer, I think I know why - Vapoursynth handles frame copies by reference, not by value. This means that copying a frame from one clip to another is essentially "free", with the underlying data only residing in one place. Only a little bit of bookkeeping/metadata is actually duplicated.

Reference:

So I'm going to resolve this issue for now. It appears that we aren't actually suffering any major performance hit by copying frames from the super clip.

Stefan-Olt commented 1 month ago

Ah I see, that makes sense. But thank you very much for investigating!