AmusementClub / vapoursynth-classic

A production video processing framework with simplicity and backward compatibility in mind. We strive to keep 99% R54 API compatibility at the vpy script level while providing better performance & stability than the upstream. API4 compatibility is preserved whenever possible as well.
https://amusementclub.github.io/doc/
GNU Lesser General Public License v3.0

ChangeFPS/SelectEvery plummets the speed when outputting at a lower framerate #7

Closed (couleurm closed 1 year ago)

couleurm commented 1 year ago

I am using AverageFrames on a video whose frame rate is in the hundreds; here's a 280FPS sample

When I use SelectEvery to lower the output framerate, rendering speed collapses (500FPS -> 2FPS). This doesn't happen on R54.

You can try with the sample video and commenting/uncommenting the line in this script:

from vapoursynth import core
from havsfunc import ChangeFPS

clip = core.lsmas.LWLibavSource(source=r"D:\Video Vault\cian.mp4", format="YUV420P8", cache=1, prefer_hw=1)

clip = core.std.AverageFrames(clip, weights=([1]*5))

# clip = ChangeFPS(clip=clip, fpsnum=60, fpsden=1)
# clip[::4].set_output()
clip.set_output()

Using a BlankClip does not tank the speed. Using a clip with only I-frames (e.g. encoded in a lossless codec) makes it tank less (~20FPS from my single test).
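The I-frame observation fits a general property of inter-coded video: to produce a frame that is not a keyframe, a decoder has to start from an earlier keyframe and decode forward. The sketch below is a toy cost model in plain Python, not lsmas's actual seeking logic. The function name `decode_cost`, the fixed GOP size of 250, and above all the out-of-order request sequence are assumptions made for illustration; this thread does not say what request order the source filter really sees.

```python
def decode_cost(requests, gop_size):
    """Count frames a toy decoder decodes to serve `requests` in the given order.

    Toy model (an assumption, not lsmas behaviour): keyframes sit at every
    multiple of gop_size; the decoder remembers only the last frame it
    produced and can either keep decoding forward or restart at a keyframe.
    """
    decoded = 0
    pos = -1  # index of the last frame produced; -1 means nothing decoded yet
    for f in requests:
        if f == pos:
            continue  # the decoder is already holding this frame
        keyframe = (f // gop_size) * gop_size
        if f > pos and keyframe <= pos:
            decoded += f - pos           # keep decoding forward from pos
        else:
            decoded += f - keyframe + 1  # restart at the keyframe, decode up to f
        pos = f
    return decoded


in_order = list(range(40))
# Hypothetical out-of-order pattern: each neighbouring pair is swapped.
swapped = [f for k in range(20) for f in (2 * k + 1, 2 * k)]

for gop in (250, 1):  # long GOP vs. all-intra
    print(gop, decode_cost(in_order, gop), decode_cost(swapped, gop))
```

In this model an all-intra clip (`gop_size=1`) costs one decode per request regardless of order, while a long-GOP clip pays for every backward step by re-decoding from the keyframe. That is at least consistent with the I-frame-only clip slowing down less.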

AkarinVS commented 1 year ago

Thanks for the report. This is indeed an interesting issue.

Confirmed the 100x slowdown between these two scripts:

fast:

from vapoursynth import core
clip = core.lsmas.LWLibavSource(source=a, format="YUV420P8", cache=1)
clip = core.std.AverageFrames(clip, weights=([1]*5))
clip.set_output()

slow:

from vapoursynth import core
clip = core.lsmas.LWLibavSource(source=a, format="YUV420P8", cache=1)
clip = core.std.AverageFrames(clip, weights=([1]*5))
clip[::4].set_output()

Profiling reveals that, in the 2nd script, most of the time is spent in lsmas.

I think it's because VS API4 changed the way it caches frames from source filters. The combination of AverageFrames and SelectEvery changes the request pattern in a way that makes the cache miss almost every single time.
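To make the changed request pattern concrete, here is a minimal sketch in plain Python (no VapourSynth needed). It assumes that AverageFrames with five weights reads a centered window of frames n-2..n+2, clamped at the clip edges, and that `clip[::4]` maps output frame n to input frame 4n. The function names and the clip length of 1000 are hypothetical, and the sketch models only which source frames get requested, not the cache itself.

```python
def source_requests(output_n, step, radius=2, num_frames=1000):
    """Source frames needed for one output frame.

    Assumes a centered temporal window of 2*radius+1 frames (AverageFrames
    with five weights -> radius 2) followed by taking every step-th frame.
    """
    center = output_n * step
    # Clamp indices to the valid range, as happens at the clip edges.
    return [min(max(i, 0), num_frames - 1)
            for i in range(center - radius, center + radius + 1)]


def shared_frames(step, output_n=10):
    """How many source frames two consecutive output frames have in common."""
    a = set(source_requests(output_n, step))
    b = set(source_requests(output_n + 1, step))
    return len(a & b)


for step in (1, 4):  # 1: plain set_output(), 4: clip[::4]
    print(step, source_requests(10, step), source_requests(11, step),
          shared_frames(step))
```

Under these assumptions, consecutive output frames share four of their five source frames without the decimation, but only one with `clip[::4]`, so far fewer requested frames can be reused from one output frame to the next. Whether that alone accounts for the cache misses is not established in this thread.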

For example, if I change the hardcoded 20 to 100 in this line: https://github.com/AmusementClub/vapoursynth-classic/blob/8fb4730129d1dfe0f514b8b9b0c57dff8f52abd4/src/core/vscore.cpp#L1186, the slowdown is reduced to 4x, similar to R54.

Will need to think about the root cause more.

AkarinVS commented 1 year ago

I've created a workaround for this issue. Please try this build: https://github.com/AmusementClub/vapoursynth-classic/actions/runs/3451033013

Download the release zip file and replace your vapoursynth.dll with the one in the zip.

It's a safe change, but its performance implications are not well understood at this time, and more benchmarks are needed. You're welcome to benchmark your other scripts as well; please report back the results.

Thanks.

couleurm commented 1 year ago

> You're welcome to benchmark your other scripts as well; please report back the results.

Thank you so much! It is indeed working (faster than R54!)

If it ends up being unstable for other specific use cases, please make it optional (if that can be done after VS loads) so that I can keep it for mine, with something like core.std.needsSort(False).

AkarinVS commented 1 year ago

Testing didn't show any noticeable performance regressions, so I will keep the workaround. It is released as https://github.com/AmusementClub/vapoursynth-classic/releases/tag/R57.A6.

Thanks for the testing.