Closed couleurm closed 1 year ago
Thanks for the report. This is indeed an interesting issue.
Confirmed the 100x slowdown between these two scripts:
fast:
from vapoursynth import core
clip = core.lsmas.LWLibavSource(source=a, format="YUV420P8", cache=1)
clip = core.std.AverageFrames(clip, weights=([1]*5))
clip.set_output()
slow:
from vapoursynth import core
clip = core.lsmas.LWLibavSource(source=a, format="YUV420P8", cache=1)
clip = core.std.AverageFrames(clip, weights=([1]*5))
clip[::4].set_output()
Profiling shows that the 2nd script spends most of its time in lsmas.
I think it's because VS API4 changed the way it caches frames from source filters. The combination of AverageFrames and SelectEvery changes the request pattern in a way that makes the cache miss almost every single time.
For example, if I change the hardcoded 20 to 100 in this line: https://github.com/AmusementClub/vapoursynth-classic/blob/8fb4730129d1dfe0f514b8b9b0c57dff8f52abd4/src/core/vscore.cpp#L1186 the slowdown is reduced to 4x, similar to R54.
Will need to think about the root cause more.
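To make the request pattern concrete, here is a small pure-Python sketch (no VapourSynth needed). It assumes AverageFrames with 5 weights reads a centered window of source frames n-2..n+2, and that clip[::4] maps output frame k to source frame 4k. The helper names (window, request_pattern, reused_fraction) are made up for illustration, and this does not model the actual cache in vscore.cpp; it only shows how much consecutive requests overlap.

```python
def window(n, radius=2, num_frames=1000):
    """Source frames needed for frame n of a centered temporal average
    (indices clamped to the clip bounds)."""
    return [min(max(i, 0), num_frames - 1)
            for i in range(n - radius, n + radius + 1)]


def request_pattern(output_frames, step):
    """Source-frame windows requested when only every `step`-th frame is output."""
    return [window(k * step) for k in output_frames]


def reused_fraction(pattern):
    """Fraction of requested source frames that the previous output frame
    had already requested."""
    reused = total = 0
    for prev, cur in zip(pattern, pattern[1:]):
        reused += len(set(cur) & set(prev))
        total += len(set(cur))
    return reused / total


full = request_pattern(range(10, 20), step=1)       # like clip.set_output()
decimated = request_pattern(range(10, 20), step=4)  # like clip[::4].set_output()

print(full[:2])       # [[8, 9, 10, 11, 12], [9, 10, 11, 12, 13]]
print(decimated[:2])  # [[38, 39, 40, 41, 42], [42, 43, 44, 45, 46]]
print(reused_fraction(full))       # 0.8
print(reused_fraction(decimated))  # 0.2
```

In this toy model the undecimated script needs one new source frame per output frame, while the decimated one needs four, so the source filter and whatever cache sits after it are hit much harder per output frame.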
I've created a workaround for this issue. Please try this build: https://github.com/AmusementClub/vapoursynth-classic/actions/runs/3451033013
Download the release zip file and replace your vapoursynth.dll with the one in the zip.
It's a safe change, but its performance implications are not well understood at this time, and more benchmarks are needed. You're welcome to benchmark your other scripts as well; please report back the results.
Thanks.
Thank you so much! It is indeed working (faster than R54!)
If it ends up being unstable for other specific use cases, please make it optional (if that can be done after VS loads), so I can still enable it for mine with something like core.std.needsSort(False).
Testing didn't show any noticeable performance regressions, so I will keep the workaround. It is released as https://github.com/AmusementClub/vapoursynth-classic/releases/tag/R57.A6.
Thanks for the testing.
I am using AverageFrames on a video with a frame rate in the hundreds; here's a 280 FPS sample.
When I use SelectEvery to lower the output frame rate, rendering speed collapses (500 FPS -> 2 FPS). This doesn't happen on R54.
You can try with the sample video and commenting/uncommenting the line in this script:
Using a BlankClip does not tank the speed. Using a clip with only I-frames (e.g. encoded in a lossless codec) makes it tank less (~20 FPS in my single test).