Wildly increasing memory consumption - video cache auto-tune goes mad at specific access patterns

pinterf commented 6 months ago

As nicely reported on doom9 https://forum.doom9.org/showthread.php?p=1995403#post1995403.

Script:

ColorBarsHD().KillAudio()
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Prefetch(4)

Open AvsPMod (MPC-HC works as well); you can watch memory usage on the Task Manager process/memory page
Press play and let the video play for a bit (~20-25 frames)
Press pause
Framestep backwards

We can notice a sudden increase in memory consumption at the ~10th backstep and for each following backstep.

(The many occurrences of Spline36Resize just help us to exaggerate the effect)

The problem is probably similar to Issue #270, where a specific access pattern like 0, 0, 0, 1, 1, 2, 1, 3, 2, 4, 2, 5, 3, 6, 3, 7, 4, 8 causes a similar effect; see https://github.com/AviSynth/AviSynthPlus/issues/270#issuecomment-1050587074

In this issue the access pattern is 1-2-3-4-5-6-...24-25-26- 25-24-23-22-21-20-19...7-6-5

pinterf commented 6 months ago

As a workaround you can use these lines at the beginning of the script.

#SetCacheMode(0) # Run until frame 40, then step back 10 times in AvsPMod; from the 11th back step on, cache space increases by 200 MB
SetCacheMode(1) #no problem
.. script follows

flossy83 commented 4 months ago

Hi, just wondering if there had been any progress on this issue? Are you still confident it's fixable or is it more of a "basket case" problem?

I use QTGMC with multithreading quite a lot, mainly for realtime DVD viewing as it cleans up the image so nicely, and that loads up the CPU on seek, which in turn exacerbates the issue. SetCacheMode(1) is completely incompatible with seeking on my systems so I can't use that.

pinterf commented 4 months ago

No real progress, I'm just trying to understand how the so called ghost cache entries work, and put debugging and logging helper code here and there. Even if I were to deal with this daily, it would still take several weeks to complete, I guess. Nevertheless the issue is a challenge, I think it's fixable.

pylorak commented 4 months ago

I'm just trying to understand how the so called ghost cache entries work [...]

Maybe I can help with that. The ghost entries are what allow the cache to be adaptive. The basic idea is that a ghost entry is somewhat like a normal cache entry except without the actual data (the frame), and they stay in the cache a little bit longer. Ghost entries are cheap memory-wise as they take up almost no space.

When a frame is requested and it is not in the cache anymore but its ghost still is, it means we have recently used that frame but it didn't live long enough in the cache. So next time we make sure that it stays alive longer before being evicted from the cache. This way, a frame whose ghost is never requested stays in the cache only for a short time (which avoids unnecessary memory consumption), but a frame with many requests to its ghosts stays in the cache progressively longer and longer, until its lifetime doesn't need to be extended anymore.

At least that was the original idea years ago. Once you get the idea it is pretty simple actually. The complex part of the cache is dealing with all this in a thread-safe way.
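
Note (illustrative sketch, not AviSynth+ code): the ghost-entry idea described above can be modelled in a few lines of Python. The class name `GhostLruCache`, its parameters, and the "grow by one on a ghost hit" rule are assumptions made for this toy model; the real logic lives in LruCache.h and also has to be thread-safe, which this sketch ignores.

```python
from collections import OrderedDict


class GhostLruCache:
    """Toy adaptive LRU cache; NOT the AviSynth+ LruCache.h implementation."""

    def __init__(self, capacity=2, ghost_capacity=8):
        self.capacity = capacity              # how many real frames may be kept
        self.ghost_capacity = ghost_capacity  # how many ghost keys may be kept
        self.main = OrderedDict()             # frame number -> frame data
        self.ghosts = OrderedDict()           # frame number -> None (no data)

    def get(self, n, produce):
        if n in self.main:                    # real hit: refresh recency
            self.main.move_to_end(n)
            return self.main[n]
        if n in self.ghosts:                  # ghost hit: frame was evicted too
            del self.ghosts[n]                # early, so let frames live longer
            self.capacity += 1
        frame = produce(n)                    # miss: (re)compute the frame
        self.main[n] = frame
        while len(self.main) > self.capacity:
            old, _ = self.main.popitem(last=False)  # evict least recently used
            self.ghosts[old] = None                 # remember only its key
            while len(self.ghosts) > self.ghost_capacity:
                self.ghosts.popitem(last=False)
        return frame


cache = GhostLruCache(capacity=2)
render = lambda n: f"frame {n}"   # stand-in for an expensive filter

for n in (0, 1, 2):               # frame 0 gets evicted and becomes a ghost
    cache.get(n, render)
ghost_before = list(cache.ghosts)
cache.get(0, render)              # ghost hit: capacity grows from 2 to 3
print(ghost_before, cache.capacity, list(cache.main))
```

In this model a frame whose ghost is never requested costs nothing extra, while every ghost hit permanently enlarges the cache.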

flossy83 commented 4 months ago

I was thinking maybe it's possible to do a bodge solution in the meantime, like simply detecting when the auto-tune went mad on seek and resetting the process's memory usage, which goes something like this in the Windows API...

// GetCurrentProcess() returns a pseudo-handle to the process that loaded Avisynth.dll
HANDLE handleToProcess = GetCurrentProcess();

// passing (SIZE_T)-1 for both limits removes as many pages as possible from its working set
SetProcessWorkingSetSize(handleToProcess, (SIZE_T)-1, (SIZE_T)-1);

// no CloseHandle() needed: a pseudo-handle does not have to be closed

https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-setprocessworkingsetsize

This is obviously "bad practice" as a long term default solution, but as a short term nondefault option it might be preferable to hitting the SetMemoryMax() size and getting slowdowns (I'm currently compensating for the slowdown by giving QTGMC an extra thread or two, it works okay but sometimes crashes when I alt-tab with maxed out memory usage)

edit: if I recall correctly you did something with SoxFilter 2.2 to make it reinitialise on seek to make it compatible with realtime seeking, so maybe the memory reinitialisation could be done on seek only, and only if the issue occurred, so that SetProcessWorkingSetSize() would only rarely be called

pinterf commented 4 months ago

Hi pylorak, thanks for the explanation.

The problem is that the size of the main cache is always incremented by one in specific scenarios: since the item is found among the ghost entries, the value of "ghosted" in this case is always 1 (which is >0).

https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/LruCache.h#L232

pylorak commented 4 months ago

What happens is that when old frames are already ghosted but not yet removed from the ghost entries, the user begins to backstep in the video and thus the cache hits the ghost entries again, thereby causing the cache to grow.

I think the root cause of the problem here is that the cache does not know that the video step direction has changed. From the cache's point of view, hitting a ghost entry because it is the filter's regular access pattern and hitting it because the user re-requested an earlier frame look exactly the same ("earlier frame" here means not a frame with a lower frame number, but a frame that the user has already viewed recently; the problem is not going backwards, the problem is changing the direction).

My proposed solution is to clear the ghost list of all caches whenever the user changes step direction.
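
Note (illustrative sketch, not AviSynth+ code): the effect of the proposal can be tried on a toy model. The function `final_capacity`, its parameters, the starting capacity of 2, and the sign-of-delta direction test are all assumptions for illustration. The replayed pattern is the one from the original report (play 1..26, then step back to 5).

```python
from collections import OrderedDict


def final_capacity(requests, clear_on_turn, capacity=2, ghost_capacity=64):
    """Replay frame requests through a toy ghost-LRU model; return its capacity."""
    main, ghosts = OrderedDict(), OrderedDict()
    last, direction = None, 0
    for n in requests:
        if last is not None and n != last:
            step = 1 if n > last else -1
            if clear_on_turn and direction and step != direction:
                ghosts.clear()          # proposed fix: forget ghosts on a turn
            direction = step
        last = n
        if n in main:                   # real hit: refresh recency
            main.move_to_end(n)
            continue
        if n in ghosts:                 # ghost hit: the cache grows by one
            del ghosts[n]
            capacity += 1
        main[n] = True
        while len(main) > capacity:     # evict least recently used to ghosts
            old, _ = main.popitem(last=False)
            ghosts[old] = True
            while len(ghosts) > ghost_capacity:
                ghosts.popitem(last=False)
    return capacity


# Access pattern from this issue: play 1..26, then step back down to 5.
pattern = list(range(1, 27)) + list(range(25, 4, -1))
grown = final_capacity(pattern, clear_on_turn=False)
fixed = final_capacity(pattern, clear_on_turn=True)
print(grown, fixed)
```

In this model the capacity ends at 22 without clearing (one extra slot per backstep that hits a ghost) and stays at 2 with clearing. The sketch only sees one linear request stream; a filter whose regular access pattern itself moves back and forth would need a different place to detect the turn.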

flossy83 commented 3 months ago

In the meantime is it possible to give us an Avisynth internal function which we can call inside our scripts to manually clear the ghost entries in the cache? Then maybe I could call it when the user seeks (detecting the seek inside ScriptClip, so I would need to be able to call it inside a ScriptClip).

I have tried outputting BlankClip() for a few seconds on seek to try and unload the CPU and it seems to somewhat reduce the chance of getting a cache frenzy when seeking +/- 10 seconds, but doesn't help with the 1 frame backwards seeks. Doing a +/- 10 second seek is common during realtime screening so it's better than nothing.

flossy83 commented 3 months ago

Actually I don't think that would work reliably because current_frame inside ScriptClip is often not in sync with what Avisynth is processing internally. Only Avisynth would know for sure when the frame order changed due to user seek. That's probably why my BlankClip() workaround only works some of the time.

pinterf commented 3 months ago

Meanwhile I did some tests but made no real progress on the topic; I did put some extra logging (frame requests, internal pattern direction recognition) into Avisynth. It turned out that AvsPMod frame requests are a bit weird (I don't know the reason): frames seem to be requested multiple times when single-stepping one by one.

E.g. this pattern (manual steps): 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 6 (I then jumped forward a bit and reversed the direction), 55, 55, 55, 54, 54, 54, 53, 53, 53.

Anyway, I'd expect a single frame request per single step. Whether this pattern confuses Avisynth's internal "pattern lock" prediction, I don't know yet. My progress stopped here two weeks ago and I have not been able to continue debugging since. Also did some experimental hacks on clearing the ghosts, but it relies on recognizing the change of the pattern (frame request orders) direction.
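
Note (illustrative sketch): one way to read the request log above is to collapse runs of identical frame numbers first, so that only the real steps remain. This is plain Python for inspecting a log, not something Avisynth is known to do internally.

```python
from itertools import groupby

# Request log quoted above (manual single steps in AvsPMod).
log = [1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 6, 55, 55, 55, 54, 54, 54, 53, 53, 53]

# Collapse runs of identical frame numbers into a single request.
steps = [n for n, _ in groupby(log)]

# Frame-to-frame deltas of the collapsed log.
deltas = [b - a for a, b in zip(steps, steps[1:])]

print(steps)
print(deltas)
```

After collapsing, the log reads as +1 steps, one jump of +49, and then -1 steps, so the change of direction is visible as a sign change in the deltas.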

flossy83 commented 3 months ago

Also did some experimental hacks on clearing the ghosts, but it relies on recognizing the change of the pattern (frame request orders) direction.

Yes, I think that's what pylorak is suggesting too, and is what I was trying to do inside ScriptClip with something like:

if ( current_frame > previous_frame + seek_thresh
\ || current_frame < previous_frame - seek_thresh ){ 
     return BlankClip()  # in lieu of clearing cache ghosts 
}

But current_frame is not accurate so it doesn't work reliably. I reckon if current_frame was accurate then it may work, but then again I don't know how Avisynth works internally whether that would muck other things up. I'm guessing it would probably at least make a huge delay when seeking which may not be good either.

As this issue only affects seeking which is only a concern when using Avisynth for realtime live playback, maybe it's worth having a third cache mode the user can select like

0 = CACHE_FAST_START (default)
1 = CACHE_OPTIMAL_SIZE
2 = CACHE_REALTIME ?

pinterf commented 3 months ago

Frame order prediction does not work per-clip, it serves the prefetch mechanism (steps and proper direction) and acts at the very origin of the frame requests.

flossy83 commented 2 months ago

There's already this function Preroll which "works by detecting any out of order access in the audio or video track, and seeking the specified amount earlier in the stream and then taking a contiguous run up to the desired frame". Maybe a solution could be implemented in there?

I currently use Preroll on my ScriptClips as it seems to help them process frames in linear order (helps keep current_frame==previous_frame+1 inside the ScriptClip body).

gispos commented 2 months ago

Hello pinterf, AvsPmod requests the current frame exactly 2 times, once for the source clip and once for the display clip. The display clip is derived from the source clip with 'Eval'.
1.) There is no other way (Display, Pixel Value, DisplayFilter etc.)
2.) It has always been like this.
3.) It makes almost no difference to the speed (tested by myself).

The Prefetch(1,1) that you noticed is an option and can be switched off under Video > Display > 'Prefetch RGB Display conversion'.

What I have forgotten: If the D3D window is also used for the display, then there can also be 3 frame calls. The D3D window uses its own YUV420P8 clip.

pinterf commented 2 months ago

Thank you for the clarification, I just didn't understand why there are multiple calls instead of a one-by-one plus or minus pattern. Of course on my side, inside a Prefetch object they are consistent, but now it's easier to debug it if I watch only one of them.

pinterf commented 2 months ago

Test build, x64, no commit yet, I'd just like to see how it is working on your side in your usual workflow. Crossposted to #389 See readme txt. https://drive.google.com/uc?export=download&id=1IznUhi6-7o8bRJoGHQsF6zAaBWJkKeNg

flossy83 commented 2 months ago

Test build, x64, no commit yet, I'd just like to see how it is working on your side in your usual workflow. Crossposted to #389 See readme txt. https://drive.google.com/uc?export=download&id=1IznUhi6-7o8bRJoGHQsF6zAaBWJkKeNg

I can report the issue is 99% resolved on my system, but only for the synthetic test in OP, and only in AvsPmod, and there are still access patterns in AvsPmod that make it blow out, e.g.

Press play and let it play for 200 frames, then pause.
Hold left arrow for 2 seconds (where left arrow = framestep backwards)
Hold right arrow for 3 seconds (where right arrow = framestep forwards)
Hold left arrow for 10 seconds

Result:

Playing it in MPC-HC with LAV as the source filter which "decodes" .avs files (i.e. the DirectShow filter which requests frames from Avisynth.dll) shows no improvement over the public release on my systems in any of the tests.

flossy83 commented 2 months ago

But I just wanted to say you've obviously improved the result a lot in the AvsPmod test so it's still a big step in the right direction. It feels like you're close to solving the issue.

pinterf commented 2 months ago

I'd like to see the MPC-HC case as well. Are reproduction steps the same as with AvsPmod? Is there any special setting in MPC-HC (sorry, it's so rare that I have to use them that I forgot about their specialities in a year).

flossy83 commented 2 months ago

I'd like to see the MPC-HC case as well. Are reproduction steps the same as with AvsPmod? Is there any special setting in MPC-HC (sorry, it's so rare that I have to use them that I forgot about their specialities in a year).

Yep the steps to reproduce it are the same as with AvsPmod.

In MPC go Options->Keys to set the framestep keys.

To configure it to use LAV for opening .avs files go Options->Internal Filters and tick "Avisynth". Also check that the External Filters list is empty.

When playing an .avs file you should have LAV Splitter and LAV Video Decoder icons in the system tray. If they're not in the tray, right click the window body->filters->copy filter list to clipboard, it should contain LAV Video Decoder and LAV Splitter as the active filters.

If you untick that previously mentioned checkbox then it will use something other than LAV, probably Microsoft's default AVI filter which in my experience has issues and should be avoided.

I've installed K-Lite Codec Pack "Full" so if you have issues getting LAV working I'd try that, it's still actively receiving updates.

pinterf commented 2 months ago

MPC-HC always requests the next 28 frames when you single-step, regardless of whether you step forward or backward. The pattern is:
0, 1, ... 198 (started the avs script, then stopped it at 198)
MPC-HC requests further frames: 199, 200, ... 226
I then press single-step backward to get frame No. 197
MPC-HC requests 197, and further on 198, 199, ... 225
I then press single-step backward to get frame No. 196
MPC-HC requests 196, and further on 197, 198, ... 224

etc...

Actually, there is always just a single miss in the frame request order (226->197, 225->196, 224->195, ...). As there is only one pattern miss (delta = -29) for every 28 good steps (+1), Avisynth's frame order pattern detection keeps thinking that the direction is +1 (forward).
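
Note (illustrative sketch): the request order described above can be rebuilt and its deltas counted. `mpc_hc_requests` models only what is reported in this comment, and `voted_direction` is a hypothetical majority vote over recent deltas, not Avisynth's actual pattern detection.

```python
from collections import Counter


def mpc_hc_requests(start, lookahead, backsteps):
    """Rebuild the request order described above (an assumption-based model)."""
    reqs = list(range(0, start + 1))                 # playback: 0 .. start
    reqs += range(start + 1, start + 1 + lookahead)  # read-ahead after pausing
    shown = start
    for _ in range(backsteps):
        shown -= 1                                   # user steps one frame back
        reqs += range(shown, shown + lookahead + 1)  # that frame plus read-ahead
    return reqs


def voted_direction(deltas, window=29):
    """Hypothetical detector: majority sign among the most recent deltas."""
    recent = deltas[-window:]
    forward = sum(1 for d in recent if d > 0)
    backward = sum(1 for d in recent if d < 0)
    return 1 if forward >= backward else -1


reqs = mpc_hc_requests(start=198, lookahead=28, backsteps=3)
deltas = [b - a for a, b in zip(reqs, reqs[1:])]
counts = Counter(deltas)
print(counts, voted_direction(deltas))
```

With three backsteps the model produces 310 deltas of +1 and only 3 deltas of -29, so any detector that weighs recent steps by count will keep reporting a forward direction even though the user is stepping backwards.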

MPC-HC (or whatever component) is probably doing this on purpose.

Next: why do we have such memory growth (probably a video cache increase) at such huge (-29) backsteps?

gispos commented 2 months ago

That's not normal. How can a script with heavy filters such as Denoiser be played? With every frame step 28 frames have to be processed? No wonder that the memory requirement then shoots up. The only reason would be the audio cache?

By the way, I have not noticed any differences in the memory requirements, I have jumped back and forth with and without the resize filter... nope, no difference for me.

@flossy83 It should also be noted that Windows itself performs memory management, and you should not take the memory requirement in the Process Manager literally: it may be the memory actually used by the program or just the memory Windows has allocated to it...

I had already displayed the actual memory used in one of my programs and it was significantly lower than that in the Process Manager. Windows only cleans up the allocated memory when it runs out or the program is closed.

To underline this: I had opened the script several times with AvsPmod restarted. Once I did not get the required memory above 2700 MB and the next time it was 3900 MB... that's what Windows had decided.

flossy83 commented 2 months ago

you should not take the memory requirement in the Process Manager literally

I've also tried Sysinternals Process Explorer and that shows the same issue. Plus, SetMemoryMax() defaults to 25% of total memory, in my case 4GB, and when it hits 4GB I start getting issues like slowdowns (lower fps) and desync inside my ScriptClips (current_frame != previous_frame+1, which messes with my metric calculations). Currently I'm using SetMemoryMax(8000) and being frugal with my seeking when watching DVDs. The issue only seems to occur when CPU is loaded. With a light CPU load the issue is almost nonexistent in my experience.

AviSynth / AviSynthPlus

Wildly increasing memory consumption - video cache auto-tune goes mad at specific access patterns #379