CrendKing / avisynth_filter

DirectShow filters that put AviSynth and VapourSynth into video playing
MIT License
108 stars 8 forks source link

"worker" issues #8

Closed chainikdn closed 4 years ago

chainikdn commented 4 years ago

Quickly change SVP settings several times, for example move the "quality/perf" slider ---> video player will hang.

It hangs at _avsScriptClip = invokeResult.AsClip();

    [External Code] 
    avisynth_filter_64.ax!PClip::operator=(const PClip & x) Line 1131   C++
>   avisynth_filter_64.ax!AvsFilter::CAviSynthFilter::ReloadAviSynthScript(const _AMMediaType & mediaType) Line 834 C++
    avisynth_filter_64.ax!AvsFilter::CAviSynthFilter::Receive(IMediaSample * pSample) Line 299  C++
    avisynth_filter_64.ax!CTransformInputPin::Receive(IMediaSample * pSample) Line 763  C++
    [External Code]

... which means it hangs at destruction of previous clip ... which means it's locked at _sourceFrameAvailCv

_avsScriptClip is still alive after FrameHandler::EndFlush() and GetSourceFrame() still can be called.

CrendKing commented 4 years ago

Thanks. Can you provide log?

chainikdn commented 4 years ago
T  24448 @    18732: Start processing output frame     72 at  101553125 ~  101758125 frameTime     205000 for source     36 Output queue size  1 Front     73 Back     73
T  19100 @    18732: Get source frame: frameNb     37 Input queue size  3 Front     35 Back     37
T   9828 @    18732: DBG: GetSourceFrame 38 <-- before srcLock(_sourceFramesMutex)
T   9828 @    18732: DBG: .... GetSourceFrame ... waiting for .... 38 <-- before _sourceFrameAvailCv.wait(srcLock);
T  10816 @    18734: DBG 1 <-- entering Receive with _reloadAvsSource=true
T  10816 @    18734: Frame handler begin flush
T  10816 @    18735: Frame handler wait for barriers
T   9828 @    18735: Drain for frame:     38
T  26572 @    18735: Output sample worker thread  26572 arrive at barrier
T  26500 @    18735: Input sample worker thread  26500 arrive at barrier
T  25772 @    18735: DBG: GetSourceFrame 39
T  25772 @    18735: Drain for frame:     39
<..>
T  24448 @    18736: Output sample worker thread  24448 arrive at barrier
T  20228 @    18737: DBG: GetSourceFrame 41
T  20228 @    18737: Drain for frame:     41
<...>
T  17912 @    18767: DBG: GetSourceFrame 50
T  17912 @    18767: Drain for frame:     50
T  23776 @    18919: Output sample worker thread  23776 arrive at barrier
T  10816 @    18919: Frame handler done synchronization
T  25108 @    18919: DBG: GetSourceFrame 51 <--- after EndFlush, _isFlushing=false
T  25108 @    18919: DBG: .... GetSourceFrame ... waiting for .... 51
T  10816 @    19343: DBG  <-- before _avsScriptClip = invokeResult.AsClip();
CrendKing commented 4 years ago

avisynth_filter_64.ax!AvsFilter::CAviSynthFilter::ReloadAviSynthScript(const _AMMediaType & mediaType) Line 834 C++

I don't find any recent commit that has _avsScriptClip = invokeResult.AsClip(); at line 834. But I get the problem. The issue is most likely in AviSynth. Every time we start a new script, if there is a prefetcher used in the script, avs env will try to destroy it first, which causes the hang. Previous we were using Avs 3.6+, which allows multiple prefetcher in the env, so we don't have to destroy the current one before restarting the script. But now we are on 3.5, we have to destroy. And all of these are not related to worker thread at all. I had headache with this since beginning. We could solve this by completely destroying avs env, but it will introduce performance penalty.

I could send you a version on top of Avs 3.6, could you test it for me?

chainikdn commented 4 years ago

But I get the problem. The issue is most likely in AviSynth. Every time we start a new script, if there is a prefetcher used in the script, avs env will try to destroy it first, which causes the hang.

mmm... nope. the problem is AVS clip is still requesting frames via GetSourceFrame() at the point where your condition variable _sourceFrameAvailCv will never trigger. you probaly just need to move _frameHandler.EndFlush() after the ReloadAviSynthScript() call

CrendKing commented 4 years ago

you probaly just need to move _frameHandler.EndFlush() after the ReloadAviSynthScript() call

You mean like https://github.com/CrendKing/avisynth_filter/commit/32d0262eec36daedaede26ae35c54c77f96e9295#diff-577e0566b5887b0ba83e859a83f0140eL109? I used to do it for draining in GetSourceFrame(), but removed once I no longer get frame from buffer. I've never seen this problem any more, but if it fix your problem I can add it back.

chainikdn commented 4 years ago

probably... the whole issue is _isFlushing is reverted to false before avs clip destruction something like

        _frameHandler.BeginFlush();
        ReloadAviSynthScript(m_pInput->CurrentMediaType());
        _frameHandler.EndFlush();

fixes this

CrendKing commented 4 years ago

Try https://github.com/CrendKing/avisynth_filter/commit/0ce598a9348854b272783255aaaa1e2a45e4bd38

BTW, I changed how log file is produced. It's no longer controlled by a sneaky preprocessor, but by a registry value "LogFile". If the value is not empty, it contains the file path. This way even a release build can generate log on demand.

CrendKing commented 4 years ago

Or even https://github.com/CrendKing/avisynth_filter/commit/f468764a4f11f819eb4d17f9b91fba7552c4f6dd. There is double flush in previous commit. Since most of the time BeginFlush and EndFlush comes together, why not consolidate them?

chainikdn commented 4 years ago

why "if (!_avsEnvAtLeastV8)" ? it still hangs with 3.6, obviously

plus now it crashes at video player closing :D because you call ReloadAviSynthScript() when _avsEnv is already null

CrendKing commented 4 years ago

After some thinking, I'm considering abandoning the "multiple worker threads" idea due to the serialized nature of how the frames work. Let me document my reasoning for future reference.

The general workflow of the transform filter is this:

  1. Upstream filter sends sample to our filter in an arbitrary rate. Normally frames arrive in fixed rate, but when the source is unpredictable (e.g. a network stream), or the OS resource is insufficient (e.g. CPU usage too high, lack of memory, etc.) quality change could happen.
  2. Upon receiving input sample, our filter could turn each into N output samples, where N could be any fractional number ranging from 0 to INFINITY. Each output sample should have consecutive frame times, i.e. outFrame[n].startTime === outFrame[n - 1].stopTime === outFrame[n - 1].startTime + outFrame[n - 1].duration. There should be no gap in between. The duration of each frame could vary for VFR (variable frame rate) videos.
  3. Once an output sample is ready, we deliver it to downstream. While DirectShow specification says

    When a renderer filter receives a sample, it schedules rendering based on the time stamp. If the sample arrives late, or has no time stamp, the filter renders the sample immediately. Otherwise, the filter waits until the sample's start time before it renders the sample."

most renderer can't deal with out-of-order delivery, regardless of how much ahead of time the sample is delivered. For example, EVR always renders samples as they arrive. EVR Sync drops out-of-order samples. And madVR just freezes.

Because of #2, all input threads must be synchronized to calculate the correct frame times. Because of #3, the delivery procedure must be synchronized across all output threads. And too much synchronization would mostly defeat the benefit of scalability of multiple worker threads design.

On the other hand, I'm still thinking having one input worker thread + one output worker thread + COutputQueue is beneficial, thanks to the flexibility to adapt to quality changes mentioned in #1.

EDIT: Look like madVR is not compatible with COutputQueue. It just freezes.

chainikdn commented 4 years ago

I'd say that input threads are not necessary at all, single threaded input in the main thread is ok. But the output frames calculations (i.e. "waiting for the Avisynth") and copying/interleaving must be done in a separate thread. Is it always enough to have one "output" threads, or some cases / system configurations will need several ones - I dunno, need more testing... You told you clearly see huge perf. improvement - when, where, how?


BTW, not related at all - could you add a non-ssse3 path for interleaving? a few people still have Athlon/Phenom CPUs... I'm obviously not talking about 4k playback, but at least it must not crash with "illegal instruction", so plain C implementation would be ok.

CrendKing commented 4 years ago

will need several ones

If say one input sample generates 7-8 output samples, and the transformation is resource intensive, having multiple output workers could scale, but again, they all have to synchronize on delivery. I could make the number of output threads configurable, and default to 1.

when, where, how?

If I play a 4K video, set SVP block size to 16 or even 8, having one thread (0.6.1) is guaranteed to be choppy. Having worker thread could deliver smooth playback. But this is extreme corner case. I don't think we should optimize the whole logic for extremity.

Athlon/Phenom CPUs

Seriously? How do they even use SVP efficiently? Can't you ask them to upgrade hardware? There are still around 1.5% Windows XP users out there. Should we also support Windows XP?

chainikdn commented 4 years ago

How do they even use SVP efficiently?

why not, phenom x6 used to be a good CPU, totally enough for 1080p...

Should we also support Windows XP?

this is different case. and no, we shouldn't :)

chainikdn commented 4 years ago

If I play a 4K video, set SVP block size to 16 or even 8, having one thread (0.6.1) is guaranteed to be choppy.

there may be another reason for this... the "master" branch doesn't wait in GetFrame() in case the requested frame is not here yet, it returns the last frame instead

CrendKing commented 4 years ago

Please try https://github.com/CrendKing/avisynth_filter/commit/c4f05028804e1c22b764bb96fa4cfac0e044e978.

there may be another reason for this... the "master" branch doesn't wait in GetFrame() in case the requested frame is not here yet, it returns the last frame instead

It is one of the reason why I started the worker thread design, but I know the difference of performance choppy and the "last frame" choppy.

could you add a non-ssse3 path for interleaving?

Please create separate ticket for tracking purpose.

chainikdn commented 4 years ago

but I know the difference of performance choppy and the "last frame" choppy

I just can't imagine how that 8-vs-16-blocks choppy could depend on threading in AVSF. It's weird...

CrendKing commented 4 years ago

Multi threading scales well for IO intensive tasks. It depends on many factors, including hardware, renderer and source. Say if the renderer already takes long time to receive samples, putting extra milliseconds on computing the flow vectors on the same thread would intensify the choppy symptom. On the other hand, if the computation is happening on a different thread, a blocking Deliver() thread would not impose much penalty since CPU is fully utilized.

If user's setup has very little latency on the IO side, then the task becomes CPU intensive. Multi threading usually would not help since it should already fully saturate CPU (assuming AVS/SVP themselves has enough threads to saturate).

chainikdn commented 4 years ago

putting extra milliseconds on computing the flow vectors on the same thread would intensify the choppy symptom

Wrong, Avisynth runs its own threads (many of them, SVP usually sets 25 threads), it doesn't connected with AVSF in any way except (!) for requesting input frames. And all those calculations are done in advance via Prefetcher.

CrendKing commented 4 years ago

Like I said, it depends on many factors. I have my own test scripts, which varies by having and having no prefetcher, varying thread count for AVS and SVP. If you are talking about the curated default script of SVP, then you could be right.

chainikdn commented 4 years ago

So, when you're talking about "huge improvements" you actually ran SVP's script without prefetcher? o_O

CrendKing commented 4 years ago

With prefetcher, but far fewer than 25 threads. I do test without prefetcher though.

chainikdn commented 4 years ago

So, to summarize what I personally want/need regarding AVSF performance. I just want to be sure that AVSF doesn't cap output performance at some "low" value like 100 fps for 4K/10-bit. There're a lot of possible bottlenecks in other places though, but lets make sure it's not AVSF's fault.

CrendKing commented 4 years ago

cap output performance at some "low" value like 100 fps for 4K/10-bit

Let's avoid vague statement like this. If you want particular performance watermark, then we need to develop repeatable and measurable benchmark. We need a fixed set of videos, fixed set of scripts, video player and upstream/downstream filters. Measure delivery FPS from AVSF and record it for each version. This way if there is degradation we can pinpoint the culprit. How's that sound?

chainikdn commented 4 years ago

__debugbreak triggers on every script reloading

we need to develop repeatable and measurable benchmark

a simple, non-SVP script, probably just resizing to a large value like 8k*8k, plus null renderer (??), plus a few perf. counters inside AVSF


tested MPC-HC + EVR-CP with maximum relaxed resizer settings - (GTX 2070, Ryzen 7) simple script, varying frame size and frame rate values:

SetFilterMTMode("DEFAULT_MT_MODE",2)
SetFilterMTMode("AvsFilterSource",3)
AvsFilterSource()
PointResize(4096,4096)
ChangeFps(240)
Prefetch(10)

so... single threaded "master" gives stable ~210 fps for 4k4k 8b frame. which is totally enough for now and in the next 10 years. hmmm. regarding "worker": it shows ~275 fps but with a huge "Avg sync offset" (between 100 and 900 ms), plus it hangs sometimes


much heavier script:

ChangeFps(240)
LanczosResize(4096,4096)

gives ~160 fps with both versions and 100% CPU load


overall, it looks like single-threaded AVSF is NOT a bottleneck at least until ~200 fps and it makes me feel a bit confusing... :D

z826540272 commented 4 years ago

__debugbreak triggers on every script reloading

we need to develop repeatable and measurable benchmark

a simple, non-SVP script, probably just resizing to a large value like 8k*8k, plus null renderer (??), plus a few perf. counters inside AVSF

tested MPC-HC + EVR-CP with maximum relaxed resizer settings - (GTX 2070, Ryzen 7) simple script, varying frame size and frame rate values:

SetFilterMTMode("DEFAULT_MT_MODE",2)
SetFilterMTMode("AvsFilterSource",3)
AvsFilterSource()
PointResize(4096,4096)
ChangeFps(240)
Prefetch(10)

so... single threaded "master" gives stable ~210 fps for 4k4k 8b frame. which is totally enough for now and in the next 10 years. hmmm. regarding "worker": it shows ~275 fps but with a huge "Avg sync offset" (between 100 and 900 ms), plus it hangs sometimes

much heavier script:

ChangeFps(240)
LanczosResize(4096,4096)

gives ~160 fps with both versions and 100% CPU load

overall, it looks like single-threaded AVSF is NOT a bottleneck at least until ~200 fps and it makes me feel a bit confusing... :D

SVP 4K has choppy playblack experience. AMD Fluid motion fine, I don't think AMF copies the frame around many times.

CrendKing commented 4 years ago

much heavier script:

SetFilterMTMode("DEFAULT_MT_MODE",2)
SetFilterMTMode("AvsFilterSource",3)
AvsFilterSource()
ChangeFps(240)
LanczosResize(4096,4096)
Prefetch(10)

gives ~160 fps with both versions and 100% CPU load

Hello. Sorry to get back to you a bit late. You actually provided a good example to test with. Here's my result.

Again, my CPU is Ryzen 7 3700X.

With the exact "heavy script",

All tests have no choppy audio, no out-of-sync audio, no hanging. Seeking works without any problem. Process frame number and output frame number are within reasonable difference. One thing to note that the debug version of the filter performs very poorly in this case.

Ironically I found that with the single thread version, the actual FPS is 248.3. The FPS is correctly 200 in worker thread version.

If you concern about concurrency performance, first the filter will default the thread count to 1 if no registry value overrides it, and second you can always explicitly set the thread count to 1.


I've tested the current worker thread version for about a month now, and haven't seen any problem. So I'm going to merge it to master, unless you find any bug or other concerns.

AMD Fluid motion fine, I don't think AMF copies the frame around many times.

AMF is different because everything happens within graphic card. AviSynth always requires CPU to process frames, so we have to copy frame buffer in and out of graphic memory.

chainikdn commented 4 years ago

So I'm going to merge it to master, unless you find any bug or other concerns.

actually you may notice I deployed 0.7.0 via SVP update two days ago ;)

CrendKing commented 4 years ago

Sorry I didn't until you just mentioned. I assume you built from the worker branch? Any bug report or feedback from your users? If everything works fine, I'm going to merge today.

chainikdn commented 4 years ago

https://www.svp-team.com/forum/viewtopic.php?pid=76862

I'm getting the error in the following screenshot when I close potplayer after playing a video with svp off. It will even give me this error if e.g. i turn off DXVA while a video is playing. Potplayer + madvr, updated to AVSF using update tool in SVP manager. Manually updating to Avisynth+ 3.6.1 did not help https://i.imgur.com/C8455LL.png EDIT: Found workaround, if anyone is having the same issue, simply use AVS+ 3.6 and downgrade the filter to v0.60

CrendKing commented 4 years ago

I assume it is the same as https://github.com/CrendKing/avisynth_filter/issues/9? And for 0.6.1, not worker branch right? I can't reproduce the problem, so it'd be nice if they can try the worker branch and set "LogFile" in registry to generate a log. In 0.6, log file needs to be toggled in source file.

chainikdn commented 4 years ago

ah, yeah, looks the same, sorry :D

there's possible another crash, with MediaPortal (32-bit only) + madVR + HDR video but I can't reproduce it either

anyway, 0.7.0 is better than 0.6.0 so I'd merge it right now

=== btw there was a video that breaks (freezes) 0.6.0 due to unusual frame time stamps, I suppose. It works in 0.7.0 but probably there's another room for improvement here, if you want you can check this out: https://yadi.sk/i/2hxUFhuq3hRBBQ

chorong761 commented 4 years ago

I assume it is the same as #9? And for 0.6.1, not worker branch right? I can't reproduce the problem, so it'd be nice if they can try the worker branch and set "LogFile" in registry to generate a log. In 0.6, log file needs to be toggled in source file.

Here is log generated by adding "LogFile" in regedit

https://drive.google.com/file/d/19lueaItRIHeC_423yTl2FhNbPkxIFBCZ/view?usp=sharing

CrendKing commented 4 years ago

there was a video that breaks (freezes) 0.6.0

I tried 0.6.0, 0.6.1 and 0.7.0 with SVP doing 2x FRC. None of them freezes. Do I need to somehow seek in this short video to trigger it, or it always happens at beginning? If you can trigger it all the time, log file would be helpful.

This could be a bug still exist in 0.7.

0.7.0 is better than 0.6.0 so I'd merge it right now

No need to rush. I'll try to solve https://github.com/CrendKing/avisynth_filter/issues/9 (possibly a non-issue) then merge.

chainikdn commented 4 years ago

None of them freezes

sorry, forget to mention in freezes in mpc-be (!) not in mpc-hc. and w/o svp at all.

chainikdn commented 4 years ago

for some reason 0.7.0 doesn't register/load in Win7 x64 the dependency walker report is there (see the last message): https://www.svp-team.com/forum/viewtopic.php?pid=76877#p76877

CrendKing commented 4 years ago

All the missing dependencies come from IE related stuff (from shell32.dll), which does not exist in my Windows 10. It is transitive dependency, so I can't do anything about it. He should update with latest VC runtimes. Projects like https://github.com/abbodi1406/vcredist allows you to do all versions AIO, but I don't take responsibility if anything goes wrong. It is not official from Microsoft after all.

Also try Windows Update see if there's any patch.

chainikdn commented 4 years ago

The thing is 0.6.1 (built by me) works and 0.7.0 built in the very same environment doesn't. And there're TWO people reporting this, and SVP always installs the latest VC runtimes. Strange...

CrendKing commented 4 years ago

Can you build 0.7.0 yourself and ask them to try? If it works, then we can compare the dependency of our builds, see difference.

It could be our Windows version, VS2019 version, or runtime version.

BTW, you sure you have AviSynth.dll in the same directory as the filter right?

BTW 2, in the next version, I can make a change so that it dynamically load AviSynth.dll instead of static link. It has two pros:

  1. We can add more search path for the dll, e.g. the player directory.
  2. MsgBox to tell user the missing dependency.
Nuihc88 commented 4 years ago

I am experiencing the same problem with v0.7.0 builds on Win7 x64 with both 32bit & 64bit versions and installing newest runtimes from the AIO pack mentioned above made no difference. The only consistent difference i see in my Dependency Walker Images is the red KERNEL32.DLL (SetThreadDescription) that would seem to correspond to the line: "Error: At least one module has an unresolved import due to a missing export function in an implicitly dependent module." Other than that v0.6.1 builds are showing the same warnings despite having no problems registering the filters. I hope this helps with the troubleshooting. PS. I am also unable to load the v0.7.0 filters directly from PotPlayer.

DWIs.zip

chainikdn commented 4 years ago

yep, SetThreadDescription it is

CrendKing commented 4 years ago

OK. I disable that call in release build. Try avisynth_filter.zip

Nuihc88 commented 4 years ago

OK. I disable that call in release build. Try avisynth_filter.zip

It works. In addition to registering correctly, loading the filter directly works as well.

CrendKing commented 4 years ago

in freezes in mpc-be (!) not in mpc-hc. and w/o svp at all.

@chainikdn My main player is MPC-BE and it works fine. And when you say w/o svp, does it involve svpflow?.dll at all or something simple like

AvsFilterSource()
return Prefetch(8)

is sufficient to trigger? I tried both and everything works.

Can you do it and give me the log file when it crashes? Thanks.

Nuihc88 commented 4 years ago

While the version above works fine, the newly released v0.7.1 is giving me this error: Unable Once closed, PotPlayer loads normally, aside from AviSynth Filter not being loaded...

CrendKing commented 4 years ago

@Nuihc88 Where is your AviSynth.dll located? You might find one in your PotPlayer directory. If not, move the one with the filter there and try again.

If problem persists, Can you zip your AviSynth.dll and upload here?

Nuihc88 commented 4 years ago

Like the readme says, you need to place the AviSynth.dll in the PotPlayer directory now.

I already have. Tried a few different versions at different directories already, 'player folder', 'filter's folder' & 'system folder', but this is the one i normally use: AviSynth.zip - I'll try with the one bundled with SVP next... EDIT: Problem persists with v3.5 r3043 bundled with SVP, so i don't think it's about the version... EDIT2: If i remove AviSynth.dll files from 'player folder' and 'system folder' a different error is shown: Failed ...So i don't think it's about the path(s) used either.

CrendKing commented 4 years ago

I don't know why CreateScriptEnvironment2 has different exported names between x64 and x86 builds. AviSynthFilter.zip should work regardless. Once you confirm, I'll release 0.7.2.

Nuihc88 commented 4 years ago

The build above appears to be working correctly.

Nuihc88 commented 4 years ago

Was double-checking and noticed that the older AviSynth+ version bundled with SVP is now giving this error: CreateScriptEnvironment I guess there could still be any number of similar issues, all with essentially the same cause, but only affecting specific configurations...

CrendKing commented 4 years ago

The minimum supported AviSynth+ version is 3.5.x. Which versions work and which not? Please include both 32/64 bits and version number.

EDIT: Try AviSynthFilter.zip