Closed chainikdn closed 2 years ago
Not sure why the combination of PotPlayer + EVR would cause this issue, since PotPlayer doesn't allow a debugger to attach. The change responsible is https://github.com/CrendKing/avisynth_filter/commit/ff84cb4674d4384f4d567f3b44691cd3ac5b6c69, in which I implemented true P010 by bit shifting. We know the function itself is fine since it works in all other combinations. It could be something specific to PotPlayer.
If you really want to use P010 in PotPlayer EVR, maybe ask the player author. Otherwise, if you just want to get back to the 1.3.1 behavior, uncheck P010 from AVSF input format.
> We know the function itself is fine since it works in all other combinations.
Not so sure about it... I have at least one complaint where a user with an i3-8100 claims that 1.3.1 performance is significantly better than 1.4.2, even with madVR.
> if you just want to get back to the 1.3.1 behavior, uncheck P010 from AVSF input format
Nope, EVR won't connect in P016.
Actually I was thinking about a registry option to completely disable shifting and return to old behavior...
> significantly better
Well, there is no magic here. The function has to shift every 16-bit integer in each frame, even when SIMD is available. The larger the video, the slower it gets.
Pre-1.4 behavior is basically a hack. The input and output formats are P010, but AviSynth receives YUV420P16. So if, for example, SVP can only process 10-bit frames and not 16-bit ones, one has to convert the 16-bit data back to 10-bit in the script with `ConvertBits(10)`. But then it's essentially the same conversion as my function, with the same performance penalty.
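To make the conversion discussed above concrete, here is a minimal Python sketch of the per-sample shift. It assumes the usual layouts: P010 keeps its 10 significant bits in the upper bits of each 16-bit word, while the 10-bit planar format seen by the script keeps them in the lower bits. The function names are hypothetical and this is not the filter's actual SIMD code.

```python
from array import array

# Assumption: 16-bit container minus 10 significant bits leaves a 6-bit shift.
P010_SHIFT = 6


def p010_to_low_aligned(samples):
    """Right-shift each word so the 10-bit value sits in the low bits."""
    return array("H", (s >> P010_SHIFT for s in samples))


def low_aligned_to_p010(samples):
    """Left-shift each word so the 10-bit value sits in the high bits again."""
    return array("H", ((s << P010_SHIFT) & 0xFFFF for s in samples))


# Four example P010 words: 10-bit values 0, 1, 512 and 1023 in the high bits.
p010_plane = array("H", [0x0000, 0x0040, 0x8000, 0xFFC0])
script_plane = p010_to_low_aligned(p010_plane)
round_trip = low_aligned_to_p010(script_plane)
```

Every sample of every plane goes through one such shift on the way in and one on the way out, which is why the cost grows with the frame size.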
You can do this experiment. Uncheck P010 in AVSF. Use this script:
AvsFilterSource()
ConvertBits(10)
Prefetch(16)
This way the output from the script is P010, which will connect to EVR. Try this in PotPlayer. You will see the exact same poor performance.
https://github.com/CrendKing/avisynth_filter/issues/65 is the issue that triggered me to make the change. You can review the issue. Either I do the conversion myself, or I have to document the hack, requiring users (which includes you and other SVP devs) to add `ConvertBits(10)`. I chose to remove the hack.
> Nope, EVR won't connect in P016.
You are correct. Previously, P010 and P016 were interchangeable since they shared the frame server format, so even if the input was P016, the output could still be P010. Now they are different, thus no connection. I will experiment a bit to see if I can improve this.
> Well, there is no magic here.
"significantly better" to the point where 1.4.2 is basically unusable and the user had to revert back to 1.3.1. In this case, my vote is for the "hack" :D
> So if, for example, SVP can only process 10-bit frames
Nope, SVP can process P016 too, and actually the script always does ConvertBits(16) if the video player is "avsf".
==== though I can't see any perf. difference on my Ryzen 4800, except for weird PotPlayer+EVR behavior
I did some tests. It turns out that when EVR allocates a sample, the sample's memory buffer has PAGE_WRITECOMBINE protection (EVR could be doing memory-mapped I/O), whereas madVR's sample does not. Pages with PAGE_WRITECOMBINE are optimized for sequential write performance, at the cost of a severe read penalty. The current implementation of the bit shift function is the only place that reads the destination buffer. That's why only EVR + 10-bit has the issue.
A simple solution would be to allocate a temporary buffer without PAGE_WRITECOMBINE, do the bit shifting there, then copy the result to EVR's sample buffer. I verified that this works. I'm wondering if there's a better solution (e.g. could I change that memory page's protection flag when the bit shift is needed?).
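The difference between the two access patterns can be illustrated with a small Python model. This is only a sketch: `CountingBuffer` is a hypothetical stand-in for the renderer-owned sample buffer, and the two functions are assumed simplifications of the old and patched code paths, not the filter's real implementation. The point is that the staging variant never reads the destination.

```python
class CountingBuffer:
    """Hypothetical stand-in for a renderer buffer; counts reads and writes."""

    def __init__(self, size):
        self.data = [0] * size
        self.reads = 0
        self.writes = 0

    def __getitem__(self, index):
        self.reads += 1
        return self.data[index]

    def __setitem__(self, index, value):
        self.writes += 1
        self.data[index] = value


def shift_in_destination(src, dst, shift):
    """Write the frame to dst first, then read-modify-write dst to shift it."""
    for i, value in enumerate(src):
        dst[i] = value
    for i in range(len(src)):
        dst[i] = (dst[i] << shift) & 0xFFFF  # this read hits the slow pages


def shift_via_staging(src, dst, shift):
    """Shift in ordinary memory, then write dst once, sequentially."""
    staging = [(value << shift) & 0xFFFF for value in src]
    for i, value in enumerate(staging):
        dst[i] = value


frame = [0, 1, 512, 1023]
old_dst, new_dst = CountingBuffer(len(frame)), CountingBuffer(len(frame))
shift_in_destination(frame, old_dst, 6)
shift_via_staging(frame, new_dst, 6)
```

Both variants leave the same data in the destination; only the first one issues reads against it, which is the expensive operation on write-combined memory.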
Try https://github.com/CrendKing/avisynth_filter/actions/runs/2394598935
This is specific to EVR. I can reproduce the issue in MPC-BE as well.
PAGE_WRITECOMBINE basically makes the page non-temporal, which means data is not reusable after the first reference. It is commonly used in 3D graphics applications. There is a way to allocate a temporary cache just for write-combining memory with instructions like MOVNTDQA, which according to my test does work, but it still imposes a much higher penalty than using MOVDQU on a regular RW page, because the bit shift function only reads each 16-bit value once. The extra allocation, copy and free in the new implementation should have a negligible footprint.
https://stackoverflow.com/questions/37070/what-is-the-meaning-of-non-temporal-memory-accesses-in-x86 contains some good information about non-temporal memory.
Yeah, this seems to work.
However, I don't like the whole idea of doing something for nothing. And here we actually do a lot of work, especially in cases like 4K 24->120 fps. Again, for nothing. Just for the abstract internal beauty.
Can we add some way to bypass the conversion altogether? A parameter to AvsFilterSource() looks like a good place. Something like
AvsFilterSource(dirtyHackCrendKingDoesntLike=true)
Well, folks from https://github.com/CrendKing/avisynth_filter/issues/65 somehow had their SVP expecting P10, without ConvertBits(16) in their generated script, and ended up with a green screen. I don't know if SVP has been updated to address that issue, but other users of this filter could be confused by this as well if I reverted https://github.com/CrendKing/avisynth_filter/commit/ff84cb4674d4384f4d567f3b44691cd3ac5b6c69.
My point is, we now have a version that works reasonably OK without the need for a hack, which is good. Unless you can demonstrate a real need for that hack that outweighs the benefit of being hackless, I think it's time to close the issue. I really don't think we are doing something for nothing. In fact, I'd even go so far as to retract the patch and just recommend people stick to madVR for 10-bit content, to be honest. It's not like madVR has any disadvantage against EVR, apart from being 3rd party.
> Unless you can demonstrate a real need for that hack that outweighs the benefit of being hackless
this:
> I have at least one complaint where a user with an i3-8100 claims that 1.3.1 performance is significantly better than 1.4.2, even with madVR ... to the point where 1.4.2 is basically unusable and the user had to revert back to 1.3.1
If there's one user => there are many.
> other users of this filter could be confused by this as well if I reverted https://github.com/CrendKing/avisynth_filter/commit/ff84cb4674d4384f4d567f3b44691cd3ac5b6c69
I'm not asking you to revert, I'm asking to add a bypass option. "off" by default.
> I have at least one complaint where a user with an i3-8100 claims that 1.3.1 performance is significantly better than 1.4.2
I'm happy to take a look at the user's claim, if he can come here and provide info, such as a log and a sample video. If it proves to be a big problem, I'm happy to add an option to return to the old behavior.
dunno why you want proof of something you already know yourself?
> Well, there is no magic here. The function has to shift every 16-bit integer in each frame, even when SIMD is available. The larger the video, the slower it gets.
indeed, it's no magic that doing additional (and unnecessary) work makes it slower
What I meant there is that the function is not zero cost. Its cost is linear in the video size. However, compared to the bit shifting function, the deinterleave, the interleave, AviSynth internals and SVP are much more costly. It's hard to believe that adding the bit shift would suddenly turn smooth playback into something unbearable. Most of the time, either the user's computer has a lot of headroom, so the bit shifting is a negligible cost, or the computer already struggles, so adding bit shifting is also inconsequential. The only case where it matters is when the computer is right at the edge of processing 24fps with near 100% CPU. That's why I want to know what causes the big difference. There could be another factor or bug that is the root cause, just like this one with PAGE_WRITECOMBINE.
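To put "linear in the video size" into numbers, the following Python sketch counts the 16-bit samples a per-sample pass has to touch in one 4:2:0 frame. The 3840x2160 and 1920x1080 resolutions are assumed examples, and the helper name is hypothetical.

```python
def samples_per_420_frame(width, height):
    """Count samples in a 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


BYTES_PER_SAMPLE = 2  # 10-bit and 16-bit samples both occupy a 16-bit word

uhd_samples = samples_per_420_frame(3840, 2160)
fhd_samples = samples_per_420_frame(1920, 1080)
uhd_bytes = uhd_samples * BYTES_PER_SAMPLE
```

Under these assumptions a 4K frame carries about 12.4 million samples (roughly 25 MB), four times as many as a 1080p frame, so any pass over the whole frame scales by the same factor.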
Hello, I have been experiencing this issue (i3-8100, GeForce GTX 1060 3GB, 8GB RAM). SVP 4 worked perfectly using PotPlayer + AVISynth 1.3.1 + default renderer all the way up to the recent update. The recent update resulted in color issues (using all other non-madVR renderers) and performance loss using madVR + SVP. After corresponding with SVP's (awesome!) support, I had to revert back to 1.3.1 after experiencing performance issues with all renderers (including madVR) following the recent update. madVR also experienced much less smooth playback following the recent update. It worked, but took time loading after fast forwarding or any skip during playback.
I hope this is helpful 🙏👍
Could you open a new issue and follow the issue template? Some screenshots of the color problem, a log, a sample video to reproduce the symptom, and CPU usage during playback would be very helpful.
This ticket is for EVR 10-bit problem, and a patch has been provided. So let's close this issue.
please, don't mess things up. there's no "color problem", I'm only talking about the performance issue introduced with this "10-bit shifting"
I was referring to Overtone3's comment:
> The recent update resulted in color issues (using all other non-madVR renderers)
I'm yet to see numbers to support this "performance issue". I do profiling myself and the overhead of that function is low. For example, in a test without SVP, Deinterleave() spent 150 units of CPU resource, Interleave() spent 20 units, and the two bit shifts spent 60 units. However, AviSynth's `Antialiaser::GetAlphaRect()` spends 3314 units. If I enable SVP, SVP itself will take even more resources.
And to be clear, it's not me who wanted to have the bit shifting in the first place. I even asked the AVS+ guys to properly support `p010le` in addition to the existing `yuv420p10le` internally so I could get away without the shifting, but got no response.
is it possible that the code goes the wrong path (i.e. non-AVX2) on i3-8100? maybe adding a few perf. counters to the log output would be a good idea?
"color issue" is about ConvertBits(16) in SVP's script not being compatible with AVSF 1.4.2 when using EVR, so just forget about it
Sure. I'll add a few more logs. I think i3-8100 should support AVX2.
Thanks for clarifying the color issue.
soo... it's like 5-7 ms per 4K frame, am I correct?
input thread: (2 ms deinterleave + 6 ms shift) 24 fps
output thread (one thread!!!): (3 ms interleave + 6 ms shift) 60+ fps
vote for
AvsFilterSource(dirtyHackCrendKingDoesntLike=true)
Again, script, log and sample video.
https://4kmedia.org/ultra-hd-hdr-samsung-4k-demo-wonderland/
avsf.log
This is with just 30-40% CPU load. More CPU load --> bigger delay. Besides the additional (unneeded!) load, I can imagine a situation where this thread starts to slow down everything, since the output conversion is single-threaded.
40% CPU seems high without SVP. Do you have hardware decoding on?
I tested the same video and here are the average numbers I'm seeing:
No-op script:
- Log: no-op.log
- Deinterleave: 1ms
- Interleave: 1ms
- Right shift (all 3): 2ms
- Left shift: 1.5ms
- CPU: 5% total (Ryzen 5800X), using AVX2
- LAV hardware decoding is set to D3D11 (I think it's reasonable to expect users to enable hwdec for 4K videos)
- If I turn off hwdec, CPU usage becomes around 13%
Assuming your log was taken with the same no-op script, your numbers are basically double mine all around.
After enabling SVP (doubling 24fps to 48fps):
- Log: svp.log
- All stats from AVSF are still the same
- The turnaround time of a frame is 388ms; note that due to throttling this is not how much time the script actually spent on it
- CPU: 25% total
- Still able to maintain stable 48fps output
I think these numbers are consistent. Since all these functions basically go through each pixel and do some SIMD stuff, they should have approximately the same cost. I don't think adding 3ms is a big deal for 4K, especially when all these numbers are amortized by multi-threading.
And since it's madVR we are talking about here, if you really don't like the shifting, just uncheck P010 and P210 in AVSF. According to my test, with SVP enabled, P010 has at most 1% CPU overhead in my environment compared to P016.
> Still able to maintain stable 48fps output
No doubt the Ryzen 5800X is a good CPU and this additional overhead doesn't hurt it at all. Sadly, not everyone has a 5800X... Even on my 4800H, the output thread takes 3 ms + 6 ms = 9 ms per frame, thus limiting output to ~100 fps (*) as the absolute theoretical maximum. Let's ask Overtone3 for i3-8100 numbers.
> especially when all these numbers are amortized by multi-threading
what do you mean? we now have only ONE output thread that does all this stuff
==== (*) not even 100 fps
T 29236 @ 14670749: Processing output frame 830 for source frame 332 at 140852444 ~ 141019288 duration 166844
T 29236 @ 14676981: InterleaveUV() start
T 29236 @ 14679210: InterleaveUV() end
T 29236 @ 14679262: BitShiftEach16BitInt(0) start
T 29236 @ 14684574: BitShiftEach16BitInt(0) end
T 29236 @ 14684867: Delivered frame 830
6 ms - copy
2 ms - interleave
5 ms - shift
14 ms total --> 70 fps max (!!) or 110 fps w/o the bit shift
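The breakdown above can be reproduced from the log excerpt with a short Python sketch. It assumes the number after `@` is a timestamp in microseconds (an assumption that matches the millisecond figures quoted above) and that a single output thread handles frames strictly one after another; the helper names are hypothetical.

```python
import re

LOG = """\
T 29236 @ 14670749: Processing output frame 830 for source frame 332 at 140852444 ~ 141019288 duration 166844
T 29236 @ 14676981: InterleaveUV() start
T 29236 @ 14679210: InterleaveUV() end
T 29236 @ 14679262: BitShiftEach16BitInt(0) start
T 29236 @ 14684574: BitShiftEach16BitInt(0) end
T 29236 @ 14684867: Delivered frame 830
"""

LINE = re.compile(r"T \d+ @ (\d+): (.*)")


def parse(log):
    """Return (timestamp, message) pairs; timestamps are assumed to be microseconds."""
    matches = (LINE.match(line) for line in log.splitlines())
    return [(int(m.group(1)), m.group(2)) for m in matches if m]


def max_fps(frame_us):
    """Upper bound on frames per second for one thread spending frame_us per frame."""
    return 1_000_000 / frame_us


events = parse(LOG)
stamp = {message: t for t, message in events}

interleave_us = stamp["InterleaveUV() end"] - stamp["InterleaveUV() start"]
shift_us = stamp["BitShiftEach16BitInt(0) end"] - stamp["BitShiftEach16BitInt(0) start"]
total_us = events[-1][0] - events[0][0]

fps_with_shift = max_fps(total_us)
fps_without_shift = max_fps(total_us - shift_us)
```

With these timestamps the frame takes about 14.1 ms in total, of which about 5.3 ms is the bit shift, giving a ceiling of roughly 71 fps with the shift and roughly 114 fps without it.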
Hello,
I am still experiencing image and performance issues using EVR and SVP 4.0. I know I don't have the strongest system in the world, but I used to play 4K videos @60fps until the recent updates (and I also thought the GeForce GTX 1060 3GB should compensate somewhat for the weak CPU, no?). I am happy to share additional information but might need some guidance on which data and how to generate it. I am attaching SVP 4.0 event log + [SVP log.txt](https://github.com/CrendKing/avisynth_filter/files/8837037/SVP.log.txt)
![EVR + SVP 4 0]
![EVR (Autoselected) + AVISynth1 4 2 HDR 10bit ]
What else can I share?
SVP log.txt
@Overtone3 You can enable AviSynth Filter's logging by following the instruction on this wiki page.
@Overtone3 remove that "quick fix" from generate.js. Current solution is this:
And now you probably have P010 unchecked but ConvertBits(10) resulting in both image and perf issues with EVR.
Hi,
Nuihc88 - I followed the instructions on the wiki page, but for some reason no log file is being generated in my path folder after watching videos in PotPlayer... I am not sure what I am doing wrong. I did the whole registry thing and restarted, but that didn't help. Any idea what I could be missing?
Chainikdn - thanks, "quick fix" was already reversed when I posted the previous screenshots. Indeed, P010 is unchecked in AVSF. Sorry, I am not sure I understand the second bullet point. In the script, "ConvertBits" is mentioned twice. One is the quick fix line which is now back to 16 and the other is: res += bl+'input_m8 = '+(media.p10 ? 'input_m.ConvertBits(8)':'input_m')+br; which I didn't touch.
Hi, Overtone3. It might be dumb, but please disable AVSF and check if there is a color issue. Make sure AVSF doesn't show up in context menu -> Filters.
Log file should be created if everything worked as expected. Does AVSF show up in menu -> Filters?
Is there a particular reason you don't use madVR for 10-bit content? It supports all kinds of video formats and usually results in better performance.
Also, what's the format of the video? Is it 4:4:4 and 10 bit? Or just 4:2:0? Does your GPU support HW decoding HEVC?
Hi,
No color issues when disabling AviSynth. Yes, AviSynth does show up in the menu. I am happy to use madVR; however, I am experiencing the same performance issues playing 4K videos using madVR + SVP (no color issue though). I am using a GeForce GTX 1060 3GB, which supports HEVC decoding (most videos that I watch are HEVC). My video card output color format is 4:2:2. The videos' color format is 4:2:0.
BTW - I noticed that when starting playback using EVR + SVP and then switching to madVR, the color issue persists. Restarting PotPlayer with madVR resolves the color issue, although I am still struggling to play 4K videos due to the performance issue. 🙏
This is because of "10-bit output" checked in PotPlayer preferences (Video section).
AVSF connects as YUY2 -> YUY2 in this case (no idea why) and things obviously get broken.
I noticed that YUY2. Usually it is there because EVR only supports YUY2 beyond 4:2:0 10 bit. For example, if you play 4:2:2 10 bit, or 4:4:4 16 bit, all are downgraded to YUY2.
AVSF supports YUY2. Not sure why there is color issue for you without log.
Hi! So unchecking the "10-bit output" under video settings resolved both the color and performance issue, but doesn't that mean that 10bit videos are now presented at 8bit?
I don't really know what to do regarding the YUY2 subject you mentioned.
I tried again to log AviSynth but no luck. Adding here my registry screenshots. What am I doing wrong?
Cheers!
> doesn't that mean that 10bit videos are now presented at 8bit?
first of all the video is converted to 8-bit (i.e. YUY2) at the very beginning of the chain, then 8->16 bit in the script and then back to 8-bit I suppose?
> What am I doing wrong?
you're writing to c:\, which won't work because of permissions
Ok chainikdn, you were spot on regarding writing to the C drive. I was able to generate AviSynth logs. :) Attaching two of them - one using madVR with HDR output and one using EVR. EVR had fairly good performance without dropping frames. madVR interestingly dropped frames only in full screen mode. Sorry, but if the video is converted to 8-bit, doesn't that mean that I lose color data? I would very much like to view the videos in their full original 10-bit format... :\
Edit: adding another AviSynth log with 10-bit color output enabled. Color issue continues.
avisynth_filter - madVR.log
avisynth_filter - EVR.log
avisynth_filter 10-bit color output enabled + EVR.log
> Sorry, but if the video is converted to 8-bit, doesn't that mean that I lose color data?
sure you do
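A rough Python sketch of why the 10-bit -> 8-bit -> 16-bit chain loses data: once the low two bits are dropped, widening the container cannot bring them back. It assumes plain truncation by bit shifting; real converters may round or dither, so this only illustrates the principle, and the function names are hypothetical.

```python
def to_8bit(value_10bit):
    """Assumed truncating conversion: drop the two least significant bits."""
    return value_10bit >> 2


def to_16bit(value_8bit):
    """Widen an 8-bit value into a 16-bit container."""
    return value_8bit << 8


source_levels = range(1024)  # every possible 10-bit code value
surviving_levels = {to_16bit(to_8bit(v)) for v in source_levels}
levels_in = len(source_levels)
levels_out = len(surviving_levels)
```

Under this assumption the 1024 input levels collapse into 256 distinct output levels, no matter how wide the later container is.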
EVR doesn't want P016 --> everything gets converted to YUY2 from the very beginning
P010 implies conversion in AVSF --> "performance issue", and CrendKing doesn't want to "fix" this
so I'd say let madVR decide how to render P016 video. dunno what exactly that PotPlayer "10-bit" switch does, but I'm sure it's not needed for madVR.
From the first two logs, were you playing regular 4:2:0 8 bit video, because the formats are NV12 -> NV12? And EVR is fine while madVR drops frames in full screen? What's the video?
3rd log is from 10 bit video (P010 -> P010). I see the file is very short and the source rate ratio is very low. What's the video? You are saying EVR has issue but madVR is fine?
Also, replace your avisynth_filter.dll with the one from https://github.com/CrendKing/avisynth_filter/actions/runs/2459714066 (Release x64) when you report back.
Hi,
Ok,
Thanks for the info.
At this point I suggest you:
Hi, Thanks again. I updated NVIDIA drivers but no change. Installed LAV filters and used in PotPlayer. See below configuration data and results. (This is all using the same file) Bottom line is I am still experiencing performance issues viewing HDR using madVR.
I tried DmitriRender (without SVP or AVSF) as well, which leads to the same result. I am beginning to think this is a madVR issue..
I'm lost..
![P010 + madVR ]
![Screenshot 2022-06-13 143901]
![Screenshot 2022-06-13 144037]
> I am beginning to think this is a madVR issue
If I'm allowed to bet on this one, I place my chips on PotPlayer. Have you tried MPC-BE? Since I can't reproduce the issue, you have to use trial and error to find out who's the culprit: the player, madVR or AVSF. If you think it's AVSF, also upload the log for analysis.
Ok, so I tried MPC-BE + madVR + ffdshow filters and got the exact same result. Using madVR I was able to generate 10-bit output but experienced performance issues viewing 4K videos.
I'm lost 🤣
"MPC-BE + madVR + ffdshow" won't give you 10-bit cause ffdshow is 8-bit only
AVSF + madVR: for ver. 1.4+ you must have the "P010" checkbox unchecked in AVSF properties, and madVR will show P016 in this case ---> this must work as fast as AVSF ver. 1.3 + P010
in fact I'm ready to deploy custom AVSF build with this P010-bit-shifting disabled when RC is connected
> MPC-BE + madVR + ffdshow filters and got the exact same result.
Probably something is wrong with your environment at this point.
> in fact I'm ready to deploy custom AVSF build with this P010-bit-shifting disabled when RC is connected.
Feel free to fork if you still believe that bit shifting is such a big deal. My stance is still that, until concrete proof shows up, I believe the impact is minimal.
> My stance is still that, until concrete proof shows up, I believe the impact is minimal.
leave 4 cores w/o HT instead of 8 cores + HT and you'll see the impact
Unfortunately I don't have a CPU with 4 cores. I can only see what I can see. Why don't you ask some of your SVP customers with that configuration to come here, show me some screenshots and logs that prove impact, and I'll happily provide that option you ask for.
Is what I'm asking unreasonable?
> I don't have a CPU with 4 cores
you can use Ryzen Master or BIOS settings or even just msconfig.exe ;)
Oh I see what you mean. I'll try that later. But didn't you say this bit shifting is about single core only? Disabling 4 cores or HT does not alter single core performance, no? (It could even increase performance a bit because the OS has fewer cores to context switch)
> what do you mean? we now have only ONE output thread that does all this stuff
> Disabling 4 cores or HT does not alter single core performance, no?
yeah, but all other computation threads are still here, sharing the same 4 (for example) logical cores
> I'll try that later
go for bios or ryzen master, cause you can't disable HT with msconfig.exe
Just did the test.
Environment: MPC-HC + LAV Filters + madVR
Video: https://4kmedia.org/ultra-hd-hdr-samsung-4k-demo-wonderland/
Total CPU usage:
Logs: 10bit.log 16bit.log SVP on 16bit.log
The takeaways are:
The 50ms difference between 10 and 16 bit is probably within statistical error, but the 4 seconds difference from SVP is clearly not. That's a 22ms delay per frame. No wonder it's so choppy.
Just to be clear, I'm not blaming SVP for having a performance issue. I believe SVP is already doing its best. But if a user ever uses SVP with any sort of interpolation, the extra time taken by bit shifting becomes insignificant.
So at this point, I'm fairly certain about the impact of the functions. If anyone can provide a use case where those 1-2ms on 4K 10-bit HDR playback are crucial, I can consider adding the option. But SVP is just not one.
So I know EVR is not good at 10-bit playback... and still Daum PotPlayer allows it. If it's set to EVR (which is the default) and you play 10-bit video, everything is connected in P010 and looks good. The problem is that it's now NOT so good starting with AVSF 1.4.0. No SVP, no script, just an empty AVSF --> the video is totally unwatchable, like 1-2 fps. Any video - tried both 4K and 720p. Plays fine with AVSF 1.3.1.