elishacloud / dxwrapper

Fixes compatibility issues with older games running on Windows 10/11 by wrapping DirectX dlls. Also allows loading custom libraries with the file extension .asi into game processes.
zlib License

Investigate Windows 10 1607-1703 subpar d3d9 performance #164

Open. mirh opened this issue 1 year ago

mirh commented 1 year ago

Follows #111. It turns out that newer Windows versions did far more than just gimp VP. I couldn't get 1607 to show more than a ~15% handicap (and boy, was it hard to find the right scenes), but with 1703 you'd have to be blind not to see a difference whenever you are CPU-limited.

I tested (in a 640x480 window at minimum detail for games, default size and settings for the rest):

I understand that performance isn't exactly the kind of "qualitative" issue this project usually addresses, but the effect is consistent, and just as annoying when your framerates weren't ludicrously high to begin with (in CSGO the uplift from multicore rendering is practically nullified, while the worst-case microbenchmark I could scavenge ran at not even ONE THIRD of the W7/1511 speed). There are of course other caveats, but none that should weaken the main point.

Yes, I did all of this testing inside VMware Workstation, but while the virtual SVGA device isn't exactly comparable to native hardware, the provided d3d9 driver should be perfectly legit to compare against itself (unlike normal GPU drivers, I believe it even uses the same code paths on every Windows version). Secondly, this was done on my old i7-6500U + 950M laptop, which isn't exactly a workhorse. But through either downclocking or core limiting (which should be especially easy in a VM), I see no reason anybody couldn't get down to the same level, should that actually turn out to be required to reproduce the problem. Last but not least, I also made sure to rule out any Spectre and Meltdown effects (pre-2018 Windows builds should know nothing about them, and my host has mitigations disabled anyway).

mirh commented 1 year ago

Ok, so... a few other oddities. First of all, VMW 17 sucks and I couldn't even start some of the samples without the whole mksSandbox crashing down.

Secondly, at least when testing my favourite sample, HDR_FP16x2, I found that even W7 seemed somewhat suboptimal (220 fps, vs. 330 fps on 1511 and barely 170 fps on 1703). This also correlated with quite different CPU usage patterns: on W7 I got those numbers while hitting 25% utilization (i.e. exactly one full thread), on 1511 CPU activity barely registered at all, and on 1703 it averaged 40%.

Last but not least, I tried swapping 1703's d3d9.dll for the 1511 one (in SysWOW64 too, for good measure and because some applications are very finicky). Performance didn't budge in the slightest (at least in the sample mentioned above).

mirh commented 1 year ago

Ok, never mind, mystery solved regarding W7 falling short of the highest expectations. The vm3dmp/vm3dum driver is the same across all Windows versions (8.17.3.5, at least on VMW 16), but it enables/supports/uses different capabilities depending on the OS (if nothing else, you can see it uses WDDM 1.0 on W7, as opposed to 1.1 everywhere else). Even just vanilla W8 was enough to reproduce the 1511 numbers (if not a pinch better).

FWIW, I also tested this natively on my 9600K + 2080S desktop (with the legacy 473 driver branch versus 531, but still) and measured 900 fps on W7 vs. 550-600 fps on W10 22H2. Funnily enough, not even dxvk (640) could compete.

Trass3r commented 11 months ago

You could try to record an ETW trace if it's really a problem on the latest version of the OS. https://github.com/google/UIforETW/releases

mirh commented 11 months ago

Ok so, uh... there's a lot to unpack here. First of all, I spent some time over these two months to finally resurrect the damn old NVIDIA sample (binaries included).

I couldn't find (or at least couldn't be bothered to make play nice) the exact original build environment of the day, but the newer one works just as well. In fact... it seems even too good? Take the fps numbers from my last post (from whatever ~2004 DX SDK and VS .NET 2003 exe they give you) and multiply them by 2.5. On my desktop you get 1550 fps for new W10, almost 1800 for dxvk and 2100 for W7. And on Linux (with both wined3d and dxvk) you get roughly the latter results.

But I also gave the old build a run while I was at it, and wtf? Linux can pull off 1700 fps even with that. And yes, one might argue it's down to optimization, but then I also tried your absolutely delightful tool on Windows. Whereas the new build seems to spend the majority of its CPU time inside nvd3dum.dll (I couldn't really spot anything else, can you give it a go too?), the old one has about half of its CPU usage wasted by d3dx9_29. And not in any "proper" function, but rather in gdi32full.dll.

And given how quirky the virtual 3D device can be, I wonder whether it couldn't also be responsible for the biggest-ass imbalances I measured inside my VM. Conversely, there's still at least a tangible 20% handicap between the best-performing (real!) conditions and W10 that I'm very confident about. EDIT: come on, W7 with dxvk scores 1000 on the old version and 2500 on the new one.
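As a rough cross-check of the gaps discussed in this thread, here is a small sketch that turns the reported frame rates into a relative slowdown and an extra per-frame cost. It only uses figures quoted above (the 550-600 fps W10 range is split into its two endpoints, and "almost 1800" for dxvk is taken as 1800); the function names are illustrative and not part of dxwrapper or any tool mentioned here.

```python
"""Relative slowdowns implied by the frame rates reported in this thread."""

# (label, baseline fps, compared fps). Only figures quoted in the thread are used;
# the 550-600 fps W10 range is represented by its two endpoints.
COMPARISONS = [
    ("VM, HDR_FP16x2: 1703 vs 1511", 330.0, 170.0),
    ("Desktop, old build: W10 22H2 (550) vs W7", 900.0, 550.0),
    ("Desktop, old build: W10 22H2 (600) vs W7", 900.0, 600.0),
    ("Desktop, new build: W10 vs W7", 2100.0, 1550.0),
    ("Desktop, new build: dxvk vs W7", 2100.0, 1800.0),
]


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def compare(baseline_fps: float, other_fps: float) -> tuple[float, float]:
    """Return (slowdown in percent, extra milliseconds per frame) of other vs baseline."""
    slowdown_pct = (1.0 - other_fps / baseline_fps) * 100.0
    extra_ms = frame_time_ms(other_fps) - frame_time_ms(baseline_fps)
    return slowdown_pct, extra_ms


if __name__ == "__main__":
    for label, baseline, other in COMPARISONS:
        slowdown, extra = compare(baseline, other)
        print(f"{label}: {slowdown:.1f}% slower, +{extra:.3f} ms per frame")
```

Reporting both numbers helps when baselines differ this much: at around 2000 fps a frame only takes about half a millisecond, so even a fraction of a millisecond of added per-frame cost removes a large share of the frame rate.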