acomminos / wine-pba

Patches to add a persistent buffer allocator for faster dynamic geometry in Direct3D games.
GNU Lesser General Public License v2.1

World of Warcraft - Test Results #7

Open IngeniousDox opened 6 years ago

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/0ca7edba117f5c91c75623880caf8e4d1eeb117b
specs:

Testing spot: Suramar, on top of Astravar Harbor
Graphics settings: Preset 7
Test results:

Staging 2.21 DX11: 42 fps (no csmt - lowers fps)
Staging 3.2 DX11: 18 fps (no csmt - lowers fps)
wine-pba 2.21 DX11: 67 fps 

Staging 2.21 DX9: 48 fps (no csmt - lowers fps)
Staging 2.0 DX9: 82 fps
wine-pba 2.21 DX9: 97 fps

Amazing results for DX11!! DX9 is also a bit faster than before (7 fps higher than your first build). But the improvement in DX11 is amazing. Eyeballing the CPU% from htop and the GPU-Util from nvidia-smi, I get:

DX9: 275% CPU / 67% GPU for 97 fps
DX11: 256% CPU / 51% GPU for 67 fps

Way higher than before. Looking forward to your upcoming improvements.

Dox

IngeniousDox commented 6 years ago

BUG: I think you have a memory leak or similar. My system locked up. After restarting I kept an eye on MEM% using htop, and WoW's share is increasing by 0.1% every 2 seconds.
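One way to put a number on a leak like this, instead of eyeballing MEM% in htop, is to sample the process's resident set size at a fixed interval and fit a line through the samples. The following is only a minimal sketch and not part of wine-pba: `read_rss_kib`, `sample_process` and `growth_per_second` are hypothetical helper names, and it assumes a Linux `/proc/<pid>/status` file that contains a `VmRSS` line.

```python
import time


def read_rss_kib(pid):
    """Return the resident set size of `pid` in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def sample_process(pid, interval=2.0, count=30):
    """Collect (elapsed_seconds, rss_kib) samples, one every `interval` seconds."""
    start = time.monotonic()
    samples = []
    for _ in range(count):
        samples.append((time.monotonic() - start, read_rss_kib(pid)))
        time.sleep(interval)
    return samples


def growth_per_second(samples):
    """Least-squares slope of (seconds, KiB) samples, in KiB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den
```

A slope that stays clearly positive over a minute or so while standing still in game would support the leak suspicion; a slope near zero would point elsewhere.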

IngeniousDox commented 6 years ago

I have also noticed something different: the fps I get after I log in from a fresh start on my test toon varies. I usually have around 65 fps, but I have also had 70+ fps a few times, and 45ish fps a few times. This interested me.

I start the game offscreen on a different tag of my AwesomeWM setup. My background fps is capped at 30, but my foreground fps isn't. I normally have VSync on with triplebuffer, but for testing I disable vsync. I figure it has something to do with how fast I switch to the game or something, and the average fps it has been running on till I fully loaded in the game. So I tried without background limiting, that seems to get me to around 60 fps (Which is probably because for offscreen rendering it is vsynced automagically to my desktop refresh rate by either my WM or Xorg or Nvidia, so the average became 60 fps).

So then I disabled fps limiting altogether, and made sure I was on that tag from the moment the game starts up. And lo and behold, I get 75 fps in that spot consistently. This was remarkable enough that I figured I would post here.

Dox

ghost commented 6 years ago

Very happy with this on older i5 laptop with nVidia Optimus. Went from 8fps to over 30 in Dalaran, from 30 fps in world PVE to 60fps. Made WoW useable for me without going to Windows boot on laptop. But huge memory leak makes system grind to a halt within a few minutes.

Doesn't work at all on AMD gpu system.

acomminos commented 6 years ago

Will address the memory leak issue asap in #8.

acomminos commented 6 years ago

Leak should be fixed now.

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/a5348cf24854102f49ba6e227e6e837a3871fa35

test result: 80 fps / 260% CPU / 62% GPU

Another increase in fps. The leak seems fixed.

Thanks for the quick fix.

yaomtc commented 6 years ago

It's the opposite result for me on AMD. Using a Radeon RX 580 with an Intel i5 4690k, with Linux 4.15.6, Mesa 17.3.6, and the pba-2.21 build from Lutris 0.4.14. When compared to unpatched Wine Staging 2.21 from Lutris (~70-80 fps) with the same static camera angle and position, there is a major decrease in frame rate, down to around 20-26 fps. All options in Lutris are set to the default, with the exception of disabling Lutris runtime in each test.

acomminos commented 6 years ago

Going to look into using AMD_pinned_memory where available- I've heard it's much faster than ARB_buffer_storage on AMD cards.

@IngeniousDox, I've made a variety of changes to the data structures backing the buffer heap that should make it much more efficient- can you measure the impact on your benchmark when you have a chance?

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/57f67be9a6e5af59039321cf7ddf55bbad838e16

Unfortunately I moved my toon slightly. So first a new baseline test with the previous commit:

base: 74 fps / 255% CPU (1 core maxed) / 57% GPU

I'll find a better spot for testing, or /timetest a route in the future.

test result: 72ish or 64ish fps / 244% CPU / 56% or 44% GPU
test result: 69ish or 62ish fps / 244% CPU / 56% or 44% GPU

FPS is slightly lower, and it still depends on how fast I log in I think. And there was a weird jumping up and down every few seconds, my highest CPU core % during that time dropped from 95% to below 80% for a bit.

Yes, need a better way to test this.

IngeniousDox commented 6 years ago

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Base -> Newest
Min: 20 -> 14
Max: 122 -> 113
Average: 72 -> 64

And back:

Base -> Newest
Min: 18 -> 15
Max: 149 -> 142
Average: 76 -> 69

Had to go back and forth 2x, since the nvidia shadercache is being filled the first time I go. Does show fps is down a bit. But before we can really test, it would be lovely if the fps you get when you log in, doesn't depend on how fast I log in, or if I have it focussed. ;)

IngeniousDox commented 6 years ago

Did a quick LFR, killing the game mid-fight to switch between the 2 (previous build and the current build). I had the gut feeling the previous build is slightly faster than the new build, but it hardly matters in raids. FPS is lower than out in the world, GPU has lower util, while CPU has more usage.

I just want to note that I have double the fps compared to Staging 2.21 without PBA in raids now as well. So that's great.

acomminos commented 6 years ago

Thanks for the data- I'm going to perform further analysis of the addresses in fenced batches. I suspect coalescing batches prior to submission into the primary free tree should give us the best of both worlds.

acomminos commented 6 years ago

I've implemented deferred coalescing in ac9b10e, which should significantly improve frame timing for games with cyclic buffer allocation patterns (such as WoW).
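To illustrate the general idea of deferred coalescing (not the actual wine-pba code, which lives in wined3d and uses a free tree), here is a toy sketch. Everything in it is an assumption made for illustration: the `DeferredFreeList` class, the `coalesce` helper, and the use of a sorted list in place of a tree. Freed ranges are only queued at release time, and the queued batch is merged once before it is folded into the primary free set.

```python
def coalesce(ranges):
    """Merge adjacent or overlapping (offset, size) ranges into the fewest ranges."""
    merged = []
    for offset, size in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This range touches or overlaps the previous one: extend it.
            last_offset, last_size = merged[-1]
            end = max(last_offset + last_size, offset + size)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, size))
    return merged


class DeferredFreeList:
    """Toy free list: frees pile up in a pending batch and are merged later."""

    def __init__(self):
        self.free = []     # coalesced ranges available for allocation
        self.pending = []  # ranges released since the last flush

    def release(self, offset, size):
        # Cheap path: no merging work happens at release time.
        self.pending.append((offset, size))

    def flush(self):
        """Coalesce the pending batch, then fold it into the primary free set.

        Returns the number of ranges left in the batch after coalescing.
        """
        batch = coalesce(self.pending)
        self.pending = []
        self.free = coalesce(self.free + batch)
        return len(batch)
```

With a cyclic pattern where many neighbouring buffers are released close together, the batch tends to shrink to a handful of large ranges, so the primary structure sees a few insertions instead of many small ones.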

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/68de8e9b3f26e68bc6d64f353e0954ddab2f7590

First, noting that logging in still ends up giving me either around 60ish fps or around 70ish. When I have 60 fps, I can log to another toon, move around a bit, then log back, and it is still at 60 fps. Both have around the same CPU usage, but the 60 fps case has 44% GPU while the 70 fps case has 54% GPU.

test result: 71 fps / 267% CPU / 52% GPU

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Min: 20 fps
Max: 106 fps
Average: 67 fps

And back:

Min: 18 fps
Max: 136 fps
Average: 71 fps

Now on a more important note: Game feels smooth(er). First without vsync, then with vsync back on. It isn't something you can put fps numbers on, but it is a gamer feeling I guess.

acomminos commented 6 years ago

Thanks. The phenomenon you're noticing is likely to be an improvement in consistency of frame timing. This makes sense, since we are no longer freeing large chunks of buffers periodically (the frees are deferred instead).

I'll look into a good way to quantify this- I believe glxosd can report this.

IngeniousDox commented 6 years ago

Quick 2 boss LFR Aggr/Argus: Feels smooth, even under load of a 25 man raid. It was already feeling smooth on the "https://github.com/acomminos/wine-pba/commit/a5348cf24854102f49ba6e227e6e837a3871fa35" build that I was using yesterday (the build that gave me the best fps thus far). But this feels better.

I think it will really benefit classes that use stuff like blink / charge / fel rush etc. Heroic leaping on warrior feels much better.

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/c565623357098b1c946b4ef6c7768f123630e27f

test result: 71-75ish fps / 280-290%CPU (1 core 95%) / 57%-59% GPU

I guess I was "lucky" with timing this log in, and getting high fps. I tested 2 more times, and got slightly lower fps ranges. It remains an interesting bug.

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Min: 22 fps
Max: 106 fps
Average: 66 fps

And back:

Min: 22 fps
Max: 137 fps
Average: 71 fps

FPS on timetest didn't change significantly.

IngeniousDox commented 6 years ago

PS: It feels "slightly" less smooth to turn around with this build. I had to relog both builds a few times to double check. With the previous build Heroic Leaping and turning around to charge back to a dummy also feels better.

What glxosd output are you looking for to make testable?

acomminos commented 6 years ago

Looks like glxosd has the TimeRecorder plugin for logging frame timings (triggered using shift-F8 by default). It would be interesting to plot and compare the distribution of that data across revisions.
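As a rough sketch of what comparing distributions across revisions could look like: the helper below takes a list of per-frame times in milliseconds and reports the average fps next to the median, 99th percentile and standard deviation of the frame times. The function name `summarize` is hypothetical, and how the TimeRecorder log is parsed into that list is left out because the file format is not described in this thread.

```python
import statistics


def summarize(frame_times_ms):
    """Summarize per-frame times (milliseconds) for one recording."""
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two frame times")
    ordered = sorted(frame_times_ms)

    def percentile(p):
        # Nearest-rank percentile, using integer math to avoid float rounding.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    return {
        "avg_fps": 1000.0 / statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p99_ms": percentile(99),
        "stdev_ms": statistics.stdev(ordered),
    }


# Two made-up recordings with the same average fps but different smoothness.
steady = summarize([10.0] * 100)
spiky = summarize([9.0] * 95 + [29.0] * 5)
```

Both made-up recordings average 100 fps, but the second one has a much higher 99th percentile frame time, which is the kind of difference an fps average hides and a "feels smoother" report hints at.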

IngeniousDox commented 6 years ago

In Warrior class hall: Charge dummy, heroic leap away, turn 180, charge back, mortal strike.

old = previous build (https://github.com/acomminos/wine-pba/commit/68de8e9b3f26e68bc6d64f353e0954ddab2f7590)
new = latest build (https://github.com/acomminos/wine-pba/commit/c565623357098b1c946b4ef6c7768f123630e27f)

(once with vsync on, once with vsync off)

glxosd_benchmarks.tar.gz

I had between 120 and 130 fps without vsync.

While I was typing this comment I had the game running, and I tabbed back and saw 150 fps. There were no other warriors around me anymore, so I quickly redid the tests without vsync:

glxosd_benchmarks_solo.tar.gz

I know you have WoW, and you can get this glxosd TimeRecorder output yourself while knowing exactly what you recorded. But since I wanted to toy with it a bit, and got some output, I figured I would just give you my datasets anyways.

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/a5dff624a8a63dca94138145105718ee712f7ddf

Test results are similar. Not going to repost. However, the lower-fps-after-login issue seems fixed.

Closed: https://github.com/acomminos/wine-pba/issues/19

acomminos commented 6 years ago

I'm interested to see how 0e1e946 performs for you; I noticed that UBO updates were bottlenecking PBA-allocated buffers, so I integrated ARB_multi_bind to reduce the number of driver calls needed for state updates.

It seems to be working very well for me (qualitatively).

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/0e1e946b099e15b09430be8f2334b6d37e46b311 (Well, the next commit where you changed README)

test result: 75ish fps / 280% cpu / 54% GPU

CPU usage distribution over my 4 cores is good. I don't have 1 core 100%.

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Min: 19 fps
Max: 106 fps
Average: 77 fps

And back:

Min: 21 fps
Max: 134 fps
Average: 80 fps

Average is 10 fps higher, so really nice increase in fps.

Now for the holy crap moment: In skyhold @ the mission board, I usually was around 60/70 fps. Now I'm suddenly at 140/150. So yeah, FPS is up. I'll test real raiding this weekend, but I'll see if I can do a quick LFR somewhere today.

IngeniousDox commented 6 years ago

Quick LFR Aggr / Argus: Aggr felt about the same, Argus felt a bit lower.

Doing a 40 man invasion boss also felt like fps dropped lower than usual.

Honestly, not really a good test. I'll see if I can test 2.21pba vs 3.3pba on Friday when I'm doing mythic progression. That said, it might not even be due to wine-pba, since we went from 2.21 to 3.3.

schtufbox commented 6 years ago

Might be a bit of both, did notice that they'd made some other changes too...As long as it works!

xpander69 commented 6 years ago

Just came here to say Thank you for the awesome work. Wow is quite a bit more enjoyable now.

did some testing also:

https://www.youtube.com/watch?v=t7t61CScbGM

do not pay attention to the bad drops for 3.3 with pba, seems it was still filling the shader cache or something. im currently using 3.3 and its as smooth as the 2.21 was and i think it has even improved framerate a bit. might be placebo though :)

IngeniousDox commented 6 years ago

I saw your video. I figured it was filling the shader cache as well. Anyways, FPS while doing /timetest is definitely higher. It definitely isn't placebo.

What we need now is to test this in 20 man (mythic) raiding, to see how well it renders with a lot of spell effects. This is where the biggest hits in FPS have always been.

IngeniousDox commented 6 years ago

Switched from 4.15 Ubuntu Mainline kernel to Liquorix 4.15 kernel after someone poked me about having good results with it. Here is the timetest with Staging 3.3+PBA:

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Min: 20 fps
Max: 112 fps
Average: 86 fps

And back:

Min: 21 fps
Max: 150 fps
Average: 92 fps

So, another 10ish fps higher again. Hurray!

I quickly retested with Staging 2.21+PBA. That however didn't give the same fps increase, it was in the range of 1~2 fps more.

Dehir commented 6 years ago

With wine-3.3 + amd fix seems to improve my performance.

https://www.youtube.com/watch?v=FwAo6CsckCA

LFR raiding 25+ ppl get around 30~ fps.

IngeniousDox commented 6 years ago

Varimathras 20 man mythic: Staging 2.21 gets slightly more fps (35ish) than Staging 3.3 (32ish) under raid load (similar results as with the other games). But Staging 3.3 simply feels better / smoother. Staging 2.21 has some dips, and general moving around just feels better with 3.3 during raids.

Now, could have to do with the fact that 3.3 has newer code....but w/e the reason, I'm going to stick with 3.3 + PBA for WoW.

ghost commented 6 years ago

Huh, thought I'd posted this but I don't see it in the thread.

Flightpath Crimson Thicket to Vengeance Bay. I ran it a few times to get the shaders settled, then timed. DX11 only. Both running Arch and KDE 5.12.2.

Laptop: nVidia 630M on Optimus, i5-3910 with 8GB ram, Linux 4.15.7

slow/fast/avg
3.3PBA: 4/145/23
3.3Stg: 2/45/20
3.3: Fails to load game
2.21PBA: 3/176/46
2.21Stg: 2/58/30

Desktop: AMD 7770, Pentium G6950, 8GB ram, Linux 4.15.6

3.3PBA: 3/180/40
3.3Stg: 2/70/20
3.3: Fails to load game
2.21PBA: Unusable
2.21Stg: 1/37/8

So AMD is great on 3.3PBA where it didn't work at all on 2.21PBA, nVidia laptop best on 2.21PBA by large margin. I ran the nVidia one several times because I wasn't trusting the result, but that's the consistent outcome.

acomminos commented 6 years ago

Thanks for the data- I wonder how much (if any) slowdown is being introduced by non-pba-related 3.3 changes.

I'm going to rebase the latest patches onto 2.21 tonight to figure out what's up.

acomminos commented 6 years ago

1631e46 shouldn't improve benchmark numbers significantly, but it should reduce stuttering.

IngeniousDox commented 6 years ago

Honestly, I think we are working the margins of CPU bound / GPU bound now. I'm thinking 3.3 uses more CPU than 2.21. So if you are CPU bound you get slightly less fps. Like during raiding.

During /timetesting, I think you run against the max of your GPU now. And well, that might be just my setup, and could be different for other setups. I definitely got increased fps with 3.3 during timetesting. Especially after I switched to zen kernel for throughput, but that is CPU throughput allowing my GPU to do more work. I still need to test raiding without zen kernel, since throughput sacrifices fps a bit, and I get CPU bound.

So I'm compiling your latest patches where you said you are using less CPU. And I'll timetest soonish. And I'll test raids tonight.

xpander69 commented 6 years ago

funny thing is that i get more FPS with 3.3 on heavy cpu situations, like Dalaran with lots of people i get 40+ FPS now, while with 2.21 i had drops down to 25 at times, Rest of the FPS seems pretty much same. I haven't done any timetests though. Just compiled latest patches also will see whats it now

IngeniousDox commented 6 years ago

commit: https://github.com/acomminos/wine-pba/commit/4b64220635c9eb0aeee34c451421085d1a71bb6b

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Min: 20 fps
Max: 110 fps
Average: 83 fps

And back:

Min: 22 fps
Max: 139 fps
Average: 87 fps

I actually lost fps compared to my previous test with Zen kernel. But I'm not worried about that. I'll test it in raiding tonight for a proper test.

Dehir commented 6 years ago

@IngeniousDox My guess is the same, even though I haven't tested with 2.21.

I get more fluid performance even in high crowd raiding, when fps dips to around 20~35. There is no stuttering and there are no spikes whatsoever. GPU usage actually increases. I noted that running my Ryzen 7 1700 at 3.0 GHz stock vs 3.9 GHz overclocked makes a difference. It actually affects framerates in many games, Chivalry: Medieval Warfare for example.

My example:
Raiding with 3.0 GHz: 20-25 FPS~
Raiding with 3.9 GHz: 25-35 FPS~ (mostly stable at around 35~)

But in general I think it's playable in the current state, even while raiding, with Ryzen 7 1700 + RX 480. There is always room for improvement though, and you never know if there will be a patch that breaks things on Blizzard's side ;(

xpander69 commented 6 years ago

the new patchset seems to be a miracle in heavily crowded places like dalaran 50+ FPS and Orgrimmar in the middle near bank 60+ FPS, was about 10-20FPS worse before, will keep tracking this but it seems really good so far. My Ryzen 7 1700X is clocked at 3.9ghz, XFR and boosts are disabled, performance governor is enabled when gaming.

edit: might be though that at this hour there aren't as many players around as usual in the evenings here (EU), will keep testing. so far seems amazing and my GTX 1070 is at 100% load most of the time, except in heavily crowded areas where it still drops down to 60%. 2560x1440 resolution and quality settings at 7, FXAA high.

IngeniousDox commented 6 years ago

I'm confirming increased FPS in Dalaran. Quickly set up a 2nd prefix so I could switch fast. Found a spot in Dala where I had 50 fps, relogged, and ended up ~10 fps higher there. Going to find a LFR during lunch.

@Dehir Overclocking CPU helps indeed. I have 4 cores (no HT). WoW is limited by the single-core main thread. CSMT gives it a 2nd thread/core to do the GL calls, and Nvidia threaded optimizations give the driver another thread/core to do CPU intensive stuff. So that's 3 cores. The 4th core does sound / OS for me I guess.

You could almost say I bought my CPU based on overclocking especially for WoW on linux with CSMT + Nvidia threaded....and you are correct, that is actually what I did 3 or 4 years ago, and it is still working great. ;)

xpander69 commented 6 years ago

you use GL_THREADED on top of CSMT? that helps even more? ...hmm never thought about this. i have 16 threads and i see them mostly balanced in load, wow is totalling 12-14% on those from 100% (all 16 threads). when in a cpu limited situation though, i can see 1 core going up to 100%

IngeniousDox commented 6 years ago

They are 2 different technologies, like I described. It depends on the game if they work together though. Only way to find out is to test. With Staging 2.0 and DX9 it worked fine...but later versions CSMT just added CPU usage so I stopped using CSMT, and only use GL_THREADED.

With PBA CSMT works fine again, and GL_THREADED still works fine.

Now about raiding, just did LFR, and on Varimathras I had 40+ fps, 10 fps more then on Wed night. So yeah deffo better.

IngeniousDox commented 6 years ago

Ok, going to have to rethink something in testing. I figured I would double check if GL_THREADED actually gives me more fps. Since right now CSMT is working great, it could very well be that I'm just wasting CPU time by using GL_THREADED as well. And I got some interesting results.

Timetest Crimson Thicket (Suramar) -> Vengeance Point (Broken Shore):

Min: 20 fps
Max: 116 fps
Average: 88 fps

And back:

Min: 22 fps
Max: 160 fps
Average: 98 fps

This is actually the highest I have ever had...but I figured it could be a fluke, so I rechecked it. I went back and forth multiple times, and every time I did this, I actually got lower averages:

Forward: 88.2 > 86.7 > 84.6 > 83.2 > 83.2 > 84.0
Backward: 98.4 > 96.6 > 95.0 > 91.6 > 90.9 > 92.0

Restarted the game to see if I would get the same results, but this time I stayed on Forward ~88, Backwards ~98 for 5 times back and forth. So it seems I can't repeat it. But I figured I would note it.

Tested 1 more time by pressing Alt-Z right after I click the other FP; this removes the UI. This gave me Forward 94, Backwards 104. Perhaps I should have always tested like this, but I didn't.

Anyways, my conclusion is that I (and you) need to check with and without __GL_THREADED_OPTIMIZATIONS. It has always given me more fps in the past years, but this seems no longer the case.
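To make the run-to-run drift in the repeated timetests above easier to reason about, here is a small sketch that computes the mean and spread of the six forward and six backward averages quoted in this comment. The `describe` and `exceeds_noise` helpers are hypothetical, and the rule of thumb in `exceeds_noise` (a difference only counts if it is larger than the widest observed spread) is a deliberately crude assumption, not a proper statistical test.

```python
import statistics

# Per-run average fps from the repeated timetests quoted above.
forward = [88.2, 86.7, 84.6, 83.2, 83.2, 84.0]
backward = [98.4, 96.6, 95.0, 91.6, 90.9, 92.0]


def describe(runs):
    """Return (mean, sample standard deviation, max - min) for a list of runs."""
    return statistics.mean(runs), statistics.stdev(runs), max(runs) - min(runs)


def exceeds_noise(runs_a, runs_b):
    """Crude check: is the gap between means larger than either run-to-run spread?"""
    mean_a, _, spread_a = describe(runs_a)
    mean_b, _, spread_b = describe(runs_b)
    return abs(mean_a - mean_b) > max(spread_a, spread_b)


for name, runs in (("forward", forward), ("backward", backward)):
    mean, stdev, spread = describe(runs)
    print(f"{name}: mean {mean:.1f} fps, stdev {stdev:.1f}, spread {spread:.1f}")
```

With a spread of about 5 fps forward and 7.5 fps backward within a single session, a with/without comparison based on one run per setting can easily land inside the noise, so several runs per setting are worth the time.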

IngeniousDox commented 6 years ago

Ok, I confirmed I'm not crazy. I used the previous build to test the difference with and without __GL_THREADED_OPTIMIZATIONS.

GL_THREADED: Forward 83, backwards 87.
No GL_THREADED: Forward 81, backwards 91

So on the previous build it actually smooths out the fps average.

This also confirms, that 4b64220 is an amazing improvement.

SteveEbey73742 commented 6 years ago

Flight path tests using vulkan as D3D render in place of opengl in wine registry.

Dalaran (BS) - shipwreck cove: 8 min, 186 max, 58 avg
Booty Bay - Lights hope: 7 min, 201 max, 88 avg
Moonglade - Dig site, Tanaris: 11 min, 195 max, 82 avg
Wildhammer Stronghold (outland) - Stormspire, Netherstorm: 9 min, 149 max, 54 avg
Valiance Keep - Westfall Brigade (Northrend): 6 min, 143 max, 64 avg
Timeless isle - Shado Pan Garrison: 1 min, 265 max, 54 avg
Stormshield - Rizlits Hold Fast, Nagrand (draenor): 7 min, 103 max, 55 avg

Compiled 64 bit only, CFLAGS=-march=bdver2 -mtune=bdver2 -O3 -fPIC -fomit-frame-pointer -fno-align-functions -fno-align-loops -pipe. Symlinked wine to wine64, ran regedit, put vulkan in place of opengl in Direct3d render. Enabled CSMT on the staging tab of winecfg. Launched wow from desktop shortcut, not battle.net app. Running Geforce GTX 970, 4Gig DDR5, 390.25 proprietary Nvidia from website. AMD FX 6350 6 core, 3.6Ghz, 16Gig DDR3 1600 ram. WoW set on 7 for all.

IngeniousDox commented 6 years ago

It would be lovely if it was that easy, and we could put Vulkan as renderer. DXVK is being made for that reason, but doesn't work for WoW yet.

You tested with OpenGL and Wine-pba.

SteveEbey73742 commented 6 years ago

No, I did not. What do I need to show, I have screen shots of both, and can see visual differences in running vulkan versus opengl. is there a wine test somewhere, that can show the underlying render that is being used.

IngeniousDox commented 6 years ago

Well, if you found a way to already run WoW on Vulkan, people will be interested. So we will want to know the steps you took, so we are able to replicate it. I'll start with a few questions, perhaps you can fill in the blanks:

TRPB commented 6 years ago

wine-pba wouldn't work with vulkan anyway, it makes use of opengl features.

mirh commented 6 years ago

Can we please keep OTs out? People are not here to hear about vulkan.

IngeniousDox commented 6 years ago

It's a thread about World of Warcraft - Test Results. It's on topic enough. We are testing WoW in various ways, with and without PBA. Now, I haven't heard of a way to get WoW working with Vulkan, so I'm open to hearing about it. And if it works with Vulkan, I'm going to test with it to get comparisons. You are free to mute the thread if you aren't interested.