ValveSoftware / Proton

Compatibility tool for Steam Play based on Wine and additional components

Threadripper Performance Problem and Workaround #2305

Open ghost opened 5 years ago

ghost commented 5 years ago

Proton eats up all of the Threadripper's threads, pushing utilization to 75% to 100% on every one of them, and you notice low-FPS gameplay in any game that runs on Proton.

The workaround is:

The taskset command didn't work for me when I tried to restrict the number of cores the process is allowed to use, so the alternative is to use the maxcpus option in GRUB (appended to the Linux kernel line), with the integer set to about 4 to 8. This problem is specific to Threadripper and it affects all games that run on Proton.

When rebooting with GRUB, press 'e' on the highlighted boot menu item and append the maxcpus option to the kernel line, like so:

linux ....... maxcpus=4

Then press Ctrl + X after editing the GRUB boot entry to start the boot. Log in to your favorite desktop, then bring up a system monitor (any should be fine; I use GNOME System Monitor) to confirm that only 4 or so CPUs are running. After that, try running your game in Proton as normal and see if this works.
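As a complement to checking a system monitor, the number of CPUs the kernel brought online can also be read from a script. The following is a minimal Python sketch, assuming a Linux system that exposes `/sys/devices/system/cpu/online`; `parse_cpu_list` and `online_cpus` are hypothetical helper names used only for this illustration.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3' or '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the online logical CPUs, or None if the sysfs file is unavailable."""
    try:
        with open(path) as handle:
            return parse_cpu_list(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    cpus = online_cpus()
    if cpus is not None:
        print(f"{len(cpus)} logical CPUs online: {cpus}")
    print("os.cpu_count() reports:", os.cpu_count())
```

If the maxcpus option took effect, the count printed here should match the value given on the kernel line rather than the full 32.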

OvermindDL1 commented 5 years ago

That doesn't really seem like an acceptable workaround, as it means the rest of the cores go unused. In addition, on a Threadripper with Proton here I haven't noticed low FPS or full core usage, though to be honest I haven't been checking the core usage.

I'm also unsure why Ctrl+X would bring up a system monitor; by default that's the 'Cut to Clipboard' command on every interface I've used in recent memory. What system monitor do you expect it to bring up? htop? KSysGuard? Something else?

If there really is such an issue on threadripper with proton, then shouldn't this thread be left open as well?

ghost commented 5 years ago

Control + X is for starting the boot after editing the GRUB configuration at the bootloader stage. I'll update the comment to expand on it a bit more.

Any system monitor, really; they all list the number of threads in use. Normally I see 32 "CPUs" (which are really threads), but with the maxcpus option added, the total drops to the specified number.

And yeah, it is really a big issue in Proton, but I was unsure about keeping it open since it's specific to Threadripper.

Before Configuration: [screenshot from 2019-02-04 09-01-16]

After Configuration: [screenshot from 2019-02-04 09-07-46]

OvermindDL1 commented 5 years ago

> Control + X is for starting the boot after editing the GRUB configuration at the bootloader stage. I'll update the comment to expand on it a bit more.

Ah, that makes much more sense. I was erroneously assuming the system monitor was being accessed via some Ctrl+X menu. ^.^;

> And yeah, it is really a big issue in Proton, but I was unsure about keeping it open since it's specific to Threadripper.

Even if it is a single type of hardware, that hardware is unique in its very high core count, so if it happens on it then that indicates an underlying bug in proton and/or wine. Thus...

  1. Can you reproduce it in wine itself?
  2. Does it happen with any native application?

Regardless, I still vote that this should be reopened and triaged by valve. :-)

I'll see if it can be replicated on my wife's computer (hers is the one with a Threadripper chip). I know she's been playing (catching up on) a lot of Windows games via Proton lately and hasn't complained about performance/FPS issues (and she does if she experiences any). But still, I'd definitely say reopen this.

ghost commented 5 years ago

Alright, I've reopened this.

  1. Wine doesn't seem to have this problem: under normal circumstances Proton eats up 75% to 100% of the CPU resources for no apparent reason, while Wine itself doesn't behave this way.

  2. Native applications don't have this problem either, especially Linux-based games.

ghost commented 5 years ago

Programmatically, if there were an infinite loop somewhere (on purpose or by accident) that creates, or can wind up creating, threads, and it went out of control, then I suppose this could happen, but I would not expect that to be specific to Threadripper.

I'm not sure what else could cause so many threads to eat 100% of the CPU. Threads have to be created to be used, so they should have some meaningful purpose, but this doesn't sound intentional.

ghost commented 5 years ago

I have uploaded videos to demonstrate what happens with both configurations:

The 32 Threads Configuration (The default boot): https://www.youtube.com/watch?v=X7FdygP35Eg

The 8 Threads Configuration (Added maxcpus commandline option at Bootloader): https://www.youtube.com/watch?v=edWs_wd3rKU

You can tell that games play a lot more smoothly in the 8 Threads configuration. The odd part is that this problematic behavior does not happen with Wine (the default Wine that comes with Arch Linux) or with games native to Linux.

kakra commented 5 years ago

Out of curiosity, would you mind checking if using my Proton "patch" works better? It contains some threading / priority changes. It would give me a clue if working more on such patches would have some bigger benefit. I do not own a Threadripper so I have no machine to test this myself...

https://github.com/kakra/wine-proton/blob/rebase/proton_3.16/README.md

BTW: Do you use DXVK with vanilla wine?

ghost commented 5 years ago

> BTW: Do you use DXVK with vanilla wine?

This is a good thing to check too, as Proton and Wine differ a bit, and DXVK also uses the CPU, so it probably creates multiple threads as well (this might be a problem with DXVK and not with Proton).

kakra commented 5 years ago

@byte1024 Yes, DXVK spawns multiple threads for the shader pipeline compiler. There's a report on performance behavior with DXVK, and it gained an option to manually set the number of threads: https://github.com/doitsujin/dxvk/commit/6adf53458994982a47c5bf9429b6dc41582f6758 https://github.com/doitsujin/dxvk/pull/751
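For anyone who wants to try capping the compiler threads, here is a minimal Python sketch that writes a one-line DXVK configuration file. It rests on assumptions not confirmed in this thread: that the option is spelled `dxvk.numCompilerThreads` (as in DXVK's sample `dxvk.conf`) and that DXVK reads a `dxvk.conf` from the game's working directory. Check the commit and pull request linked above for the exact name in your DXVK version. `write_dxvk_conf` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def write_dxvk_conf(directory, compiler_threads):
    """Write a dxvk.conf that caps the number of DXVK pipeline compiler threads.

    Assumption: the option name is dxvk.numCompilerThreads.
    Note: this overwrites any existing dxvk.conf in the directory.
    """
    if compiler_threads < 0:
        raise ValueError("compiler_threads must be zero or positive")
    conf_path = Path(directory) / "dxvk.conf"
    conf_path.write_text(f"dxvk.numCompilerThreads = {compiler_threads}\n")
    return conf_path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of a real game folder.
    with tempfile.TemporaryDirectory() as game_dir:
        path = write_dxvk_conf(game_dir, 4)
        print(path.read_text(), end="")
```

Trying a small value such as 4 on a 32-thread machine would be one way to see whether shader compilation is what saturates the cores.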

qpl23 commented 5 years ago

Hi, just wanted to note that wine's current dlls/ntdll/nt.c has some fixes that are not in Valve's version of that file, which could be relevant.

Specifically, without something like this diff https://bugs.winehq.org/attachment.cgi?id=62712&action=diff&context=patch&collapsed=&headers=1&format=raw Skyrim SE won't make it past the loadscreen on threadripper machines because OS processor info wasn't parsed correctly on high core-count machines. (See various posts in #4 )

So I wondered if it might be worth seeing if this specific fix makes any difference for other threadripper oddities.

Ruedii commented 5 years ago

There are several options to boost CPU usage.

The big concern here is a lot of micro-sleeps in the thread-safe code. The more threads you run the more this occurs. Usage of manual C-State idle instructions might also help if the yield still returns too soon, and the helper threads are completed.

Simply use yield instead and you should see a massive performance boost.

One could even reroute the Win32/Win64 microsleep signal to initially yield then C-State if the yield doesn't create a long enough sleep.

Additionally, for Threadripper, make sure actions are done in big enough batches. By boosting the batch size you can flush through in CPB standard clock speed mode, reducing execution latency. For single-threaded tasks you could get CPB Turbo clock speed mode.

The same goes to a lesser degree for Ryzen processors, of course, and high thread count Intel processors.
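The "yield first, then fall back to something longer" idea above can be illustrated with a rough Python sketch. It is not taken from Wine or Proton: `hybrid_wait` and its `yield_budget_s` parameter are hypothetical, and `time.sleep` stands in for the C-state idle instructions mentioned above, which Python cannot issue directly. It assumes a Unix system where `os.sched_yield` is available.

```python
import os
import time


def hybrid_wait(duration_s, yield_budget_s=0.0005):
    """Wait for duration_s seconds: yield the CPU first, then sleep for the rest.

    Returns (number_of_yields, elapsed_seconds).
    """
    start = time.perf_counter()
    deadline = start + duration_s
    # Only spend a small, bounded budget on yielding.
    yield_until = min(deadline, start + yield_budget_s)
    yields = 0
    while time.perf_counter() < yield_until:
        os.sched_yield()  # give up the rest of the timeslice
        yields += 1
    # If yielding did not cover the requested wait, sleep for the remainder.
    remaining = deadline - time.perf_counter()
    if remaining > 0:
        time.sleep(remaining)
    return yields, time.perf_counter() - start


if __name__ == "__main__":
    count, elapsed = hybrid_wait(0.002)
    print(f"yielded {count} times, waited {elapsed * 1e6:.0f} microseconds")
```

On an idle machine the yield loop mostly spins, so the budget should stay small; under contention each yield can cost far more than expected, which is worth measuring before relying on it.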

h1z1 commented 5 years ago

> Simply use yield instead and you should see a massive performance boost.

Yield works for timer related issues (like resolution). It can cause other issues though, mainly scheduling latency when there is +any+ kind of CPU contention since you're effectively telling the OS you don't care about the rest of your timeslice. You could end up sleeping for a very long time assuming you aren't otherwise boosted (realtime for example).

Fun fact: many years ago I ran CS servers. Before Valve added pingboost, one of the tweaks was a preload replacing usleep() with yield. It worked because the time to process a tick was shorter than the overhead of getting scheduled.

tl;dr spinlocks

Regarding clocks, I don't see what those cores are running at, but it's quite likely they vary... a lot. One of the things I've noticed, on Threadripper at least, is a CCX performance... irregularity when moving data between cores with different speeds. Utilization of the core isn't a factor; it's purely the clock speed.
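To see how much the per-core clocks actually vary, something like the following Python sketch could be used. It assumes an x86 Linux system whose `/proc/cpuinfo` contains `processor` and `cpu MHz` fields; `parse_core_clocks` and `clock_spread` are hypothetical helper names for this illustration.

```python
def parse_core_clocks(cpuinfo_text):
    """Map logical CPU index to its current clock in MHz from /proc/cpuinfo-style text."""
    clocks = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            clocks[current] = float(value)
    return clocks


def clock_spread(clocks):
    """Return (slowest, fastest) clock in MHz, or None if nothing was parsed."""
    if not clocks:
        return None
    return min(clocks.values()), max(clocks.values())


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            clocks = parse_core_clocks(handle.read())
    except OSError:
        clocks = {}
    print("per-core MHz:", clocks)
    print("slowest/fastest:", clock_spread(clocks))
```

Sampling this a few times while a game is running would show whether the cores a game's threads land on differ much in speed.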

GloriousEggroll commented 5 years ago

Just FYI, the current version of Proton (4.2) has the high-core-count fix, so Skyrim SE and Fallout 4 get past the loading screen; older versions of Proton do not have this patch. (I own a Threadripper and have been keeping track of this.)