firelzrd / bore-scheduler

BORE (Burst-Oriented Response Enhancer) CPU Scheduler
GNU General Public License v2.0

The Witcher 3 hangs #22

Closed Kron4ek closed 1 year ago

Kron4ek commented 1 year ago

The Witcher 3 (v1.31) running via Wine hangs with BORE scheduler. Sometimes this happens sooner, sometimes later, but usually this takes no longer than 1 minute to happen.

When it hangs it also seems to prevent new processes from working correctly. For example, i tried running free and pgrep after the game hung, and they also hung indefinitely until i switched to another TTY and killed the game from there.

Disabling BORE via sysctl kernel.sched_bore=0 fixes the issue.
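
For anyone repeating this A/B test, a small helper can report whether the toggle exists and what it is currently set to. This is only a sketch: `bore_state` is a made-up name for illustration, and it assumes the knob is exposed at /proc/sys/kernel/sched_bore, which is where `sysctl kernel.sched_bore` would normally map.

```shell
#!/bin/sh
# Sketch: report the state of the BORE toggle without changing it.
# bore_state is a hypothetical helper name; the optional argument exists only
# so the function can be pointed at a different file when testing.
bore_state() {
    knob="${1:-/proc/sys/kernel/sched_bore}"
    if [ -r "$knob" ]; then
        value=$(cat "$knob")
        case "$value" in
            0) echo "disabled" ;;
            *) echo "enabled ($value)" ;;
        esac
    else
        # Kernel without the BORE patch, or the knob is named differently.
        echo "unavailable"
    fi
}

bore_state
```

Recording this output next to each test run makes it harder to mix up which runs had BORE enabled.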

Here is a video of what it looks like: https://youtu.be/fQBmaU5yG2o

Let me know if I need to provide more info or try something.

My system:

  • CPU: Intel Pentium G4620 3.7 GHz
  • Linux 6.4.7 with BORE 3.0.0 (compiled with Clang 15, Full LTO and O3)
  • Wine-Staging 8.13
  • DXVK 98f3887-git

firelzrd commented 1 year ago

Thank you so much for the report. That hang-up looks to be a serious show stopper. I'll be working on it as a top priority. It's interesting that when the program has hung, some related functionality is also affected, while some other tasks like switching between TTYs still work. That may be a hint. To make sure I got you correctly, can I ask you if the program hangs up "forever" or may it eventually come back if you wait?

firelzrd commented 1 year ago

Okay, I installed & played The Witcher 3 on my rig straight through for like 30 minutes without any problem. Maybe there's some precondition (independent of the BORE scheduler itself) needed to reproduce the problem. Maybe Wine-Staging, maybe DXVK, I don't know. I heard that the Linux 6.4 series recently suffered from a serious graphics issue. Here's my setup. Please feel free to tell me when you have any ideas to share.

  • AMD Ryzen 7 4800U with Radeon Graphics
  • Linux 6.3.10 with CachyOS patchset, BORE 3.0.1, GCC 11.4.0
  • GNOME 3.30
  • Lutris-GE-Proton8-12
  • DXVK 1.10.3

Kron4ek commented 1 year ago

To make sure I got you correctly, can I ask you if the program hangs up "forever" or may it eventually come back if you wait?

At first it recovered after about 30 seconds, but it froze again a few seconds later. Then i waited for around 5 minutes and it still didn't unfreeze. Switching to another TTY and then back to the first one usually makes the game come back, but then it hangs again after a few seconds.

My findings so far are:

I'll try to reproduce the issue on kernel 6.3.

firelzrd commented 1 year ago

Interesting... I'll try 6.4.7-based kernel (built with Clang) and see if there's any difference. Thanks for your cooperation.

Follow-up: No hangup was observed with kernel 6.4.7 either. Can you tell me what your graphics card is?

Kron4ek commented 1 year ago

I tried 6.3.13 with BORE 3.0.1 and the issue occurs in this case too, unfortunately. Also tried BORE 2.5.3 and BORE 2.4.2 and the issue persists. And i also experience it on linux 6.1.40 with BORE 2.5.3 (linux-cachyos-lts). I'll try even older BORE versions.

Can you tell me what your graphics card is?

Radeon RX 470.

Kron4ek commented 1 year ago

Ok, so i tested more BORE versions: 1.7.14, 2.0.1, 2.1.1, 2.2.8, they all have this issue, which at least means it's not a regression. Other interesting findings:

  • When the game hangs, it maxes out all CPU threads, in my case it uses all 4 threads - 400% CPU. And it continues to do so until i terminate it. In normal conditions it uses only half of that.
  • I found out that the issue is easier to reproduce when running the game with only 2 cores and higher priority:
    $ nice -n -20 taskset -c 0,1 wine game.exe
  • As i mentioned earlier, the game prevents new processes from working when it hangs, usually only within the same TTY, but sometimes even switching TTYs breaks. But when it's limited to only 2 or 3 threads, new processes do work fine.

The last point is especially weird. It seems like the game process is treated as a realtime non-preemptible process or something, even though it's certainly SCHED_NORMAL. It uses all cpu time and new processes do not get it, at least that's how it looks. And the issue is not reproducible with nice -n -20 stress -c 4.
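
The SCHED_NORMAL claim can be double-checked from /proc. The following is a rough sketch, not a polished tool: `show_sched` is a hypothetical helper name, and the field positions are taken from the proc(5) description of /proc/<pid>/stat (field 19 is the nice value, field 41 is the scheduling policy).

```shell
#!/bin/sh
# Sketch: print scheduling policy and nice value of a PID from /proc/<pid>/stat.
# show_sched is a hypothetical helper; field numbers follow proc(5).
show_sched() {
    pid="$1"
    # Drop "pid (comm) " first: comm may contain spaces, so cut at the last ") ".
    # After that, field 19 (nice) is column 17 and field 41 (policy) is column 39.
    sed 's/^.*) //' "/proc/$pid/stat" | awk '{
        n = split("SCHED_NORMAL SCHED_FIFO SCHED_RR SCHED_BATCH reserved SCHED_IDLE SCHED_DEADLINE", name, " ")
        idx = $39 + 1
        policy = (idx >= 1 && idx <= n) ? name[idx] : "unknown"
        print "policy=" policy " nice=" $17
    }'
}

# Example: inspect the current shell.
show_sched "$$"
```

This only looks at the main thread of the given PID. For a multi-threaded game the same fields can be read per thread from /proc/<pid>/task/<tid>/stat.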

firelzrd commented 1 year ago

Thank you for testing so many cases.

Radeon RX 470.

It's a Polaris 10 GPU, but since the issue also happens on older kernels, it can't be the 6.4-specific graphics issue which has recently been discussed.

I tried 6.3.13 with BORE 3.0.1 and the issue occurs in this case too, unfortunately. Also tried BORE 2.5.3 and BORE 2.4.2 and the issue persists. And i also experience it on linux 6.1.40 with BORE 2.5.3 (linux-cachyos-lts). I'll try even older BORE versions.

Ok, so i tested more BORE versions: 1.7.14, 2.0.1, 2.1.1, 2.2.8, they all have this issue, which at least means it's not a regression. Other interesting findings:

  • When the game hangs, it maxes out all CPU threads, in my case it uses all 4 threads - 400% CPU. And it continues to do so until i terminate it. In normal conditions it uses only half of that.
  • I found out that the issue is easier to reproduce when running the game with only 2 cores and higher priority:
    $ nice -n -20 taskset -c 0,1 wine game.exe
  • As i mentioned earlier, the game prevents new processes from working when it hangs, usually only within the same TTY, but sometimes even switching TTYs breaks. But when it's limited to only 2 or 3 threads, new processes do work fine.

The last point is especially weird. It seems like the game process is treated as a realtime non-preemptible process or something, even though it's certainly SCHED_NORMAL. It uses all cpu time and new processes do not get it, at least that's how it looks. And the issue is not reproducible with nice -n -20 stress -c 4.

In my past experience, this kind of "new processes can't be executed" behavior is usually observed when kernel threads are blocking other processes' access to a resource such as I/O. For example:

Your detailed analysis gives me an interesting insight into the problem. With those facts in mind, I'll play around with it and hopefully find something.

Kron4ek commented 1 year ago

Good news, i managed to reproduce the issue without the game. Running sched_yield with stress-ng prevents new processes from working when BORE is enabled, but when it's disabled this issue does not occur.

$ stress-ng -y 4

I'm not exactly sure, but i think The Witcher 3 is also doing sched_yield before it freezes and during the freeze. To max out CPU threads and do yielding:

$ stress-ng -c 4 -y 4
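
To put a number on "new processes don't work", one rough approach is to time a trivial command while the yield workers are running. This is a sketch under assumptions: `elapsed_ms` and `probe_under_yield` are made-up helper names, the timing relies on GNU date's %N, and stress-ng is given a timeout (-t) so the load stops on its own.

```shell
#!/bin/sh
# Sketch: measure how long a command takes to start and finish, in milliseconds.
# elapsed_ms is a hypothetical helper; it assumes GNU date (%N = nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Hypothetical helper: run N sched_yield workers for a bounded time and time
# a new process (free, as in the original report) while they are spinning.
probe_under_yield() {
    workers="${1:-4}"
    seconds="${2:-20}"
    stress-ng -y "$workers" -t "${seconds}s" >/dev/null 2>&1 &
    sleep 2    # give the workers time to start
    echo "under yield load: $(elapsed_ms free) ms"
    wait       # let stress-ng finish its timeout
}

# Baseline on an otherwise idle system.
echo "idle: $(elapsed_ms free) ms"
```

Calling `probe_under_yield 4 20` once with BORE enabled and once with it disabled should show whether the launch time of a new process differs. On an affected kernel the probe itself may stall until the stress-ng timeout expires, which is the symptom being measured.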

firelzrd commented 1 year ago

That's nice. How about:

$ sudo sysctl -w kernel.sched_burst_smoothness_down=3

Will it still freeze?

Kron4ek commented 1 year ago

Yes, still freezes.

firelzrd commented 1 year ago

Okay, that's a good hint. I've got an idea. Let me come back with an experimental patch later. Since I have a business meeting right now, it will probably take an hour or two until the patch arrives. Thank you for the support. You're really helping.

firelzrd commented 1 year ago

Fixed. (v3.1.0) Please try it and let me know what you think.

Kron4ek commented 1 year ago

It is fixed indeed, i can't reproduce the issue on 3.1.0, both with the game and with stress-ng. Thank you.

firelzrd commented 1 year ago

Thank YOU very much for all the devoted cooperation :)