felixdoerre / primus_vk

Vulkan GPU-offloading layer
BSD 2-Clause "Simplified" License

Freeze with DXVK 1.2+ #43

Closed Ann1kaB closed 5 years ago

Ann1kaB commented 5 years ago

When I try to play any game with DXVK 1.2+, the game freezes after a while, and in the Lutris log I get this:

PrimusVK: Creating image: 1280x720
PrimusVK: Creating image: 1280x720
PrimusVK: Creating image: 1280x720
PrimusVK: Creating image: 1280x720
PrimusVK: Creating image: 1280x720
PrimusVK: Creating image: 1280x720
PrimusVK: Creating image: 1280x720
PrimusVK: Creating a Swapchain thread.
PrimusVK: Get Swapchain Images buffer: 0x1390cf8
PrimusVK: Count: 4
Error 2 in 350
Error 2 in 350
Error 2 in 350

That's when it freezes. The "Error 2 in 350" lines appear a few seconds after the game freezes. I tried all 1.2 versions of DXVK and all of them freeze at a random point; there is no consistency that I can see. DXVK 1.0.3 and older work totally fine, though at lower FPS than the newer versions.

GPU1: Nvidia GeForce GTX 1050, Driver: 418.52.10 & 430.26
GPU2: Intel HD Graphics 630 (Kaby Lake GT2), Driver: 19.1.0
primus_vk version: primus-vk-git r88.e3ca959-1

felixdoerre commented 5 years ago

So that's a VK_TIMEOUT while waiting for a fence (after 10 seconds). On e3ca959 there are two relevant places where we wait for a fence regularly:

In fact in commit e3ca959 it was changed that the fence in line 1068 is reset after waiting. Did the same problem also occur with an older version of primus_vk?

I assume the output Creating image: 1280x720 is printed already way before the freeze.
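To illustrate how a timed fence wait can end up as "Error 2" in the log, here is a minimal sketch in C++. It does not use the real Vulkan API or primus_vk's code: `FakeFence`, `Result` and `check` are hypothetical stand-ins, with the enum values chosen to mirror the first `VkResult` codes (`VK_SUCCESS` = 0, `VK_TIMEOUT` = 2).

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>

// Stand-in values mirroring the first VkResult codes (assumption: illustration only).
enum Result { SUCCESS = 0, NOT_READY = 1, TIMEOUT = 2 };

// Hypothetical stand-in for a fence: signalled by one side, waited on by the other.
struct FakeFence {
    std::mutex m;
    std::condition_variable cv;
    bool signalled = false;

    // Marks the fence as signalled and wakes up any waiter.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m);
            signalled = true;
        }
        cv.notify_all();
    }

    // Waits until the fence is signalled or the timeout elapses.
    Result wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        bool ok = cv.wait_for(lock, timeout, [this] { return signalled; });
        return ok ? SUCCESS : TIMEOUT;
    }
};

// Prints a non-success result together with a source line, in the style of the log.
inline Result check(Result r, int line) {
    if (r != SUCCESS) {
        std::printf("Error %d in %d\n", static_cast<int>(r), line);
    }
    return r;
}
```

If nothing ever signals the fence, `check(fence.wait(timeout), __LINE__)` prints a line of the form seen in the report, with 2 as the result code.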

Ann1kaB commented 5 years ago

In fact in commit e3ca959 it was changed that the fence in line 1068 is reset after waiting. Did the same problem also occur with an older version of primus_vk?

Yeah, it did. That's part of the reason I tried primus-vk-git and then edited the AUR PKGBUILD to use test_membarrier.

I assume the output Creating image: 1280x720 is printed already way before the freeze.

Yes, it is. I wanted to copy the entire log but thought that excerpt would be enough; I'm mentioning it just so there isn't any confusion.

Ann1kaB commented 5 years ago

Oh, I also have this in dmesg when I attempt to play Far Cry 3:

NVRM: GPU at PCI:0000:01:00: GPU-5db9e2de-9789-0e7e-f066-040d9568c8dd
[ +0.000003] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000013, intr 10000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST_CPU faulted @ 0x4_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ

Dirt 4 loads the menu, but when I start a game it crashes. Doom 2016 crashes.

Matze1224 commented 5 years ago

Hey, I'm also having the same problem. The game freezes at a random point. In BeamNG.drive, the error message "Error 2 in 350" also appears after the freeze. This also happens in other games which require Proton and DirectX (using DXVK). It works with older Proton versions which use DXVK 1.0.

Log of the game is here, but the start is cut off because I pasted it from the terminal where I started the game. I had to kill the game process to end the freeze.

felixdoerre commented 5 years ago

That log could provide a useful hint: it seems that there are two Vulkan instances created at the same time. Also this issue happening sporadically indicates that this issue is probably timing-related. Can any of you try to provide a thread dump in the situation when the fence blocks (and times out)? I still haven't found time to reproduce this issue and debug it on my own.

Matze1224 commented 5 years ago

Hey, I haven't found a way to dump the threads yet. Instead, I launched the game with the command VK_LOADER_DEBUG=all pvkrun ./run 2>&1, which produces more verbose output. Log is here.

I tried to take a dump with pstack, but it didn't work (I think it has something to do with Wine). gdb didn't work either, because I didn't know how to break at the crash. I will try to gather as much information as I can about why this crashes.

Matze1224 commented 5 years ago

I also noticed that the following line appears in dmesg over and over again when the game crashes:

[ 5945.212706] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750

This message suggests the CPU is busy, which would fit the crashes (but I'm not 100 percent sure).

Matze1224 commented 5 years ago

I have now taken dumps of two games with ProcDump for Linux. You can browse the core dumps with gdb or similar. I tried to install as many symbols as possible.

There are two games in the tar archive (yes, the nearly one-gigabyte tar archive holds 18 gigabytes). For each game, there is a folder with the game log, an overview of the processes (to identify the dumps) and the dumps themselves. I dumped all processes that look like they belong to the game (BeamNG.drive has multiple processes) because I'm not sure which process is relevant.

So far, I have seen the following changes in the behavior of the error:

  1. In BeamNG.drive, the line "Error 2 in 350" no longer appears. Instead of freezing, the game now shows a black screen.
  2. Project Cars 2 (the second game) behaves differently: it does not print this error line but freezes at a random point.

Have fun with the dumps, hope it helps. Good luck.

felixdoerre commented 5 years ago

I've reproduced the problem with "The Witcher 3". I tested DXVK v1.1.1, which seems to work, and v1.2, which is broken. I ran git bisect to determine the breaking commit and found that it breaks with https://github.com/doitsujin/dxvk/commit/b35f3c14df2e8abc9fae60f9ce925a063d9fa40b . I also reverted that change on v1.2 and the problem seems to be solved (i.e. The Witcher 3 seems to run stably with that modified DXVK 1.2).

felixdoerre commented 5 years ago

After hours of testing various ideas of what could be wrong, I believe I found the problem: primus_vk's AcquireNextImageKHR function invokes QueueSubmit. However, the application (i.e. DXVK) is only responsible for synchronizing its own QueuePresent and QueueSubmit calls, so the QueueSubmit issued by the layer could run concurrently with a QueueSubmit from the application. I added an additional mutex to prevent those parallel calls. Can you test eb4b9d1 and see if that also solves the problem for you?
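The idea behind the fix can be sketched roughly as follows. This is not the code from eb4b9d1 and does not use the real Vulkan API: `FakeQueue`, `GuardedQueue` and `run_concurrent_submits` are hypothetical names, and the counter merely stands in for queue state that must not be touched from two threads at once.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-in for a queue whose submit call is not thread-safe on its own.
struct FakeQueue {
    long submitted = 0;  // plain, non-atomic state
    void submit() { ++submitted; }
};

// Layer-side wrapper: every path that submits to the queue takes the same mutex.
struct GuardedQueue {
    FakeQueue queue;
    std::mutex submit_mutex;

    // Submission passed through on behalf of the application.
    void app_submit() {
        std::lock_guard<std::mutex> lock(submit_mutex);
        queue.submit();
    }

    // Submission issued by the layer's own acquire path, which the
    // application does not know about and therefore cannot synchronize.
    void layer_acquire_submit() {
        std::lock_guard<std::mutex> lock(submit_mutex);
        queue.submit();
    }
};

// Runs both paths concurrently and returns the number of recorded submissions.
long run_concurrent_submits(int per_thread) {
    GuardedQueue g;
    std::thread app([&] {
        for (int i = 0; i < per_thread; ++i) g.app_submit();
    });
    std::thread layer([&] {
        for (int i = 0; i < per_thread; ++i) g.layer_acquire_submit();
    });
    app.join();
    layer.join();
    return g.queue.submitted;
}
```

Because both paths lock the same mutex, no submission is lost: `run_concurrent_submits(n)` returns `2 * n`. Without the lock, the two threads would race on the queue state.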

Matze1224 commented 5 years ago

I have tested it and it works fine with the games I tried. No freezes, they are playable now! Thank you for this patch.

Ann1kaB commented 5 years ago

this patch seems to solve the problem, thank you so much! @felixdoerre

ellogwen commented 5 years ago

Oh wow! I reported an issue to the DXVK project that games stopped working with DXVK 1.2.1, and together we assumed it was a hardware issue on my end, so I jumped back to DXVK 1.0.x! I will try the new version of primus_vk together with DXVK 1.3.1 tonight. Thank you :) Edit: Yep, it is working fine with primus_vk 1.1 and DXVK 1.3.1. No freezes anymore, but no performance gain either ^^