doitsujin / dxvk

Vulkan-based implementation of D3D8, 9, 10 and 11 for Linux / Wine
zlib License

All my Steam Proton games that use DXVK eventually freeze and crash, Nvidia rep out of ideas #2365

Closed: fatalerror100500 closed this issue 2 years ago

fatalerror100500 commented 2 years ago

Hi,

I’m experiencing freezes in games that use DXVK (I only play through Steam Proton): the game just freezes for a few minutes and then crashes. This has happened in every game I’ve played:

Yakuza Kiwami
Yakuza 0
Mass Effect Legendary Edition
Days gone
Foxhole
Rocket League
Overcooked! All you can eat
Deep Rock Galactic

These crashes happen at random times: in some games the crash may happen after 2-3 hours of play (for example, Mass Effect), while in others it happens very often, maybe once every 15 minutes (Deep Rock Galactic). I’ve tried checking RAM, VRAM and video card temperature, but everything seems to be fine. The Steam Proton logs show some errors such as VK_ERROR_DEVICE_LOST, but I still have no idea what is causing these crashes. Right now I’m using Ubuntu 20.04. Last year I played Deep Rock Galactic on ultra settings on Windows 10 and never had a single crash. I’ve been able to play a native Linux game, Europa Universalis IV, for 7 hours without any freezes or crashes, but I have never been able to play a Proton game for 3-4 hours without it freezing.

I’ve tried installing both older and newer drivers (nvidia-driver-460, 495 and 390), and the issue persists. Right now I’m using nvidia-driver-470. I’ve also tried different Proton versions (Proton Experimental, 6.3-7, 5.13-6 and Glorious Eggroll 6.16-GE1).

"When I run uname -ar, I get: Linux rus-N95TP6 5.11.0-38-generic #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

My dGPU is an Nvidia GeForce GTX 1060 Mobile; my iGPU is an Intel(R) UHD Graphics 630 (CFL GT2).

My CPU is Intel(R) Core™ i7-8700 CPU @ 3.20GHz

Below I’ve attached the bug report generated by nvidia-bug-report.sh around 2 minutes after Deep Rock Galactic froze, since the crashes happen more often there than in the other games. I’ve also attached the Proton log in case it helps. It’s probably relevant that I’m running a laptop with an integrated Intel iGPU, so PRIME is involved (although the Steam Proton log seems to show that the video device actually used is the Nvidia dGPU).

steam-548430.log.gz

nvidia-bug-report.log.gz

I’ve tried to run Deep Rock Galactic with the kernel 5.4 (Linux rus-N95TP6 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux) and the same freeze happened. Log files below:

nvidia-bug-report.log.gz

steam-548430.log.gz

To test the overheating theory, I ran Deep Rock Galactic with the framerate capped at 40 fps to keep my GPU temperatures low. If this were a thermal issue, the game shouldn’t freeze under those conditions.

Unfortunately, it froze. I checked the temperature logs: the temperature peaked at 78 °C, i.e. well below dangerous levels.

This was while running kernel 5.4.

Logs:

nvidia-bug-report.log.gz

steam-548430.log.gz

2021-11-09-temp-logs-2.txt.gz The freeze happened around 19:24:01 according to the temperature log.
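One way to line a freeze time up against a temperature log is sketched below. The format of the attached logs is not shown in this thread, so the sketch assumes (hypothetically) that they were recorded with `nvidia-smi --query-gpu=timestamp,temperature.gpu --format=csv,noheader -l 1`, which gives one `YYYY/MM/DD HH:MM:SS.mmm, <temp>` line per sample; adjust `TS_FORMAT` if your logger writes something else. `parse_temp_log` and `peak_before_freeze` are illustrative names, not part of any tool.

```python
from datetime import datetime, timedelta

# Assumed (hypothetical) line format: "2021/11/09 19:23:58.123, 77"
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def parse_temp_log(lines):
    """Yield (timestamp, temperature in C) pairs, skipping malformed lines."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue
        try:
            yield datetime.strptime(parts[0], TS_FORMAT), int(parts[1])
        except ValueError:
            continue


def peak_before_freeze(lines, freeze_at, window=timedelta(minutes=10)):
    """Highest temperature logged in [freeze_at - window, freeze_at], or None."""
    temps = [t for ts, t in parse_temp_log(lines)
             if freeze_at - window <= ts <= freeze_at]
    return max(temps, default=None)


if __name__ == "__main__":
    sample = [
        "2021/11/09 19:10:00.000, 71",
        "2021/11/09 19:20:00.000, 78",
        "2021/11/09 19:24:01.000, 75",
    ]
    # The 19:10 sample falls outside the 10-minute window, so this prints 78.
    print(peak_before_freeze(sample, datetime(2021, 11, 9, 19, 24, 1)))
```

A peak well below the GPU's thermal limits in the minutes before the freeze is what argues against overheating here.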

I was able to run a non-native Linux game, Hades, through Steam Proton in Vulkan mode for over 16 hours without any freezes. (The game can run in either DirectX mode or Vulkan mode. I assume that DirectX mode makes Steam Proton use DXVK, while Vulkan mode doesn’t need DXVK, but that’s just an assumption; check the attached Proton logs to confirm.) I left it running overnight and simply closed it in the end, since it was obvious it wouldn’t freeze.

Later on, for testing and differential-diagnosis purposes, I ran Hades in DirectX mode. As I expected, it froze and crashed after 1 hour.

This leads me to think this issue is somehow related to DXVK.

Proton logs for both runs attached:

Hades (Vulkan).log.gz

Hades(DirectX).log.gz

nvidia-bug-report(Hades).log.gz

2021-11-09-temp-logs-Hades.txt.gz

The freeze happened around 00:38:33 according to the temperature log. The peak temperature was 59 °C.

I’ll also attach the output of the command ‘VULKAN_DEVICE_INDEX=1 vulkaninfo’:

VulkanDeviceIndex.txt.gz


I've reported my issue on the Nvidia Forums, here's the link

The last reply from the Nvidia rep was:

I’m really out of ideas, better reach out to the dxvk issue tracker.


After that, I tried to prepare logs with apitrace, as the bug report guide requires. Unfortunately, I spent 2 days trying to reproduce the same freezes with apitrace, but they didn't occur even once. I tried running "Deep Rock Galactic", "Foxhole", "Hades" and "Overcooked! All you can eat", and the freeze never happened. Without apitrace, I was able to reproduce the freeze with the d3d11 and dxgi logs enabled in "Overcooked! All you can eat"; I'll attach those logs below. Because the Proton log file was huge (8 GB) and highly repetitive, I've kept only its first 10000 and last 10000 lines. I'll attach the Steam Proton log as well. As soon as I stop using apitrace, the freezes return. Somehow apitrace manages to prevent them!
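Trimming a multi-gigabyte log to its first and last 10000 lines can be done with `head -n 10000` and `tail -n 10000`. The Python sketch below does the same in a single streaming pass, so memory stays bounded at roughly twice the kept line count regardless of file size. `head_and_tail` and `write_excerpt` are hypothetical helper names.

```python
from collections import deque
from itertools import islice


def head_and_tail(path, n=10000):
    """Return (first n lines, last n lines) of a text file, reading it once.

    If the file has fewer than 2*n lines, the two parts overlap.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        head = list(islice(f, n))        # consumes exactly n lines (or fewer)
        tail = deque(head, maxlen=n)     # seeded so short files get a correct tail
        tail.extend(f)                   # maxlen discards all but the last n lines
    return head, list(tail)


def write_excerpt(src, dst, n=10000):
    """Write head, a marker line, and tail of src into dst."""
    head, tail = head_and_tail(src, n)
    with open(dst, "w", encoding="utf-8") as out:
        out.writelines(head)
        out.write(f"\n... [middle of {src} omitted] ...\n\n")
        out.writelines(tail)
```

For example, `write_excerpt("steam-548430.log", "steam-548430-excerpt.log")` would produce a shareable excerpt of the log.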

Why do these freezes happen and how can I fix this problem?


Software information

Overcooked! All you can eat

System information

Log files

Blisto91 commented 2 years ago

Hmmm, it is indeed very strange. It almost sounds like it could be related to #1318.

buzzcut-s commented 2 years ago

Oh.

I also noticed the following spam in every Xorg log in all of your nvidia-bug-reports:

(EE) modeset(0): present flip failed
(WW) modeset(0): flip queue failed: Invalid argument
(WW) modeset(0): Page flip failed: Invalid argument

I (and many others) have had the same issue on laptop PRIME setups (I'm on a 9750H + 1660 Ti) for almost a year now. The original Xorg issue is even older (reported 4 years ago); see here: https://gitlab.freedesktop.org/xorg/xserver/-/issues/24

It seems to have become more common on PRIME setups since Intel introduced asynchronous flip support in their graphics driver in 5.11 (Kernel patches and the 5.11 PR).

The only "fix" I've found so far is to switch to an Nvidia-only Xorg session whenever I need to offload anything intensive to my Nvidia GPU. I use optimus-manager on Arch to switch, but that requires logging out every time I want to play something. And sometimes I forget to switch before starting a game. Yay, freeze.

On hybrid/on-demand/PRIME setups, Xorg freezes, and the only way to recover is to either kill my session or reboot. This happens without fail, every single time.

This has pretty much single-handedly ruined all seamless gaming experience for me :D

Also, as far as I can tell, there's not any userspace toggle to disable async flips either in Xorg or in the i915 driver (although disabling it in the kernel and recompiling is probably an option).

Some more links on the same bug : https://forums.developer.nvidia.com/t/playing-games-with-nvidia-card-kills-x-server/170404 https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/-/issues/208 https://bbs.archlinux.org/viewtopic.php?id=267296

The Xorg devs rate-limited these errors on the newest version, but that doesn't really help anyone since the session is already crashed by the time we hit these errors.

See: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1164 https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/658

My Xorg logs now have the following entries when I crash:

[  1351.512] (WW) modeset(0): Present-flip: queue async flip during flip on CRTC 0 failed: Invalid argument
[  1351.519] (WW) modeset(0): Present-flip: detected too frequent flip errors, disabling logs until frequency is reduced
[  1370.074] (WW) modeset(0): flip queue retry

I also don't think most (or any) of this applies to DXVK specifically because Vulkan games also freeze my session. So Hades on both Vulkan and DX11/DXVK freezes for me.

Also note that xorg-server built from upstream master branch doesn't fix this for me either. And I'm on 5.15 kernel so that also doesn't fix anything.

Further, if I turn on DRM debugging in the kernel, these async page flip failures in Xorg also generate the following errors from the i915 driver:

dmesg

```
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:31:plane 1A] visible 1 -> 1, off 0, on 0, ms 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_atomic_get_global_obj_state [i915]] Added new global object 00000000f8f4a4bc state 000000001663e063 to 000000000efa3bcd
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_atomic_get_global_obj_state [i915]] Added new global object 00000000e2c7ce2e state 00000000681e0b2c to 000000000efa3bcd
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:skl_compute_wm [i915]] [PLANE:31:plane 1A] level *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7, twm, swm, stwm -> *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7, twm, swm, stwm
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:skl_compute_wm [i915]] [PLANE:31:plane 1A] lines 0, 7, 8, 9, 14, 16, 17, 20, 0, 0, 0 -> 0, 4, 6, 6, 12, 14, 15, 17, 0, 0, 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:skl_compute_wm [i915]] [PLANE:31:plane 1A] blocks 58, 104, 122, 137, 212, 242, 257, 302, 0, 0, 0 -> 18, 64, 88, 98, 182, 219, 235, 274, 0, 0, 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:skl_compute_wm [i915]] [PLANE:31:plane 1A] min_ddb 59, 105, 123, 138, 213, 243, 258, 303, 0, 0, 0 -> 19, 65, 89, 99, 183, 220, 236, 275, 0, 0, 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_atomic_check [i915]] Linear memory/CCS does not support async flips
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] [CRTC:51:pipe A] enable: yes [failed]
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] active: yes, output_types: EDP (0x100), output format: RGB
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] cpu_transcoder: EDP, pipe bpp: 24, dithering: 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] MST master transcoder:
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] port sync: master transcoder: , slave transcoder bitmask = 0x0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] bigjoiner: no
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] splitter: disabled, link count 0, overlap 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] dp m_n: lanes: 4; gmch_m: 3985171, gmch_n: 8388608, link_m: 332097, link_n: 524288, tu: 64
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] audio: 0, infoframes: 0, infoframes enabled: 0x0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] vrr: no, vmin: 0, vmax: 0, pipeline full: 0, guardband: 0 flipline: 0, vmin vblank: -1, vmax vblank: -2
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] requested mode:
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_debug_printmodeline] Modeline "": 144 342050 1920 2028 2076 2080 1080 1090 1100 1142 0x0 0x9
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] adjusted mode:
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 144 342050 1920 2028 2076 2080 1080 1090 1100 1142 0x48 0x9
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_crtc_timings [i915]] crtc timings: 342050 1920 2028 2076 2080 1080 1090 1100 1142, type: 0x48 flags: 0x9
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] pipe mode:
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 144 342050 1920 2028 2076 2080 1080 1090 1100 1142 0x40 0x9
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_crtc_timings [i915]] crtc timings: 342050 1920 2028 2076 2080 1080 1090 1100 1142, type: 0x40 flags: 0x9
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] port clock: 540000, pipe src size: 1920x1080, pixel rate 342050
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] linetime: 49, ips linetime: 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] num_scalers: 2, scaler_users: 0x0, scaler_id: -1
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] pch pfit: 0x0+0+0, disabled, force thru: no
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] ips: 0, double wide: 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] dpll_hw_state: ctrl1: 0x1, cfgcr1: 0x0, cfgcr2: 0x0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] csc_mode: 0x2 gamma_mode: 0x1 gamma_enable: 1 csc_enable: 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] degamma lut: 0 entries, gamma lut: 1024 entries
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] [PLANE:31:plane 1A] fb: [FB:102] 1920x1080 format = XR24 little-endian (0x34325258) modifier = 0x0, visible: yes
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] rotation: 0x1, scaler: -1
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:intel_dump_pipe_config [i915]] src: 1920.000000x1080.000000+0.000000+0.000000 dst: 1920x1080+0+0
Nov 18 23:53:17 buzzcut kernel: [drm:drm_atomic_check_only] atomic driver check for 000000000efa3bcd failed: -22
Nov 18 23:53:17 buzzcut kernel: [drm:drm_atomic_state_default_clear] Clearing atomic state 000000000efa3bcd
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 108 (2)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 106 (4)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 106 (3)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 102 (4)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 102 (3)
Nov 18 23:53:17 buzzcut kernel: [drm:__drm_atomic_state_free] Freeing atomic state 000000000efa3bcd
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 102 (2)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg", pid=1063, ret=-22
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_MODE_RMFB
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 102 (2)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 102 (1)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_SYNCOBJ_DESTROY
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_EXECBUFFER2_WR
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_MADVISE
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_BUSY
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_MADVISE
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_SYNCOBJ_CREATE
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_MODE_DIRTYFB
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 105 (4)
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_crtc_vblank_helper_get_vblank_timestamp_internal] crtc 0 : v p(0,-57)@ 3689.236729 -> 3689.237075 [e 0 us, 0 rep]
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=531121, diff=1, hw=532301 hw_last=532300
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:vblank_disable_fn] disabling vblank on crtc 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_crtc_vblank_helper_get_vblank_timestamp_internal] crtc 0 : v p(0,-55)@ 3689.236742 -> 3689.237077 [e 0 us, 0 rep]
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=531122, diff=0, hw=532301 hw_last=532301
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_crtc_vblank_helper_get_vblank_timestamp_internal] crtc 0 : v p(0,100)@ 3689.244632 -> 3689.244024 [e 0 us, 0 rep]
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_crtc_vblank_restore] missed 1 vblanks in 6948758 ns, frame duration=6944502 ns, hw_diff=1
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_crtc_vblank_helper_get_vblank_timestamp_internal] crtc 0 : v p(0,103)@ 3689.244645 -> 3689.244019 [e 0 us, 0 rep]
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=531122, diff=1, hw=532302 hw_last=532301
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_SET_TILING
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_SET_TILING
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_SET_TILING
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, I915_GEM_SET_TILING
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_MODE_ADDFB2
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_addfb2] [FB:102]
Nov 18 23:53:17 buzzcut kernel: [drm:drm_ioctl] comm="Xorg" pid=1063, dev=0xe200, auth=1, DRM_IOCTL_MODE_PAGE_FLIP
Nov 18 23:53:17 buzzcut kernel: [drm:drm_atomic_state_init] Allocated atomic state 00000000c037c933
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 108 (1)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 106 (2)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 106 (3)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_atomic_get_crtc_state] Added [CRTC:51:pipe A] 00000000fe9f55fe state to 00000000c037c933
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 105 (3)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 105 (4)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_atomic_get_plane_state] Added [PLANE:31:plane 1A] 00000000ff0f0a4e state to 00000000c037c933
Nov 18 23:53:17 buzzcut kernel: i915 0000:00:02.0: [drm:drm_atomic_set_fb_for_plane] Set [FB:102] for [PLANE:31:plane 1A] state 00000000ff0f0a4e
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 102 (2)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 105 (5)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_atomic_check_only] checking 00000000c037c933
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_put.part.0] OBJ ID: 105 (4)
Nov 18 23:53:17 buzzcut kernel: [drm:drm_mode_object_get] OBJ ID: 102 (3)
```

The important bit is the following check failure:

kernel: i915 0000:00:02.0: [drm:intel_atomic_check [i915]] Linear memory/CCS does not support async flips
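To check whether a given crash matches this PRIME async-flip pattern, a sketch like the following counts the message fragments quoted in this thread across kernel output (for example from `sudo dmesg` or `journalctl -k`) and the Xorg log. The fragment list and its labels are assumptions taken from the excerpts here, not an exhaustive catalogue of i915 or Xorg messages, and the i915 line only appears with DRM debugging enabled.

```python
from collections import Counter

# Message fragments quoted in this thread (illustrative, not exhaustive).
FLIP_PATTERNS = {
    "i915 async flip rejected": "does not support async flips",
    "Xorg present flip failed": "present flip failed",
    "Xorg page flip failed": "Page flip failed",
    "Xorg flip queue failed": "flip queue failed",
    "Xorg async flip queue failed": "queue async flip during flip",
}


def count_flip_errors(lines):
    """Count lines containing each known flip-failure fragment (case-sensitive)."""
    counts = Counter()
    for line in lines:
        for label, fragment in FLIP_PATTERNS.items():
            if fragment in line:
                counts[label] += 1
    return counts


# Usage sketch (reading dmesg may require root):
#   import itertools, subprocess
#   kernel = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   with open("/var/log/Xorg.0.log", errors="replace") as xorg:
#       print(count_flip_errors(itertools.chain(kernel.splitlines(), xorg)))
```

Non-zero counts for both the i915 and the Xorg entries around the time of a freeze would point towards the async-flip problem rather than a DXVK or Nvidia driver fault.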

This was mentioned in the xorg-server issue (this), where the dev said:

The root cause for failing flips seems to be that the hardware itself can't do async page flips with the modifiers that are being used on the particular CRTC. See https://github.com/torvalds/linux/blob/0f4498cef9f5cd18d7c6639a2a902ec1edc5be4e/drivers/gpu/drm/i915/display/intel_display.c#L12374.

Kernel prints the following error if drm debugging is enabled: [ 827.411096] i915 0000:00:02.0: [drm:intel_atomic_check [i915]] Linear memory/CCS does not support async flips

It seems that there's not much we can do because there's no way to get the information on whether async flip will succeed on a particular CRTC, nor we can pass it to the applications via PresentQueryCapabilities request of the present extension (the capability may change dynamically if the modifiers change).

So yeah.

Thinkaboutmin commented 2 years ago

Hmm, the same thing seems to happen to me and some other people when playing Bioshock Infinite with Proton using the latest DXVK.

Proton Issue: https://github.com/ValveSoftware/Proton/issues/2700#issuecomment-960815289 ProtonDB: https://www.protondb.com/app/8870

I did not go as far as testing other versions of DXVK or even other driver versions, but I can at least confirm that the freezing happens after some time (just as described, after 15 minutes or so) and that it is not related to FSYNC, ESYNC or any other game or Wine configuration so far.

The logs show the same VK_ERROR_DEVICE_LOST message, probably pointing to an Nvidia driver error...

When I run sudo dmesg right after the game crashes, I get at least this error message:

NVRM: Xid (PCI:0000:03:00): 31, pid=104353, Ch 00000045, intr 10000000. MMU Fault: ENGINE CE3 HUBCLIENT_CE2 faulted @ 0x100_fafcf000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE
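When comparing crashes across machines (for example this Xid 31 against the Xid 8 reported later in the thread), the `NVRM: Xid` lines can be pulled apart programmatically. The sketch below assumes the line layout shown in these excerpts; the two short descriptions are paraphrased from NVIDIA's published Xid error table and should be checked against it. `XidEvent`, `parse_xid` and `describe` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the dmesg lines in this thread, e.g.
# "NVRM: Xid (PCI:0000:03:00): 31, pid=104353, Ch 00000045, ..."
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
)
FAULT_RE = re.compile(r"Fault is of type (?P<fault>\w+)")

# Paraphrased from NVIDIA's Xid table; verify against the official list.
XID_DESCRIPTIONS = {
    8: "GPU stopped processing",
    31: "GPU memory page fault",
}


class XidEvent(NamedTuple):
    pci: str
    xid: int
    pid: int
    fault: Optional[str]  # only present on MMU-fault style messages


def parse_xid(line: str) -> Optional[XidEvent]:
    """Extract PCI address, Xid code, pid and fault type from one kernel log line."""
    m = XID_RE.search(line)
    if m is None:
        return None
    f = FAULT_RE.search(line)
    return XidEvent(m["pci"], int(m["xid"]), int(m["pid"]), f["fault"] if f else None)


def describe(ev: XidEvent) -> str:
    what = XID_DESCRIPTIONS.get(ev.xid, "see NVIDIA's Xid documentation")
    return f"Xid {ev.xid} ({what}) on PCI {ev.pci}, pid {ev.pid}"
```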

Can't say if it's useful, but it's at least something else :stuck_out_tongue_closed_eyes:

Oh yeah, I did not test with any other games so far, only Bioshock Infinite...

Relevant Specs:

GPU: GTX 1050ti
Driver: 495.44
Kernel: 5.15.6-xanmod2-2

If there's a need for an APITrace, please say so. But I can only trace on Linux because I don't have a Windows machine for such a thing ATM...

fatalerror100500 commented 2 years ago

@doitsujin I've finally been able to get the crash to occur while running apitrace. I understand that you've ignored this issue due to a lack of apitrace logs, and I apologize for not providing them from the start, it's just that I couldn't get the crash to occur with apitrace running.

I've been spending hours every day playing Deep Rock Galactic (and other DXVK-based games too) with apitrace running at a terrible framerate for a week, and couldn't get the crash to occur, even though it occurred within hours whenever I played without apitrace. Eventually I got a bit tired of this and took a break.

Now I've been doing it again for the past week and today, after countless hours of grueling low-fps gameplay, finally, the crash occurred while running Deep Rock Galactic with apitrace!

I now have the log trifecta! Proton logs, DXVK logs, and apitrace!

Deep Rock Galactic Crash Proton logs: proton-logs-steam-548430.zip

Deep Rock Galactic Crash DXVK (DXGI) logs: dxvk-dxgi-logs.zip

Deep Rock Galactic Crash apitrace: https://drive.google.com/file/d/19uCz8ZilNWWikfbz_aJ2gkBtydcOEMCp/view?usp=sharing

Proton: 1639067376 experimental-6.3-20211209
Kernel: Linux 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64
GPU: NVIDIA GeForce GTX 1060
GPU Driver: v495.44
CPU: Intel® Core™ i7-8700 CPU @ 3.20GHz × 12
OS: Ubuntu 20.04.3 LTS

I'm on a laptop, so PRIME may be involved, though I've set PRIME to Performance Mode (i.e. always use the dGPU).

Just a reminder, my differential diagnosis leads me to believe that this is somehow tied to DXVK, because when I run Hades through Proton in Vulkan mode (which doesn't utilize DXVK, I assume) the crashes don't occur. When I run Hades through Proton in DirectX mode (which utilizes DXVK, I assume), the crashes do occur.

These crashes also never occur on Windows, so I think this rules out a DirectX-specific issue.

Monsterovich commented 2 years ago

@fatalerror100500 Hello. How much RAM do you have? Could you please post the output of dmidecode? This may be a related issue:

https://github.com/doitsujin/dxvk/issues/2401#issuecomment-998283436

fatalerror100500 commented 2 years ago

@Monsterovich Hi. I have 32GB of RAM. Here's the output of dmidecode: dmidecode-output.txt.gz

Monsterovich commented 2 years ago

> @Monsterovich Hi. I have 32GB of RAM. Here's the output of dmidecode: dmidecode-output.txt.gz

I had VK_ERROR_DEVICE_LOST with 16 GB of RAM. Now I have 32 GB and the crashes are gone. I have two RAM slots.

HK47196 commented 2 years ago

@fatalerror100500 check dmesg for a FAULT_PDE ACCESS_TYPE_WRITE error after a game crashes; look for an error like the one Thinkaboutmin posted:

NVRM: Xid (PCI:0000:03:00): 31, pid=104353, Ch 00000045, intr 10000000. MMU Fault: ENGINE CE3 HUBCLIENT_CE2 faulted @ 0x100_fafcf000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE

I'm starting to think Nvidia's drivers have regressed again, as this used to be a more common issue and I've seen it pop up a few more times recently. If you do see that error, try the 465.31 drivers (that exact version) and let me know if it continues.

fatalerror100500 commented 2 years ago

I want to report that over the last 4 days I've found that my game doesn't freeze anymore. I left the game (Deep Rock Galactic) running overnight and it didn't freeze after 10.5 hours. The issue probably disappeared sometime between 27 and 4 days ago. Over the last 27 days my GPU driver and kernel version were updated, so maybe that has something to do with it. I switched to Proton 6.3-8 and played on it because I had set up Proton Experimental to run with apitrace. Previously, the game was freezing and crashing on Proton 6.3-8 as well. I haven't tried running the game with the latest Proton Experimental yet.

My Proton, kernel and GPU specifications at this moment (Deep Rock Galactic NOT crashing anymore):

Proton: 1638789187 proton-6.3-8c
Kernel: Linux 5.4.0-92-generic #103-Ubuntu SMP Fri Nov 26 16:13:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
GPU Driver: v495.46

felipefacundes commented 2 years ago

@fatalerror100500 The same thing happened to me. Native Vulkan games work very well, and games running through Wine with the Vulkan API also work. But when I use DXVK for DirectX games on the latest Nvidia drivers (494 and 495.46), the games just don't open. SOLUTION: I had to downgrade the driver to version 470.94; only with this older Nvidia driver was I able to run games with DXVK.

shollermann commented 2 years ago

I got the same issue with Dying Light 2, and also when I use DXVK with Battle.net and Lutris. After switching my Manjaro install to the NVIDIA 470 drivers in the hardware section of the system settings, and disabling max/min FPS and the VSYNC option in Diablo 3, both titles ran flawlessly for at least 2 hours. It seems this is not a complete solution, though: for example, Horizon Zero Dawn on High details still freezes after a while despite similar settings. I strongly suspect a DXVK problem, because I use different Wine variants such as Proton and have also tried lutris-fshack. My DXVK version in Lutris is 1.9.4L.

shollermann commented 2 years ago

I got a kernel message, kernel: NVRM: Xid (PCI:0000:27:00): 8, pid=1867, Channel 00000023, and it always appears when the game freezes. According to the NVIDIA Developer Board, it can be caused by driver issues, application errors, a bus error or thermal problems.

Blisto91 commented 2 years ago

Did you figure anything out with this? Does it still happen with the newest driver?

buzzcut-s commented 2 years ago

I did, at least for my PRIME system, and it was confirmed to help a bunch of other people with similar setups on the LGD Discord as well.

The only thing that worked was to disable i915 async flips in-kernel. Basically, compile the kernel with this applied: https://github.com/buzzcut-s/kernel-x86/commit/f3e30f9f96438489ff59619fdcbada1a31e09f8c

It took my system from crashing reproducibly every single time to never having crashed since. The caveat is that the original issue in the OP might be different from what I and others have experienced; my comment further above describes what the issue was for us, at least.

darkbroodzed commented 2 years ago

For others like me who were brought here by Google :smile: Hangs on "NVIDIA Optimus" notebooks are still not fixed at the time of my post, even with a fresh kernel 5.18. You can fix the issue by:

  1. using kernel 5.10
  2. using a custom kernel compiled with the patch https://github.com/buzzcut-s/kernel-x86/commit/f3e30f9f96438489ff59619fdcbada1a31e09f8c
  3. adding a dirty fix to the Xorg config:

/etc/X11/xorg.conf.d/80-prime-hunging-fix.conf

Section "Device"
        Identifier "Intel Graphics"
        Driver "i915"
        Option "PageFlip" "false"
EndSection

The actual driver name for your case can be found by looking at /var/log/Xorg.0.log.
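Xorg tags each driver message with the driver's log name and screen, such as `modeset(0)` in the Xorg excerpts earlier in this thread, which comes from the `modesetting` driver. A rough heuristic for reading that name out of the Xorg log is sketched below. The prefix-to-driver mapping is an assumption covering only the drivers discussed here; note that the `Driver` entry of a `Device` section takes the Xorg driver name (such as `modesetting` or `intel`) rather than the kernel module name. Function names are hypothetical.

```python
import re
from collections import Counter

# Matches prefixes such as "(WW) modeset(0):" or "(II) NVIDIA(G0):"
# and captures the driver's log name ("modeset", "NVIDIA", ...).
PREFIX_RE = re.compile(r"\((?:[A-Z]{2}|[-*=+!?]{2})\) (\w+)\(G?\d+\):")

# Assumed mapping from log prefix to the Driver value for the Intel iGPU.
PREFIX_TO_DRIVER = {"modeset": "modesetting", "intel": "intel"}


def xorg_driver_prefixes(lines):
    """Count how often each driver log prefix appears in an Xorg log."""
    counts = Counter()
    for line in lines:
        m = PREFIX_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def guess_igpu_xorg_driver(counts):
    """Heuristic: return the Intel-side Xorg driver name if one shows up, else None."""
    for prefix, driver in PREFIX_TO_DRIVER.items():
        if counts[prefix]:
            return driver
    return None


# Usage sketch:
#   with open("/var/log/Xorg.0.log", errors="replace") as f:
#       print(guess_igpu_xorg_driver(xorg_driver_prefixes(f)))
```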

evenfrost commented 2 years ago

The same issue started happening for me in recent months in Deep Rock Galactic, The Lord of the Rings Online (run via Steam with multiple versions of Proton, including Proton Experimental and GE-Proton) and the newest Diablo Immortal (run via Lutris, also with multiple Wine and Proton versions).

Tried different configs, env variables, VSync, ESync, and graphics settings, and the output is the same: the game freezes and then drops out to the desktop. Sometimes in 15 minutes of gameplay, sometimes after 2-3 hours in.

The last 2 errors caught in steam-{GAME_ID}.log:

err:   DxvkSubmissionQueue: Failed to sync fence: VK_ERROR_DEVICE_LOST
err:   DxvkSubmissionQueue: Command submission failed: VK_ERROR_DEVICE_LOST
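For very large Proton logs, a sketch like the following pulls out only DXVK error lines of the form shown above and checks whether any of them reports VK_ERROR_DEVICE_LOST. It assumes the `err:` prefix layout in that excerpt; Wine's own `NNNN:err:channel:...` messages are deliberately not matched. The function names are hypothetical.

```python
import re

# DXVK messages in the Proton log look like "err:   DxvkSubmissionQueue: <message>".
# Wine's own "NNNN:err:channel:..." messages are intentionally not matched.
DXVK_ERR_RE = re.compile(r"^err:\s+(?P<component>\w+): (?P<message>.*)$")


def dxvk_errors(lines):
    """Return (component, message) pairs for every DXVK-style error line."""
    found = []
    for line in lines:
        m = DXVK_ERR_RE.match(line.strip())
        if m:
            found.append((m["component"], m["message"]))
    return found


def device_lost(lines):
    """True if any DXVK error line reports VK_ERROR_DEVICE_LOST."""
    return any("VK_ERROR_DEVICE_LOST" in msg for _, msg in dxvk_errors(lines))


# Usage sketch: streams the file line by line instead of loading it whole.
#   with open("steam-548430.log", errors="replace") as f:
#       print(device_lost(f))
```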

I'm playing on a laptop, Dell XPS 17 9700 with GeForce 1650 TI Mobile on it and Intel Core i7-10750H, 16 GB of RAM. OS: Ubuntu 22.04, Linux kernel: 5.15.0-37-generic, NVIDIA drivers: 510.73.05.

Cooling is working well. I disabled Intel TurboBoost as it was causing spikes in laptop heating and performance issues, and I'm also running a laptop on a cooling pad with 3 vents. The average in-game processor temp is ~70-80 C, graphics card temp is ~60-70 C.

I've also tried to reduce the graphics settings of Deep Rock Galactic from Ultra High to High as there were reports that it could help, but the freezes still occur as of now.

I don't recall such freezes happening a few months ago, so maybe it's an issue with some new version of some software. I'm going to test everything on 470 drivers and report back. Maybe an option is to upgrade to 32 GB of RAM, but if the issue is related to VRAM overflowing, I doubt it will help.

Blisto91 commented 2 years ago

@evenfrost Just as a sanity check have you verified that the games are running on your nvidia gpu and not the iGPU? If you are in doubt you can check with the dxvk hud.

evenfrost commented 2 years ago

@Blisto91 yes, for sure. I'd have huge performance problems if they were running on the integrated card. Nevertheless, I checked this explicitly multiple times as well through e.g. MangoHud, nvidia-settings, etc, and these games all run on the dedicated one.

Blisto91 commented 2 years ago

Try the latest master. A lot of the DXVK codebase has been touched recently. You can find a build of it here: https://github.com/doitsujin/dxvk/actions/runs/2615690694

evenfrost commented 2 years ago

FWIW, I then switched from the 510 NVIDIA drivers to 470, and now the crashes in Deep Rock Galactic are completely gone. In The Lord of the Rings Online on 470, there can be one crash in several hours (or no crashes at all during a session), whereas on 510 there could be multiple crashes every hour.

I'm going to run the games again on 515 when it gets a stable release on Ubuntu and the latest GE Proton, which should have the master dxvk as well.

maboroshinokiseki commented 2 years ago

I just installed the NVIDIA 515 driver, and with Proton Experimental I'm able to play Final Fantasy 1 Pixel Remake for several hours; with the 510 driver, it would freeze several times an hour.

Cloudwalk9 commented 2 years ago

> For others like me who were brought here by Google: hangs on "NVIDIA Optimus" notebooks are still not fixed at the time of my post, even with the fresh kernel 5.18. You can fix the issue by:
>
> 1. use kernel 5.10
>
> 2. use a custom kernel compiled with the patch [buzzcut-s/kernel-x86@f3e30f9](https://github.com/buzzcut-s/kernel-x86/commit/f3e30f9f96438489ff59619fdcbada1a31e09f8c)
>
> 3. add a dirty fix to the Xorg config
>
> /etc/X11/xorg.conf.d/80-prime-hunging-fix.conf
>
> Section "Device"
>         Identifier "Intel Graphics"
>         Driver "i915"
>         Option "PageFlip" "false"
> EndSection
>
> The actual driver name for your case can be found by looking at /var/log/Xorg.0.log
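As a rough aid for finding that driver name, here is a minimal Python sketch that scans `Xorg.0.log` for lines of the form `(II) Loading .../<name>_drv.so` and lists the video driver names in load order. The log line format, the `<name>_drv.so` naming, and the rootless-Xorg log location are assumptions based on typical Xorg installs, and `loaded_video_drivers` is a hypothetical helper. The value that goes in the `Driver` line is the Xorg driver module name (for an Intel iGPU typically `modesetting` or `intel`), which is not necessarily the same as the kernel module name `i915`.

```python
import re
from pathlib import Path

# Assumed log format, e.g.:
#   [    20.118] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
# Group 1 is the full module path, group 2 the driver name before "_drv.so".
LOAD_RE = re.compile(r"\(II\) Loading (\S+/(\w+)_drv\.so)")


def loaded_video_drivers(log_text: str) -> list[str]:
    """Return Xorg video driver names in load order, without duplicates."""
    names: list[str] = []
    for path, name in LOAD_RE.findall(log_text):
        if "/input/" in path:  # skip input drivers such as libinput_drv.so
            continue
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # Assumption: rootless Xorg may write its log under ~/.local/share/xorg/.
    candidates = [Path("/var/log/Xorg.0.log"),
                  Path.home() / ".local/share/xorg/Xorg.0.log"]
    for log in candidates:
        try:
            print(log, loaded_video_drivers(log.read_text(errors="replace")))
            break
        except OSError:
            continue
    else:
        print("no readable Xorg.0.log found")
```

On a PRIME laptop you would expect to see both the iGPU's driver and `nvidia` in the list; only the former belongs in the `Device` section above.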

This solution basically murders performance, at least for me on kernel 5.17 and Ubuntu 22.04. Quake 2 RTX runs, obviously on the NVIDIA GPU, but at a quarter of the framerate.

Notably, this entire issue is not present on GNOME Wayland. PRIME offloading is a surprisingly near-flawless experience on Wayland in most cases, and by that I mean it ran DOOM Eternal at 110 FPS. However, there is a bug causing sync issues and diagonal tearing with some XWayland workloads, such as Blender and the Steam client itself.

evenfrost commented 2 years ago

Two weeks into 515 drivers, not a single crash for me. Things definitely got better here.

Blisto91 commented 2 years ago

@fatalerror100500 Are you able to confirm the same?

fatalerror100500 commented 2 years ago

> @fatalerror100500 Are you able to confirm the same?

Hi. I can confirm that with the 515 driver the crashes are gone in Deep Rock Galactic; however, I haven't tested other games yet.

edwloef commented 2 years ago

I am still having the issue while playing Guild Wars 2 through Proton GE 7-25 (I'll try updating to 7-27, but I doubt anything will change), running the nvidia 515.57-9 driver and kernel 5.18.14.arch1-1 with a GTX 950M.

Note: I'm using Proton GE because the game won't launch with regular Proton Experimental. I'm not sure whether that affects anything DXVK-related, though.

Blisto91 commented 2 years ago

@edwloef And you are sure it's the same issue? Do other games crash the same way, and has it always been like that for you in GW2? Be aware that the newly released update for GW2 has made the game crashy, among other things, for Linux users.

edwloef commented 2 years ago

@Blisto91 I've had the crashes since before said update, and was using DXVK through Lutris at the time. I don't play any other games that use DXVK, so I can't confirm for sure, but I'd assume it's the same issue, since the crashes don't happen for my dad, who plays without DXVK installed.

Blisto91 commented 2 years ago

Since you can't confirm it happens in other games, I think it's best if you open a new issue. Please fill out the issue template as best you can, with logs and all. Remember to specify whether you are playing DX9 or DX11.

But before that: does it work for you if you don't use DXVK, i.e. with wined3d instead? Also try the latest Proton GE, as you said, since it now runs on a nearly up-to-date DXVK master. When you say your dad doesn't use DXVK, do you mean he uses wined3d or Windows?

edwloef commented 2 years ago

I am playing DX11, and my father plays DX9 with wined3d. And since I can't launch the game without Proton, I can't test without DXVK.

Blisto91 commented 2 years ago

If you are launching through Steam, add this to the game's launch options:

`PROTON_USE_WINED3D=1 %command%`

edwloef commented 2 years ago

> If you are launching through Steam, add this to the game's launch options:
>
> `PROTON_USE_WINED3D=1 %command%`

It doesn't crash when using this command. Not sure if this is helpful, but it also doesn't crash when running through DXVK with a second monitor showing the desktop, or at least I haven't seen it happen yet.

Monsterovich commented 2 years ago

> > If you are launching through Steam, add this to the game's launch options:
> >
> > `PROTON_USE_WINED3D=1 %command%`
>
> It doesn't crash when using this command. Not sure if this is helpful, but it also doesn't crash when a second monitor is showing the desktop, or at least I haven't seen it happen.

Because WineD3D enables Wine's OpenGL-based D3D implementation? That's not DXVK.

Blisto91 commented 2 years ago

@edwloef Thanks for checking 🙂 The last thing to check would be DXVK master, if you haven't already, which you can try by using Proton Experimental's bleeding-edge branch or the latest Proton GE.

If nothing changes there, just open a new issue. Remember to use the issue template, fill it out as best you can, and include logs. Note the details we've talked about here, like how long you've had the problem and that it doesn't seem to appear with wined3d.

edwloef commented 2 years ago

> > > If you are launching through Steam, add this to the game's launch options:
> > >
> > > `PROTON_USE_WINED3D=1 %command%`
> >
> > It doesn't crash when using this command. Not sure if this is helpful, but it also doesn't crash when a second monitor is showing the desktop, or at least I haven't seen it happen.
>
> Because WineD3D enables Wine's OpenGL-based D3D implementation? That's not DXVK.

That's exactly what I was trying to confirm 👍

Cloudwalk9 commented 2 years ago

This doesn't seem to be a DXVK bug. This happens with all Vulkan applications if you're using PRIME offloading (NVIDIA On-Demand).

This comment goes into detail about it. There's nothing DXVK can do and it requires i915, Xorg, and/or Nvidia devs to fix. https://github.com/doitsujin/dxvk/issues/2365#issuecomment-973129273

EDIT: It doesn't occur in Quake 2 RTX, but it occurs in DOOM Eternal. So I should probably say "many" Vulkan applications.

Blisto91 commented 2 years ago

@fatalerror100500 Have you seen if the issue is also fixed in other games? 🙂

seacat17 commented 4 days ago

Bump. This issue is back with NVIDIA driver version 560.35.03.

I can't play games with DXVK; they all eventually freeze and crash. WineD3D and native Vulkan and OpenGL games do not have this issue.

Blisto91 commented 4 days ago

Please open a new issue and fill out the template. Include a log from a game session where a crash occurred.

Edit: By log I mean the Proton/Wine/DXVK log, but dmesg output is also interesting.

doitsujin commented 4 days ago

Not like we have much to go by when there are two people in the whole world who actually have this problem. It's literally not debuggable unless it's a widespread enough issue that we can actually reproduce in some way.

WinterSnowfall commented 3 days ago

If the problem described here is indeed related to NVIDIA PRIME, I've run across it as well on occasion. At various points during gameplay the iGPU can drop out (enter an idle state, perhaps?), which can cause device reordering and, in turn, device loss. I suspect you could reproduce it with native Vulkan applications too, though.

It will be stable if you:

You can be more specific in the filter; just note that the value needs to uniquely identify your GPU's name as reported by vulkaninfo.
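The uniqueness requirement can be checked mechanically before setting the filter. The Python sketch below assumes that DXVK's device-name filter (`DXVK_FILTER_DEVICE_NAME`) keeps a device when the filter value appears as a case-sensitive substring of its Vulkan `deviceName`, and that `vulkaninfo --summary` prints one `deviceName = ...` line per device; both are assumptions to verify against the DXVK README and your vulkaninfo version. The helper names are hypothetical, and the device names in the test are illustrative only.

```python
import re
import shutil
import subprocess

# Assumed vulkaninfo line format, e.g. "\tdeviceName = NVIDIA GeForce GTX 1060".
DEVICE_NAME_RE = re.compile(r"^[ \t]*deviceName[ \t]*=[ \t]*(.+)$", re.MULTILINE)


def parse_device_names(vulkaninfo_text: str) -> list[str]:
    """Collect distinct deviceName values in the order they appear."""
    names: list[str] = []
    for raw in DEVICE_NAME_RE.findall(vulkaninfo_text):
        name = raw.strip()
        if name not in names:
            names.append(name)
    return names


def devices_matching(filter_value: str, device_names: list[str]) -> list[str]:
    """Devices a substring filter would keep (assumed case-sensitive)."""
    return [name for name in device_names if filter_value in name]


def is_unique_filter(filter_value: str, device_names: list[str]) -> bool:
    """True if the filter selects exactly one device."""
    return len(devices_matching(filter_value, device_names)) == 1


if __name__ == "__main__":
    if shutil.which("vulkaninfo"):
        out = subprocess.run(["vulkaninfo", "--summary"],
                             capture_output=True, text=True).stdout
        for name in parse_device_names(out):
            print(name)
    else:
        print("vulkaninfo not found")
```

With an NVIDIA dGPU, an Intel iGPU and llvmpipe listed, a value like `NVIDIA` would select one device, while a short value like `G` would match both hardware GPUs and therefore not be unique.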

It is possible that simply using the device filter will solve the problem in "On-Demand" mode as well, if you're using your NVIDIA GPU. However, I don't guarantee that a similar approach would work if you try to use the iGPU, either exclusively or in "On-Demand" mode. "Power Saving Mode" is just hopelessly broken, for me at least.

Unfortunately, multi-GPU still involves a lot of jank at driver/X11 level, and that's not something dxvk can solve.