doitsujin / dxvk

Vulkan-based implementation of D3D8, 9, 10 and 11 for Linux / Wine
zlib License

DXVK causes random system halt with input/output error #3933

Open only-su opened 5 months ago

only-su commented 5 months ago

For some reason, running games with DXVK causes my system to suddenly lose access to the system disk. I have ruled out every other possibility I could think of, and to the best of my knowledge this is what is happening. I would be glad for any help on how to investigate this further so I can provide more information. I will be as thorough as possible with my system info in case it helps anyone understand what is going on. I'm not a specialist and I'm really lost.

So, some time ago my computer started behaving strangely. Some apps would start to freeze slightly, but not completely, and games would keep running but act a little oddly. Then suddenly everything would freeze while any audio that was playing kept looping, and I could not switch to a TTY. It gave me a hell of a headache, but I eventually thought of leaving a terminal window open on my second screen. When the problem happened, I found that running any command gave me an "Input/output error".

So, naturally, I thought I had a faulty disk. I booted a live ISO and ran a bunch of tests on my NVMe drive. No problems were reported. I repaired the filesystem in case that was the cause. The problem kept happening. Since something was clearly off, I bought a new, really good NVMe drive and did a clean install of my system, and the problem kept happening. Every time, I was playing a game running on Proton. "Oh! Maybe it's Proton?" I thought, and then I stumbled on this post: Mysterious crash from Proton games.

I did what the OP suggested: set "PROTON_USE_WINED3D=1", which disables the default DXVK and runs the game through WineD3D instead. The problem disappeared. I can play games for hours on end without the input/output error haunting me.

So this is a cry for help. How can a GPU library yeet my disk driver? I'm scared and utterly confused. I'm willing to provide any further help I can if someone cares to explore this or explain what is happening.

Software information

Any game, default Steam settings.

System information

Computer Information:
    Manufacturer: ASRock
    Model: B450M Steel Legend
    Form Factor: Desktop
    No Touch Input Detected

Processor Information:
    CPU Vendor: AuthenticAMD
    CPU Brand: AMD Ryzen 5 3600X 6-Core Processor
    CPU Family: 0x17
    CPU Model: 0x71
    CPU Stepping: 0x0
    CPU Type: 0x0
    Speed: 3902 MHz
    12 logical processors
    6 physical processors
    Hyper-threading: Supported
    FCMOV: Supported
    SSE2: Supported
    SSE3: Supported
    SSSE3: Supported
    SSE4a: Supported
    SSE41: Supported
    SSE42: Supported
    AES: Supported
    AVX: Supported
    AVX2: Supported
    AVX512F: Unsupported
    AVX512PF: Unsupported
    AVX512ER: Unsupported
    AVX512CD: Unsupported
    AVX512VNNI: Unsupported
    SHA: Supported
    CMPXCHG16B: Supported
    LAHF/SAHF: Supported
    PrefetchW: Unsupported

Operating System Version:
    "BigLinux" (64 bit)
    Kernel Name: Linux
    Kernel Version: 6.1.80-1-MANJARO
    X Server Vendor: The X.Org Foundation
    X Server Release: 12101011
    X Window Manager: KWin
    Steam Runtime Version: steam-runtime_0.20240304.79797

Video Card:
    Driver: AMD AMD Radeon RX 580 Series (radeonsi, polaris10, LLVM 16.0.6, DRM 3.49, 6.1.80-1-MANJARO)
    Driver Version: 4.6 (Compatibility Profile) Mesa 24.0.2-manjaro1.1
    OpenGL Version: 4.6
    Desktop Color Depth: 24 bits per pixel
    Monitor Refresh Rate: 60 Hz
    VendorID: 0x1002
    DeviceID: 0x67df
    Revision Not Detected
    Number of Monitors: 2
    Number of Logical Video Cards: 1
    Primary Display Resolution: 1920 x 1080
    Desktop Resolution: 3360 x 1169
    Primary Display Size: 18.78" x 10.55" (21.54" diag), 47.7cm x 26.8cm (54.7cm diag)
    Primary VRAM: 8192 MB

Sound card:
    Audio device: ATI R6xx HDMI

Memory:
    RAM: 32023 Mb

VR Hardware:
    VR Headset: None detected

Miscellaneous:
    UI Language: English
    LANG: pt_BR.UTF-8
    Total Hard Disk Space Available: 953868 MB
    Largest Free Hard Disk Block: 398638 MB

Storage:
    Number of SSDs: 2
    SSD sizes: 1000G,500G
    Number of HDDs: 0
    Number of removable drives: 0

Apitrace file(s)

Log files

WinterSnowfall commented 5 months ago

I hope it's clear that dxvk can't crash your computer - what it can do is use your hardware resources more efficiently and that could in turn expose problems that otherwise stay dormant.

To be honest, your issue sounds like something I'm glad I'm not stuck with, because the root cause could be a large number of things and it's probably going to take a lot of patience and time to narrow it down. Here are some potential causes I can think of:

- PCIe bus issues
- C-states / powersaving madness
- Unstable overclock
- Kernel bug
- Hardware fault

In any case, I doubt this is a (user space) software problem, the symptoms you've mentioned are a bit too harsh for that.

Blisto91 commented 5 months ago

We'd probably at minimum need a dmesg or journal log to be able to get an idea.

only-su commented 5 months ago

Thank you all for taking the time to explore my problem. As the Mysterious crash from Proton games thread shows, I'm not the only one with this exact problem. As suggested, I will try benchmarking to see if I can make it appear in another situation.

PCIe bus issues - If this is the case, how do I make sure? Is there a good channel where I can post this information and make the problem available to interested people?

C-states / powersaving madness - I have had problems with this in the past. All the related options are currently disabled. Maybe it is related.

Unstable overclock - I do not overclock.

Kernel bug - If that is the case, how do I report it? Where should I post to make this problem available to interested people? What should I look for in dmesg?

Hardware fault - This was my first thought, and it is what made me replace the NVMe drive. I ran a bunch of tests on this and related possibilities, and all of them came back clean. But maybe it is motherboard- or PCIe-related, as you pointed out.

Dmesg or journal log - How does one obtain it? What is relevant for you to see? How do I provide it to you? I already tried to take a peek at dmesg, but nothing made sense to me. The messages just before the crash look no different to me from the earlier ones. I think that, since the disk is no longer accessible, even if there were a loggable error nothing would be written, because the computer can't access the log file anymore.

Thank you to everyone who commented, for your time.

turol commented 5 months ago

For dmesg, ssh in from another computer and run dmesg -w in that shell. This also works for journalctl -f. Assuming the network interface doesn't disappear immediately when the error occurs, you can then save the messages on the other computer.
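A minimal Python sketch of that idea, meant to run on the second computer: it starts `dmesg -w` on the affected machine over ssh and appends every line to a local file, flushing after each line so nothing already received is lost if the connection dies. The host `user@gaming-pc` and the file name `remote-dmesg.log` are hypothetical placeholders. Depending on the `kernel.dmesg_restrict` sysctl, `dmesg` may need root privileges on the affected machine, and swapping the command for `journalctl -f` works the same way.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical placeholder: replace with the user/host of the machine that crashes.
REMOTE_CMD = ["ssh", "user@gaming-pc", "dmesg", "-w"]


def stream_to_file(cmd: Sequence[str], path: str) -> int:
    """Run `cmd`, echo each output line, and append it to `path` on this machine.

    Returns the number of lines captured once the command exits
    (for example, when the ssh connection drops).
    """
    count = 0
    with open(path, "a", encoding="utf-8") as log, subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",  # kernel messages are not guaranteed to be valid UTF-8
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # live view in this terminal
            log.write(line)
            log.flush()  # write each line immediately; the file lives on *this* disk
            count += 1
    return count


if __name__ == "__main__":
    # Start this before playing; it runs until the connection drops or Ctrl+C.
    stream_to_file(REMOTE_CMD, "remote-dmesg.log")
```

Start it before launching a game: ssh logins after the crash may fail, but an already-open session can keep delivering kernel messages as long as the network stays up.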

WinterSnowfall commented 5 months ago

PCIe bus issues - If this is the case, how do I make sure? Is there a good channel for me to post this information and make the problem available to interested people?

I don't think there's a specific channel for this sort of thing, but any hardware enthusiasts' forum is potentially a good place to start.

Kernel bug - If it is the case, how do I communicate it? Where should I post for making this problem available to interested people?

You should probably rule out other root causes first, before you start bothering any of the kernel folk. A lot of this is, unfortunately, scouring the internet for users with similar hardware configurations and seeing if they've had issues. In short, long and arduous detective work.

What should I look for on dmesg?

Anything that might point to a cause, as dmesg will capture kernel module responses to a whole range of situations, including hardware faults.
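As a rough starting point, a small Python filter can pull the usual suspects out of a saved dmesg or journal dump. The patterns below are an illustrative assumption, not an exhaustive or authoritative list. They target common kernel wording for block I/O errors, NVMe timeouts and controller resets, PCIe AER error reports, Btrfs errors, amdgpu faults or resets, and kernel stack traces.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPICIOUS = re.compile(
    r"I/O error"                          # e.g. "Buffer I/O error on dev nvme0n1p2"
    r"|nvme\S*:.*(timeout|reset|down)"    # NVMe command timeouts, controller resets
    r"|PCIe Bus Error|AER: .*error"       # PCIe Advanced Error Reporting
    r"|BTRFS (error|critical)"            # Btrfs filesystem errors
    r"|amdgpu.*(fault|timeout|reset)"     # GPU page faults, ring timeouts, resets
    r"|Call Trace"                        # kernel stack traces (oops, warnings)
)


def suspicious_lines(log_text: str) -> list[str]:
    """Return the lines of a dmesg/journal dump that match SUSPICIOUS."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]
```

For example, `suspicious_lines(open("dmesg.txt").read())` on a saved log. An empty result does not prove the kernel was healthy; it may simply mean nothing reached the log before the disk disappeared.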

I think, as the disk is not accessible anymore, even if it was the case of a loggable error no log would be written anyway as the computer can't access the log file anymore.

Don't quote me on this, but I think dmesg will write to memory first, so assuming your system/CPU is still alive, it should still dish out information. That being said there are other things that can make it choke. If dmesg doesn't register anything, that's quite the pickle. Perhaps run it in a separate terminal before the crash happens, so you can still see the output once it does?

ssh in from another computer and run dmesg -w in that shell

Good advice in theory, but ssh logins won't work if it can't read auth keys or passwords from your drive. Or if you meant doing that before the crash, then sure, but I doubt it will continue logging anything.

Blisto91 commented 5 months ago

You can show dmesg from the previous boot with journalctl, so there's no need to ssh in while it is happening: journalctl -k -b -1. To save it to a file, run journalctl -k -b -1 > dmesg.txt (or give a path if you want it in a specific location).
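The same journalctl call can be wrapped in a small Python helper that also reports how many lines were saved, so an empty capture is obvious. This is only a sketch; the helper names are hypothetical. It can only return something if journald actually stored that boot, which requires persistent storage (for example, /var/log/journal existing with the default Storage=auto setting).

```python
import subprocess


def kernel_log_cmd(boot: int = -1) -> list[str]:
    """journalctl call from the comment above: kernel messages (-k) for one boot (-b).

    boot=-1 is the previous boot, -2 the one before it, and so on.
    """
    return ["journalctl", "-k", "-b", str(boot)]


def save_kernel_log(path: str = "dmesg.txt", boot: int = -1) -> int:
    """Write that boot's kernel log to `path` and return the number of lines saved."""
    # check=True raises CalledProcessError if journalctl exits with an error,
    # e.g. when there is no journal for the requested boot.
    result = subprocess.run(
        kernel_log_cmd(boot), capture_output=True, text=True, errors="replace", check=True
    )
    with open(path, "w", encoding="utf-8") as out:
        out.write(result.stdout)
    return len(result.stdout.splitlines())


if __name__ == "__main__":
    print(save_kernel_log())  # lines saved from the previous boot into dmesg.txt
```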

Edit: Though if it can't actually save this to disk after the crash happens, doing it the above way might not yield anything useful.

only-su commented 5 months ago

More info on the hardware theory: I checked the post that saved me (Mysterious crash from Proton games) and noticed that the OP had posted their inxi output. We do not share any CPU, motherboard, or GPU hardware. We both have a WD NVMe drive, but I had the problem before I changed drives, and the earlier one was a Xraydisk. Their filesystem is ext4 and mine is Btrfs.

As of now, our only similarities are having the system on an NVMe drive and running a Manjaro-based distro.