Closed christophehenry closed 1 year ago
This doesn't reproduce with glmark2, which seems to rule out the Nvidia driver as the cause.
To be technically correct this excludes the Nvidia OGL driver. There could still be a problem Vulkan-side.
Does vkcube work?
Any particular XID you are seeing in the output of journalctl -b -1?
It's very unlikely something of this sort is caused by dxvk, anyway. Also, what exactly do you mean by "the whole machine stops"? Is it a hard lock (only reset button works) or can you still ssh into the machine remotely but the display output is frozen?
Does vkcube work?
It works seamlessly.
Any particular XID you are seeing in the output of journalctl -b -1?
What do you mean by XID?
what exactly do you mean by "the whole machine stops"?
No, it's not a lock. The whole machine powers off entirely, as if I had pressed the reset button.
It's very unlikely something of this sort is caused by dxvk, anyway.
If not, I don't really know who to turn to to troubleshoot this :confused:
What do you mean by XID?
The Nvidia kernel module issues various XIDs (think of them as error codes) depending on the cause of a GPU crash. But it sounds like it's a lot worse in your case.
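For reference, a minimal sketch of how one might look for XIDs in the kernel log of the previous boot. It assumes the Nvidia kernel module prefixes these reports with `NVRM: Xid`; the helper name `find_xids` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the lines that look like Nvidia Xid reports.
# Reads log text on stdin and prints matching lines (prints nothing if none).
find_xids() {
    grep -E 'NVRM: Xid' || true
}

# Usage (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | find_xids
```

An empty result does not prove the GPU is fine: if the machine loses power abruptly, the message may never reach the disk.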
No, it's not a lock. The whole machine powers off entirely, as if I had pressed the reset button.
That sounds like a power limitation or hardware issue of some sort, not something a mere graphics API translation layer can cause in itself. It may be responsible for creating the conditions leading to a crash (e.g. high utilization and power draw), but dxvk certainly can't reset your system.
Have you seen this happen in any other situation, such as running the Superposition or Heaven benchmarks?
If it's a power/hardware specific issue it's less likely you're going to see anything in dmesg, since the system will crash before it has a chance to record anything.
Sounds like quite the pickle, but you have to take it methodically and rule out causes until you find out what's actually responsible for these resets in the first place.
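One way to test the power-draw theory is to log GPU readings to disk while the game runs, then check the last samples after the machine comes back up. The sketch below assumes `nvidia-smi` is installed and that the GPU reports `power.draw` (some laptop GPUs return `[N/A]`). The function names and the log file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Append one GPU sample per second to a CSV log and flush each line to disk,
# so the last readings before a sudden power-off are likely to survive.
log_gpu_power() {
    local logfile="${1:-gpu_power.csv}"
    nvidia-smi --query-gpu=timestamp,power.draw,temperature.gpu,utilization.gpu \
               --format=csv,noheader,nounits -l 1 |
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$logfile"
        sync "$logfile"
    done
}

# Print the highest power.draw value (second CSV column) found in a log.
peak_power() {
    awk -F', ' '$2+0 > max { max = $2+0 } END { print max+0 }' "$1"
}

# Usage:
#   log_gpu_power gpu_power.csv &    # start logging, then launch the game
#   tail -n 5 gpu_power.csv          # after the reset: last samples recorded
#   peak_power gpu_power.csv         # highest power draw seen
```

If the last samples before the power-off show high utilization and power draw but normal temperatures, that points more toward power delivery than toward overheating.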
:fire: ?
What I just discovered is, if I put the apitrace DLLs next to the game executable, the game doesn't crash.
What I just discovered is, if I put the apitrace DLLs next to the game executable, the game doesn't crash.
apitrace will (sometimes severely) limit the performance of a game, so it's not all that surprising.
Also this line in particular:
juil. 17 18:43:04 localhost gamemoded[4943]: ERROR: Failed to open file for read /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
Doesn't sound very good to me, but I'm not sure what gamemoded is supposed to be anyway.
Also sounds to me like a power related hardware issue. You could reduce power draw using DXVK_FRAME_RATE.
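As a rough sketch, DXVK_FRAME_RATE is set in the environment of the game process. The wrapper name `launch_capped` is made up for illustration, and `wine witcher2.exe` stands in for whatever launch command is actually used (in Steam, the variable would go in the launch options instead).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with DXVK's frame rate limiter set.
# $1 is the frame rate cap, the remaining arguments are the command to run.
launch_capped() {
    local fps="$1"
    shift
    DXVK_FRAME_RATE="$fps" "$@"
}

# Usage:
#   launch_capped 30 wine witcher2.exe
```

Lowering the cap reduces GPU load and therefore power draw, so if the resets become rarer at lower caps, that is a hint (not proof) that power delivery is involved.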
I am already. On that particular game (TW2), it is set to 40.
Besides, it doesn't explain why those same games (Civ 6 and The Witcher 2) were running seamlessly just a month ago. I launched Civ 6 on Proton on the 8th of June and it was running correctly.
Everything indicates that an update, at some point, has broken something. And the only common denominator so far is DXVK. This crash doesn't reproduce with vkcube, nor does it reproduce with the Linux version of Civ 6.
Something about power spikes might have changed? I've never seen software causing a hard-reset like that, only freezes and crashes.
@christophehenry
This crash doesn't reproduce with vkcube nor does it reproduce with the Linux version of Civ 6.
Guessing game, round 2: $ gamemoderun mangohud vkcube ? (or however it is supposed to be launched)
Assuming you have this in both lutris and steam launch commands, but not for vkcube. Maybe vulkan layers are broken, or mangohud specifically.
gamemoderun mangohud vkcube runs smoothly too, without problems :pensive:
So I tried launching TW2 with the framerate limited to 30 using DXVK_FRAME_RATE, and it ran for ~45 seconds before crashing.
I suppose this confirms a problem with the hardware after all? I don't know what to do now…
The first thing I'd try is to take out the SSD and boot from it on another system (either laptop or desktop), to see if the same behavior replicates. An easy option to do that is by using a SATA/NVMe to USB dock.
But it depends if your laptop is still under warranty or not. If it is, I'd definitely RMA it after backing up my data and secure erasing the SSD, rather than mess with its internals and risk losing the warranty.
Some further ideas: look for dust buildup in the fan pathway, in case heat contributes to this. Some laptops route power through the battery and some do not, so you could try running it without the battery if it is easily removable (and check whether the opposite, running it purely from battery, makes a difference).
Thank you. I will try cleaning out the dust. But I doubt this is the problem, since I didn't see any temperature spike in Mangohud during my tests. I already run this laptop with the battery removed most of the time :confused:
I will close this issue since it seems less and less related to DXVK as time passes. I'll reopen if this changes.
Anyway, thank you very very much for helping me.
Hi! If you're interested in the epilogue, the problem came from my laptop's charger. So absolutely no connection with DXVK.
Still, I wanted to thank you all for helping diagnose the problem.
So it was "the power supply" after all :). Happy to hear you figured it out (and thankfully it's an easy thing to replace).
Ok, this one will probably be a bit confusing, but my games using DXVK make my machine crash. Not just Linux, the whole machine stops. This has reproduced for at least a week now (maybe more, I am not sure) with every DXVK app I have on my computer, both on Wine and Proton.
After the crash, dmesg seems empty and journalctl -b -1 doesn't seem to contain any relevant information. I think, but I am not sure, that this issue appeared after the latest Nvidia driver update (535.54.03). Everything was running smoothly around the beginning of June.
Software information
This reproduces with The Witcher 2, both the Windows version running on Wine (with Lutris) and the Steam version running on Proton 8. It also reproduces with Civilization VI on Proton (but not with the Linux version).
This doesn't reproduce with glmark2, which seems to rule out the Nvidia driver as the cause.
System information
Apitrace file(s)
I am not sure these will actually be useful. I couldn't reproduce the issue when launching just wine 'witcher2.exe' > game.log 2>&1.
Log files
Please let me know if you need more information.