HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.
GNU Lesser General Public License v2.1

The Last of Us Part 1 Xid 109 #1925

Closed Blisto91 closed 1 month ago

Blisto91 commented 7 months ago

Testing TLOU1, I am hitting an Xid 109:

NVRM: Xid (PCI:0000:01:00): 109, pid='<unknown>', name=<unknown>, Ch 00000116, errorString CTX SWITCH TIMEOUT, Info 0x32c07d
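For anyone comparing reports, the Xid line above can be split into its fields with a small parser. This is a minimal sketch and not part of the original report: the regular expression is inferred only from the log lines quoted in this thread, and `parse_xid` is a hypothetical helper name.

```python
import re

# Pattern inferred from the NVRM lines quoted in this thread (an assumption,
# not an official format). Xid 109 lines carry "errorString ... Info ...",
# while other Xids (for example Xid 11) have different trailing fields.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),"
    r".*?Ch (?P<channel>[0-9a-fA-F]+)"
    r"(?:, errorString (?P<error>.+?), Info (?P<info>0x[0-9a-fA-F]+))?"
)


def parse_xid(line):
    """Return a dict of Xid fields, or None if the line is not an Xid report."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("xid")),
        "channel": match.group("channel"),
        "error": match.group("error"),  # None when no errorString is present
        "info": match.group("info"),    # None when no Info field is present
    }
```

Feeding it the kernel log line by line and keeping the non-`None` results gives a quick list of Xid events to compare between runs.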

While testing this I have just been running around outside in "The Capitol Building" area seen below. I don't know if it can also happen in other places, as I haven't tried that.

Screenshot ![Screenshot_20240303_213833](https://github.com/HansKristian-Work/vkd3d-proton/assets/47954800/ecfd062f-5559-47b2-8c38-07f3942f4a0a)

Software information

The Last of Us Part 1 Ultra preset

System information

Running with VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json to avoid a frozen white screen on game start. Likely weirdness due to also having an iGPU.
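The same setting can be applied when launching from a script. This is a minimal sketch, assuming the ICD path quoted above exists on the machine; `nvidia_only_env` is a hypothetical helper name, and the command passed to `subprocess.run` is whatever launcher is in use.

```python
import os

# Path taken from the report above; it may differ on other distributions.
NVIDIA_ICD = "/usr/share/vulkan/icd.d/nvidia_icd.json"


def nvidia_only_env(base_env=None):
    """Copy an environment and point the Vulkan loader at the NVIDIA ICD only."""
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = NVIDIA_ICD
    return env


# Usage (the launch command is a placeholder, not taken from the report):
#   import subprocess
#   subprocess.run(launch_command, env=nvidia_only_env())
```

Copying the environment rather than mutating `os.environ` keeps the override limited to the launched process.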

Log files

I've gathered some breadcrumb logs which all mention the same shader in the potential crash region. Also attached the dxil and spv file of that shader below from a dump.

proton-breadcrumbs.tar.gz

615fcfe61ee4a544..tar.gz

shelterx commented 7 months ago

Weird, I played for a while and it worked here, but I'm in "Downtown" right now. How long does it take before it crashes?

RTX 4070
Recent dxvk-nvapi/vkd3d-proton master with reflex support
Vulkan dev 550.40.53
Proton 8
Ultra settings

Blisto91 commented 7 months ago

Under a minute for me. Sometimes only a few seconds. It's worth noting though that I am unsure if the nature of it is a bit racy or random, since when I was collecting breadcrumbs I had to play a lot to reproduce it again for the second log.

shelterx commented 7 months ago

I see what you mean now. What's weird is that I quit in that very same location yesterday and didn't have a crash. Now it went boom after 30 seconds. Aren't Xid crashes usually driver related? I mean, I played this game without crashes with older drivers.

[tis mar  5 21:03:08 2024] NVRM: GPU at PCI:0000:01:00: GPU-c527f869-4b6b-fb00-51ea-f36233874170
[tis mar  5 21:03:08 2024] NVRM: Xid (PCI:0000:01:00): 109, pid='<unknown>', name=<unknown>, Ch 00000068, errorString CTX SWITCH TIMEOUT, Info 0x3ec06

Blisto91 commented 7 months ago

> Aren't Xid crashes usually driver related? I mean, I played this game without crashes with older drivers.

Often yes. But I asked if I should make a vkd3d-proton issue for this one, since I had breadcrumbs that led to a specific shader, and I got a yes.

HansKristian-Work commented 7 months ago

Given this shader seems to do some waveop loops similar to our maximal_convergence test that also hangs the GPU on NV atm, it's likely caused by that. Currently awaiting a driver fix for the maximal_convergence test, and hopefully that fixes this too.

shelterx commented 6 months ago

@Blisto91 For some reason the game even crashes during shader compilation for me now, don't know why. Never had that problem before. Does it work for you with latest dxvk-nvapi and vkd3d-proton?

Blisto91 commented 6 months ago

Yes. Both master and latest stable of both. 550.67

shelterx commented 6 months ago

Game still crashes with 550.67 :( The crash during shader compilation was because of my memory overclock (walks away in shame). I've fixed it now tho'.

shelterx commented 6 months ago

I got this now:

NVRM: GPU at PCI:0000:01:00: GPU-c527f869-4b6b-fb00-51ea-f36233874170
NVRM: Xid (PCI:0000:01:00): 11, pid='<unknown>', name=<unknown>, Ch 000000af Cl 0000c997 Off 00001028 Data 00000020

shelterx commented 5 months ago

Seems fixed for me in Dev 550.40.61 - Please verify

shelterx commented 5 months ago

It's NOT fixed. Which is weird, because I played it for a while without any issues and now it crashes almost instantly with XID 109 again :(

shelterx commented 4 months ago

Still crashing with 555.42.02

[tis maj 21 15:48:18 2024] NVRM: Xid (PCI:0000:01:00): 109, pid='', name=, Ch 0000004f, errorString CTX SWITCH TIMEOUT, Info 0x51c04d

Blisto91 commented 4 months ago

Sorry yea i hadn't gotten around to this again. But thanks for checking

shelterx commented 4 months ago

This is confusing me so much. Sometimes it crashes almost straight away, sometimes it's like it never happened and the game runs fine. 555 beta again... First launch: compiled main story shaders, ran the game, worked fine. Second launch: ran the game, crashed within 10 seconds.

runar-work commented 2 months ago

Yeah, it's a bit random. I'm back on RTX 4070 and reproduced it easily there. I don't know if it was just random that I didn't reproduce on 3070 last time, as I can sometimes play a while without hangs on the 4070 as well. Sometimes it takes 4-5 launches before it hangs, but quitting after a minute or two of gameplay and relaunching seems to be the fastest way to reproduce. Now I had eight launches in a row where I got a Xid 109 a few seconds after loading into the capitol building area, so sometimes it's trivial.

The hang still happens with 550.40.65 and 555.58.02 on 6.9.8-arch1-1.

The shader Blisto found is named CS_VolumetricsTemporalCombineProbeCacheFroxelsScalar. It does have subgroup operations within loops, but forcing maximal reconvergence in dxil-spirv didn't help. QA descriptor checks had no effect, and forcing barriers for this shader via VKD3D_BARRIER_HASHES also doesn't prevent the hang.

shelterx commented 2 months ago

FWIW, I tried spoofing a different nvidia GPU architecture, it did not help either. Nor did disabling nvapi.

shelterx commented 1 month ago

I might have to eat my words later, but vkd3d-proton at https://github.com/HansKristian-Work/vkd3d-proton/commit/e957460ed1aade52300d5dfb9790478cd1ab80d9 and Nvidia 560.31.02 no longer cause XID 109 crashes here. (I don't use the Nvidia open driver and have GSP disabled.)

Someone please verify if possible.

shelterx commented 1 month ago

This is hilarious... It stopped working again. I give up. I played the game fine for about an hour, and I even restarted the game maybe 2-3 times. Today I was going to play again. Instant crash. Same drivers, nothing changed, just a few system restarts in between...

shelterx commented 1 month ago

~Sorry for the spam, deleting /home/user/.cache/nvidia temporarily fixes it. So it seems like it's some cache issue with the Nvidia driver.~

shelterx commented 1 month ago

@HansKristian-Work You know what, it's a race condition....

There seems to be a "frametime spike" happening about 5-6 seconds after the game has loaded its initial menu screen. If you load a saved game before that frametime spike has occurred, the game will crash with XID 109 when entering the level. So whatever that frametime spike is, it cannot happen during a save game load.

So if you wait a while (less than 10 seconds on my machine) at the main menu before loading a save, the game will work fine every time.

That's probably why it worked after a driver update, because of the shader recaching time.

Update: I can reproduce the crash on Windows too with vkd3d-proton and dxvk-nvapi.

shelterx commented 1 month ago

Breadcrumbs log seems to look the same as the one @Blisto91 posted. I can reproduce it easily, like I said: start the game quickly = instant crash. Wait a while at the main menu before starting = no crash.

breadcrumb.log

K0bin commented 1 month ago

How exactly do I reproduce the issue?

I hit continue on the main menu immediately and it loads into the game just fine. The game itself works fine too after that.

shelterx commented 1 month ago

RTX 4070
vkd3d-proton latest git
dxvk-nvapi latest git

  1. Let it finalize the shader building at the beginning
  2. Set graphics to ultra
  3. Load a saved game (it should probably work fine now)
  4. Quit the game
  5. Start the game, quickly load a saved game (continue). It should crash now (takes ~ 5-30 seconds); repeat steps 4-5 if it doesn't.

I don't think location really matters but I'm in "The Woods" now. I haven't tried the Dev Linux 550.40.71 driver tho'.

K0bin commented 1 month ago

I did manage to reproduce it in the end.

shelterx commented 1 month ago

Ok, so I tried really hard to make it crash with Dev Linux 550.40.71 .... it doesn't anymore. At least not for me. Wonder if the root cause was the same as the Final Fantasy crash, what do you think? Was it similar in any way?

@K0bin can you try the dev driver?

runar-work commented 1 month ago

I was just about to comment that a fix was added in the latest beta driver. I don't know if the two problems are related, though.

K0bin commented 1 month ago

Good call, I'll try the beta driver.

K0bin commented 1 month ago

I can't reproduce the issue with the beta driver. There's too much randomness involved with this bug to make any definitive statement but so far it seems like it's fixed.

shelterx commented 1 month ago

I even restarted my computer twice to make sure it wasn't a fluke here.... it just won't crash anymore, which is good I guess. :) But yeah, the randomness is strong in this one.

Blisto91 commented 1 month ago

We'll just reopen if it suddenly appears. Thanks for the help