HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.
GNU Lesser General Public License v2.1

Starfield: I found a vkd3d fix hinting at a possible root cause for some freezes and crashes! #1693

Closed: fakhraldin closed this issue 9 months ago

fakhraldin commented 10 months ago

Salamu alaykum friends of the penguin,

This solution is for a special case of vkd3d freezes and crashes, reported here by a minority of AMD GPU users. The bigger NVIDIA issues are obviously another story due to NV_device_generated_compute. Nonetheless, this finding might help with the latter as well. I haven't found any hints of this solution on the game's dedicated ProtonDB page. I could have posted it there, but it could have been lost in the flood of reports. Due to its special and rare nature, it might instead shed some light for the devs on possible root causes of the freezes and crashes, and so strengthen vkd3d's foundation. If this solution is trivial, though, I am going to close this post.

I experience reproducible freezes and crashes after 1-2 minutes every time I launch the game.

I tried the following workarounds without any success:

Additionally, I followed this recommendation from a vkd3d developer and did the following:

However, I found the solution just beneath the environment variable no_upload_hvv on the vkd3d-proton GitHub page. After deleting every environment variable and adding only the following one, I experienced no more crashes or freezes:

VKD3D_CONFIG=force_host_cached

With this the game ran stable. After about half an hour I closed the game. To rule out coincidence, I removed that environment variable and relaunched the game. Again the game froze and crashed 1-2 minutes after launch. But as soon as I added VKD3D_CONFIG=force_host_cached back, the game ran stable throughout the test. I was able to observe this behavior in three rounds of testing.

I am going to add the logs captured via VKD3D_LOG_FILE for both test scenarios at the end of this report.
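The A/B test described above can be sketched as two Steam launch-option strings, one with the flag and one without. This is only a minimal illustration: the helper name `build_launch_options` and the log file paths are made up for the example, while `VKD3D_CONFIG`, `VKD3D_LOG_FILE` and Steam's `%command%` placeholder are the real names.

```python
import shlex


def build_launch_options(env: dict[str, str]) -> str:
    """Compose a Steam launch-options string from environment overrides."""
    assignments = [f"{name}={shlex.quote(value)}" for name, value in env.items()]
    # %command% is Steam's placeholder for the game's own command line.
    return " ".join(assignments + ["%command%"])


# Run A: workaround enabled, logging to a hypothetical file.
with_flag = build_launch_options({
    "VKD3D_CONFIG": "force_host_cached",
    "VKD3D_LOG_FILE": "/tmp/starfield-with-flag.log",  # hypothetical path
})

# Run B: same logging setup, but without the workaround.
without_flag = build_launch_options({
    "VKD3D_LOG_FILE": "/tmp/starfield-without-flag.log",  # hypothetical path
})

print(with_flag)
print(without_flag)
```

Pasting one string at a time into the game's launch options and keeping everything else unchanged keeps the two runs comparable.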

Software information

Starfield, with settings from low to ultra

System information

Log files

log file after the freeze and crash: starfield-vkd3d - freeze-and-crash.log

log file of the stable run with VKD3D_CONFIG=force_host_cached: starfield-vkd3d - runs stable per force_host_cached.log

Thoxy67 commented 10 months ago

Thanks dude, this definitely fixed all of my crashes

omegatengu commented 10 months ago

You are a fucking saint, this fixed everything, and I've been working on this since t-5 days

SammyJames commented 10 months ago

This also works for me 🥹

xenanthropy commented 10 months ago

Don't wanna jinx myself, but the flags (not just force_host_cached) seem to have fixed my crashing! (AMD card) Will definitely do more extensive "testing" though :)

liberodark commented 10 months ago

Hi,

For me the game works with this fix: https://github.com/HansKristian-Work/vkd3d-proton/pull/1679 (I'm not using VKD3D_CONFIG=force_host_cached). GPU: RX 7900 XTX

Best Regards

rokam commented 10 months ago

> Hi,
>
> For me the game works with this fix: #1679 (I'm not using VKD3D_CONFIG=force_host_cached). GPU: RX 7900 XTX
>
> Best Regards

That commit seems to set that flag based on the executable name.

fakhraldin commented 10 months ago

> Hi,
>
> For me the game works with this fix: #1679 (I'm not using VKD3D_CONFIG=force_host_cached). GPU: RX 7900 XTX
>
> Best Regards

In earlier tests, prior to my findings, I had already tried the view-pressure-message-fix, without any success.

As I mentioned in my report above, I used the following build.

VKD3D-Proton version: v2.9La5b0291b https://github.com/lutris/vkd3d/releases/tag/v2.9La5b0291b

This build is more recent than the merge of #1679 and should therefore already contain it, if I am not mistaken. I am glad the view-pressure-message-fix helped you, but in my case it didn't. Only VKD3D_CONFIG=force_host_cached helped. Best regards

PS: According to vkd3d devs both VKD3D_CONFIG=force_compute_root_parameters_push_ubo and "view-pressure-message-fix" are actually "completely irrelevant"

  1. view-pressure-message-fix: "It's not a "fix". It just removes log spam."
  2. "VKD3D_CONFIG=force_compute_root_parameters_push_ubo is also completely irrelevant, and does not even exist anymore. It's been superseded after NV_dgcc was merged. It would be a fix if using a pretty old vkd3d-proton build though."

DGauze commented 10 months ago

Wa Alaykum Salam brother,

Glad this helped out our AMD friends. Just want to say that it unfortunately doesn't fix the issues for Nvidia, at least not with my 4000 series card.

doitsujin commented 10 months ago

So what exactly is the issue anyway? Everyone always keeps talking about "crashes" but we don't even know what people are talking about exactly. What are those crashes supposed to be and how do you reproduce them?

Also, setting force_host_cached is not a bug fix, or even a valid workaround or anything. All this does is force HVV allocations and uncached system memory allocations to go into a cached memory type instead, in order to accelerate captures.

Unless your system is under extreme memory pressure (not enough GTT?) or the game is literally broken (race condition, deliberately making wrong assumptions about page properties) it simply doesn't make any sense that this would fix any sort of issue.
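To make the effect described above concrete, here is a rough sketch of picking a memory type by Vulkan property flags. It is not vkd3d-proton's allocation code: the `MEMORY_TYPES` table is a hypothetical layout for a discrete GPU, and `find_memory_type` and `pick_upload_memory` are illustrative names. Only the flag values come from the Vulkan specification.

```python
# VkMemoryPropertyFlagBits values from the Vulkan specification.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4
HOST_CACHED = 0x8

# Hypothetical memory types exposed by a discrete GPU (illustration only).
MEMORY_TYPES = [
    DEVICE_LOCAL,                                  # 0: VRAM, not CPU-mappable
    HOST_VISIBLE | HOST_COHERENT,                  # 1: uncached system memory
    DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT,   # 2: HVV, CPU-mappable VRAM
    HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,    # 3: cached system memory
]


def find_memory_type(required: int) -> int:
    """Return the index of the first memory type that has all required flags."""
    for index, flags in enumerate(MEMORY_TYPES):
        if (flags & required) == required:
            return index
    raise LookupError(f"no memory type with flags {required:#x}")


def pick_upload_memory(force_host_cached: bool) -> int:
    """Assumed behaviour: redirect host-visible allocations to cached memory."""
    if force_host_cached:
        return find_memory_type(HOST_VISIBLE | HOST_CACHED)
    return find_memory_type(DEVICE_LOCAL | HOST_VISIBLE)


print(pick_upload_memory(False))  # HVV type in this hypothetical table
print(pick_upload_memory(True))   # cached system memory type
```

In this toy table the flag only changes where CPU-visible allocations live, which is why it is not expected to fix a correct application.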

fakhraldin commented 10 months ago

> So what exactly is the issue anyway? Everyone always keeps talking about "crashes" but we don't even know what people are talking about exactly. What are those crashes supposed to be and how do you reproduce them?

I was able to save the log file of the crash. As for reproduction, I did nothing special. I pressed New Game, and the intro scene started with the main player together with two NPCs in the lift. After 1-2 minutes the crash occurs, before the lift arrives. Sometimes it crashes a bit later in the caves, but always after a couple of minutes without doing anything special.

> Also, setting force_host_cached is not a bug fix, or even a valid workaround or anything. All this does is force HVV allocations and uncached system memory allocations to go into a cached memory type instead, in order to accelerate captures.

Well, "view-pressure-message-fix" is also just supposed to do a trivial task, namely to remove log spam, but it reportedly works for some people.

> Unless your system is under extreme memory pressure (not enough GTT?) or the game is literally broken (race condition, deliberately making wrong assumptions about page properties) it simply doesn't make any sense that this would fix any sort of issue.

My system memory does indeed run with an XMP profile, but I have never experienced issues with other games because of that. To rule it out, I will run the memory at its standard frequency and report back. The GPU memory runs at the vendor's standard speed.

doitsujin commented 10 months ago

I never mentioned overclocking in any way whatsoever.

Anyway, the problem here is that those supposed crashes don't really happen for us, so not much we can do here until someone with the necessary knowledge actually manages to give us a useful report.

fakhraldin commented 10 months ago

> I never mentioned overclocking in any way whatsoever.

I didn't say you mentioned it. I took your thought further and suspected memory errors due to the increased workloads. However, I have just finished testing this, and I can rule out memory instability and also Above 4G Decoding as possible causes of the freeze and crash. Still, VKD3D_CONFIG=force_host_cached reliably prevents them.

I am trying to help people as well as I can. This information is all I have at the moment. It is a hard puzzle, and I feel for you. Maybe one of the other experienced devs will have a clue about this.

Blisto91 commented 10 months ago

The PR isn't about crashes. Just an attempt at doing some small optimizations for some inefficiencies.

fakhraldin commented 10 months ago

I deleted my comment with the PR's link. So we still don't know what causes those freezes and crashes.

dannyglover commented 10 months ago

> > I never mentioned overclocking in any way whatsoever.
>
> I didn't say you mentioned it. I took your thought further and suspected memory errors due to the increased workloads. However, I have just finished testing this, and I can rule out memory instability and also Above 4G Decoding as possible causes of the freeze and crash. Still, VKD3D_CONFIG=force_host_cached reliably prevents them.
>
> I am trying to help people as well as I can. This information is all I have at the moment. It is a hard puzzle, and I feel for you. Maybe one of the other experienced devs will have a clue about this.

I found that the game crashed after pausing > changing the resolution scale > going back in-game. Sometimes immediately, mostly after about 10 seconds or so.

It was easily reproducible on my system at least. Adding the environment variable to the launch options stopped it from happening.

Does this happen on your end?

I'll grab a log to add to this later today, as it would be helpful.

fakhraldin commented 10 months ago

> I found that the game crashed after pausing > changing the resolution scale > going back in-game. Sometimes immediately, mostly after about 10 seconds or so.
>
> Does this happen on your end?

I can confirm that. I went to the settings menu, changed to windowed mode, set a different resolution and went back to the game. I repeated this and changed the resolution back to the default. When I tried to leave the settings menu, the picture froze, but the game didn't crash. In the background I heard game dialogue continuing. Pressing Escape still paused and unpaused the game successfully; I could tell from the menu sound that pausing was triggered.

So that bug seems to be a different one, but it also seems related to the ones in my issue report.

MMUTFX2053 commented 10 months ago

For those of you with RX 400 and RX 500 series graphics cards that have the missing textures issue: I found a fix for it. Use "ACO_DEBUG=noopt".

Mastergatto commented 10 months ago

> For those of you with RX 400 and RX 500 series graphics cards that have the missing textures issue: I found a fix for it. Use "ACO_DEBUG=noopt".

Thank you! This fixed the main issue I had with the game (on Vega 56), so it looks like there are some optimizations involved that don't work well on those GPU series?

EDIT: There's a ticket on this on the mesa repository: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9784 , I'm following it there.

RPINerd commented 10 months ago

As of today's update by Bethesda, I am now experiencing crashes every few minutes that log out my current session. Prior to this, the game had been remarkably stable, with only the occasional group of crashes that seemed to resolve themselves after a reboot. Searching for the error message led me here! Using the VKD3D_CONFIG=force_host_cached flag anecdotally gave me a few extra minutes of stability, but it is certainly no fix.

I'm on a 6900XT with 3700X, 6.5.3-arch1-1, mesa 1:23.1.7-1

The crash point is:

2694.293:00cc:00d0:trace:unwind:dwarf_virtual_unwind r12=00000000000000cc r13=0000000251d80000 r14=0000000000000003 r15=0000000000000000
2694.293:00cc:00d0:warn:seh:dump_syscall_fault backtrace: __wine_syscall_dispatcher.
2694.293:00cc:00d0:warn:seh:dump_syscall_fault backtrace: returning to user mode ip=00000002c73ab564 ret=c0000005
X connection to :0 broken (explicit kill or server shutdown).
2694.296:012c:01c8:err:vkd3d-proton:dxgi_vk_swap_chain_present_signal_blit_semaphore: Failed to submit present discard, vr = -4.
2694.296:012c:01c8:err:vkd3d-proton:d3d12_command_queue_signal: Failed to submit signal operation, vr -4.

Full log here: steam-1716740.log
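For readers trying to interpret entries like the ones above, the following sketch decodes the two numeric codes that appear in them. In the Vulkan specification a `VkResult` of -4 is `VK_ERROR_DEVICE_LOST`, and `c0000005` is the Windows NTSTATUS value `STATUS_ACCESS_VIOLATION`. The regular expressions are an assumption based only on the log lines quoted in this thread, and `decode_line` is an illustrative helper, not part of any tool.

```python
import re

# A few VkResult error codes from the Vulkan specification.
VK_RESULTS = {
    -1: "VK_ERROR_OUT_OF_HOST_MEMORY",
    -2: "VK_ERROR_OUT_OF_DEVICE_MEMORY",
    -3: "VK_ERROR_INITIALIZATION_FAILED",
    -4: "VK_ERROR_DEVICE_LOST",
}

# Windows NTSTATUS code seen in the Wine part of the log.
NTSTATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

# Patterns inferred from the lines quoted in this thread (an assumption).
VR_PATTERN = re.compile(r"err:vkd3d-proton:(\w+):.*\bvr\s*=?\s*(-?\d+)")
RET_PATTERN = re.compile(r"\bret=([0-9a-fA-F]{8})\b")


def decode_line(line: str) -> list[str]:
    """Return human-readable notes for the status codes found in one log line."""
    notes = []
    match = VR_PATTERN.search(line)
    if match:
        code = int(match.group(2))
        name = VK_RESULTS.get(code, f"VkResult {code}")
        notes.append(f"{match.group(1)}: {name}")
    match = RET_PATTERN.search(line)
    if match:
        status = int(match.group(1), 16)
        notes.append(NTSTATUS.get(status, f"NTSTATUS 0x{status:08X}"))
    return notes


sample = [
    "2694.293:00cc:00d0:warn:seh:dump_syscall_fault backtrace: "
    "returning to user mode ip=00000002c73ab564 ret=c0000005",
    "2694.296:012c:01c8:err:vkd3d-proton:d3d12_command_queue_signal: "
    "Failed to submit signal operation, vr -4.",
]
for entry in sample:
    print(decode_line(entry))
```

A device-lost result means the GPU or driver stopped responding, so these lines record that the crash happened rather than explaining why.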

wsippel commented 10 months ago

For me, Starfield won't even start anymore after the update, just hangs at a black screen until I kill the process.

HansKristian-Work commented 10 months ago

> I'm on a 6900XT with 3700X, 6.5.3-arch1-1, mesa 1:23.1.7-1

Does mesa-git help?

MMUTFX2053 commented 10 months ago

> For me, Starfield won't even start anymore after the update, just hangs at a black screen until I kill the process.

Did you try deleting the caches?

RPINerd commented 10 months ago

> Did you try deleting the caches?

Is there a built-in button somewhere in Steam to do this? Or do you have to hunt down the folder manually and just rm -rf it?

RPINerd commented 10 months ago

> > I'm on a 6900XT with 3700X, 6.5.3-arch1-1, mesa 1:23.1.7-1
>
> Does mesa-git help?

Well, yes and no, and also no, haha. The crash still happens, but the final entries in the log are new with mesa-git. To make things worse, there are now also significant flickering artifacts on screen.

Tail of the log:

513.261:00cc:00d4:trace:unwind:dwarf_virtual_unwind rsi=000000000158fe60 rdi=00000002169eb600 rbp=000000000168ead8 rsp=000000000168ea20
513.261:00cc:00d4:trace:unwind:dwarf_virtual_unwind r8=00007f62893dcb20 r9=0000000000000001 r10=0000000000000000 r11=0000000000000246
513.261:00cc:00d4:trace:unwind:dwarf_virtual_unwind r12=000000000000c015 r13=0000000000000000 r14=0000000000000003 r15=0000000000000000
513.261:00cc:00d4:warn:seh:dump_syscall_fault backtrace: __wine_syscall_dispatcher.
513.261:00cc:00d4:warn:seh:dump_syscall_fault backtrace: returning to user mode ip=00000002c73ab564 ret=c0000005
X connection to :0 broken (explicit kill or server shutdown).
pid 9368 != 9367, skipping destruction (fork without exec?)

Full log: steam-1716740.log

MMUTFX2053 commented 10 months ago

> > Did you try deleting the caches?
>
> Is there a built-in button somewhere in Steam to do this? Or do you have to hunt down the folder manually and just rm -rf it?

You have to hunt them down. There are two in the .cache folder, one for Mesa and the other for RADV. Then there is the cache that vkd3d generates where starfield.exe is located, and then there is the one inside the prefix, /AppData/Local/Starfield/.
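As a cautious starting point for that hunt, the sketch below only lists candidates and deletes nothing. The name matching is an assumption: it looks for entries containing "mesa" or "radv" in the cache directory and for files starting with `vkd3d-proton.cache` next to the game executable. The game directory in the usage part is a hypothetical example path, and the cache inside the Proton prefix is not covered because its full path is not given above.

```python
import os
from pathlib import Path


def find_cache_candidates(cache_root: Path, game_dir: Path) -> list[Path]:
    """List paths that look like Mesa, RADV or vkd3d-proton caches (dry run)."""
    candidates: list[Path] = []
    if cache_root.is_dir():
        for entry in sorted(cache_root.iterdir()):
            name = entry.name.lower()
            # Assumption: the driver caches carry "mesa" or "radv" in their name.
            if "mesa" in name or "radv" in name:
                candidates.append(entry)
    if game_dir.is_dir():
        # Assumption: the vkd3d-proton cache file name starts like this.
        candidates.extend(sorted(game_dir.glob("vkd3d-proton.cache*")))
    return candidates


if __name__ == "__main__":
    home = Path(os.environ.get("HOME", "."))
    # Hypothetical install location; adjust to your own Steam library.
    game_dir = home / ".steam/steam/steamapps/common/Starfield"
    for path in find_cache_candidates(home / ".cache", game_dir):
        print(path)
```

Reviewing the printed list before removing anything by hand avoids deleting unrelated data.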

RPINerd commented 10 months ago

> You have to hunt them down. There are two in the .cache folder, one for Mesa and the other for RADV. Then there is the cache that vkd3d generates where starfield.exe is located, and then there is the one inside the prefix, /AppData/Local/Starfield/.

Oof... cleared the caches and it crashed IMMEDIATELY after loading into the game, lol.

wsippel commented 10 months ago

> > For me, Starfield won't even start anymore after the update, just hangs at a black screen until I kill the process.
>
> Did you try deleting the caches?

Yeah. Deleted the prefix and verified the installation, too. Nothing. Starts with GE Proton though, so it might be a regression in Bleeding Edge.

EDIT: It works with Bleeding Edge again after I ran it once with GE. Weird.

RPINerd commented 10 months ago

Ooh I got some new lines!

180.277:0130:01bc:err:vkd3d-proton:dxgi_vk_swap_chain_wait_and_reset_acquire_fence: Failed to wait for fence, vr -4
180.277:0130:01bc:err:vkd3d-proton:dxgi_vk_swap_chain_submit_blit: Failed to wait for fence, vr -4
180.277:0130:01bc:err:vkd3d-proton:dxgi_vk_swap_chain_present_signal_blit_semaphore: Failed to submit present discard, vr = -4.
180.277:0130:01bc:err:vkd3d-proton:d3d12_command_queue_signal: Failed to submit signal operation, vr -4.

RPINerd commented 10 months ago

I have very unhelpfully "fixed" my game! (up to 10 hours of running with no crash)

The reason I say unhelpfully is that I went for the nuclear option: I returned to non-BE Experimental, deleted all the DLL files, INI files and starfield.exe in the steamapps folder, deleted the data (not saves) folder in the user directory, validated the install to reacquire everything, removed everything from the custom launch options except VKD3D_CONFIG=force_host_cached, overwrote with my custom INI files, and Bob's your uncle!

When running without that launch option, the game still crashed almost instantly upon entering gameplay, even after all the cleanout steps.

I don't know what was going on, but clearly something in my core game files got monkeyed with when I was trying out mods, to the point that uninstalling the mods alone did not fix the issue. I actually had to reinstall the runtime stuff from Steam to clear everything up.

RPINerd commented 10 months ago

Just a final follow-up: I installed StarUI and within 20 minutes had a crash again. Fair enough, mods are absolutely not the driver devs' problem and I don't expect you guys to be troubleshooting modded games. BUT, after removing the StarUI files and rebooting the computer, the game was back to crashing intermittently! I repeated the "cleaning" process above and validated the game files to replace the INI/DLL files, and after that there was no crashing for several hours...

So clearly there is something up with certain mods that are not playing nice with Linux even post uninstall?? Super weird but definitely something for others having this issue to consider. You can't apparently just uninstall the mod alone. I think it's gremlins. Definitely gremlins.

glorer commented 10 months ago

> For those of you with RX 400 and RX 500 series graphics cards that have the missing textures issue: I found a fix for it. Use "ACO_DEBUG=noopt".

Problem: when I use that option, the mouse no longer works unless I'm pressing some button; I cannot stand still and look around.

Does anyone have an idea?

Thx.

Katherine1 commented 9 months ago

With this build of vkd3d along with the NVIDIA Vulkan beta drivers, the game is running a bit more smoothly, though I'm hitting the occasional spot where I get significant performance drops that keep happening until I restart the game. When this is happening, the Proton log gets spammed with:

34010.917:0120:01d0:warn:vkd3d-proton:d3d12_device_QueryInterface: {0742a90b-c387-483f-b946-30a7e4e61458} not implemented, returning E_NOINTERFACE.

Full log: steam-1716740.log

fakhraldin commented 2 weeks ago

Epilogue to my issue report: I don't play this game anymore, but interestingly, after I replaced my Gigabyte motherboard some time ago, I got no more crashes.

My Gigabyte model has a known problem when ReBAR is enabled in the BIOS: its ReBAR implementation doesn't work with Linux. So I had to enable only "Above 4G decoding" and untick ReBAR to get the benefit of ReBAR. ReBAR doesn't seem to have anything to do with the issue, as I already experimented with it back then. I am mentioning it, though, to show that my mainboard model seems to do some things differently.

Apart from the new motherboard, I used exactly the same hardware components from my old rig. With my new MSI motherboard I only got heavy texture flickering, which I could resolve by setting RADV_DEBUG=syncshaders. When I updated the game to version 1.9.51.0, this RADV_DEBUG variable was no longer necessary; the textures worked fine then.

I wasn't sure whether I should report this or not, because the issue was closed some months ago. But it might help people who are still struggling with it. It seems that motherboard manufacturers may implement features differently in their models, in ways that deviate from the standard specifications and cause issues.