godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Game crashing on swap_buffers - Seemingly random #67404

Open SnaveSutit opened 2 years ago

SnaveSutit commented 2 years ago

Godot version

4.0 Beta2

System information

OS: Windows 11 | CPU: AMD Ryzen 5 3600X 6-Core | GPU: AMD Radeon RX 5700 XT (Driver: 22.20.19.16-221003a-384125E-AMD-Software-Adrenalin-Edition) | Rendering Backend: Vulkan

Issue description

From what I can tell from my search, this exact issue has not been reported. Very similar issues exist, but none that describe this particular crash.

Description

Every so often while my game is running, it will crash my GPU with this error: [image] (Error source link)

And AMD's crash detection will show up with this message: [image]

However, it doesn't seem to be any fault of mine. Even if there's nothing going on in-game, or it's just a blank window containing a scene with a single node, it will randomly crash my GPU and show that error. It does seem to be a 3D issue, I've been unable to reproduce it in a project without 3D rendering.

I've tried out a few fixes from issues similar to this one:

But so far none of them have worked. 😢

Steps to reproduce

As far as I can tell, this is an AMD GPU specific issue. So reproducing it on other hardware will probably be impossible.

Minimal reproduction project

No response

SnaveSutit commented 2 years ago

I am willing to join a voice call on Discord to show off the error and share more information if needed. SnaveSutit#0042

SnaveSutit commented 2 years ago

Recently got this in the log files as well, not sure if it helps

USER ERROR: Vulkan: Did not create swapchain successfully.
   at: prepare_buffers (drivers/vulkan/vulkan_context.cpp:2056) - Condition "err != VK_SUCCESS" is true. Breaking.
USER ERROR: Condition "err" is true. Returning: ERR_CANT_CREATE
   at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2133) - Condition "err" is true. Returning: ERR_CANT_CREATE
USER ERROR: Condition "err" is true. Returning: ERR_CANT_CREATE
   at: _update_swap_chain (drivers/vulkan/vulkan_context.cpp:1746) - Condition "err" is true. Returning: ERR_CANT_CREATE

Flavelius commented 1 year ago

Same for me on beta11 with an NVIDIA GTX 1060. It began happening after I played around with environment settings and skies. The editor even crashes when opening the scene containing that world environment. Here's a stripped-down project that also crashes on startup (this even locks up my whole PC for around 20 s): CrashProject.zip. DefaultWorldMap.tscn under Shared/Art/Environments/Test is the offending scene; it crashes/freezes reliably for me after having opened the project the second time.

Calinou commented 1 year ago

Same for me on beta11 with an NVIDIA GTX 1060. It began happening after I played around with environment settings and skies. The editor even crashes when opening the scene containing that world environment. Here's a stripped-down project that also crashes on startup (this even locks up my whole PC for around 20 s): CrashProject.zip. DefaultWorldMap.tscn under Shared/Art/Environments/Test is the offending scene; it crashes/freezes reliably for me after having opened it the second time.

I can't reproduce this on 4.0.beta12 with the project you linked. I've tried opening all 3 scenes, closing them and opening them a second time:

[image]

Specs: Fedora 37, GeForce RTX 4090 (NVIDIA 525.60.11)

Flavelius commented 1 year ago

It freezes reliably on my PC, even after rebooting (4.0 beta 11, Windows 10, 16 GB RAM, NVIDIA GTX 1060 6 GB, driver version 516.59). Here's a phone capture (screen capture didn't work, obviously):

https://user-images.githubusercontent.com/8841352/212459616-3c929de7-1b71-4eaa-9b5d-71cd221cf355.mp4

This is right after opening the project from the project manager (the offending scene is set as the main scene). It only started happening after I opened the project the second time, though (not sure if that's relevant or just coincidence).

Edit: on beta12 it's the same, except that fewer errors are printed to the console (these are all of them):

ERROR: Condition "err" is true. Returning: ERR_CANT_CREATE
   at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2299)
ERROR: Condition "err" is true. Returning: ERR_CANT_CREATE
   at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2299)

Calinou commented 1 year ago

While the Godot editor is closed, can you try editing the .tscn/.tres files with a text editor and remove entries such as ssao_enabled = true until it opens successfully?
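
For anyone who prefers to script that bisection rather than edit by hand, here is a minimal Python sketch. It assumes each property sits on a single `name = value` line, which may not hold for multi-line values such as arrays. The helper name `strip_properties` and the sample resource text are made up for illustration and are not taken from the project in this thread. Work on a copy of the file.

```python
import re

def strip_properties(text, names):
    """Return `text` without single-line `<name> = ...` entries for the given names."""
    if not names:
        return text
    # Match e.g. "ssao_enabled = true" but not "ssao_enabled_extra = ...".
    pattern = re.compile(r"^\s*(?:" + "|".join(map(re.escape, names)) + r")\s*=")
    kept = [line for line in text.splitlines(keepends=True) if not pattern.match(line)]
    return "".join(kept)

# Hypothetical resource contents, only meant to show the shape of the input.
sample = (
    '[gd_resource type="Environment" format=3]\n'
    "\n"
    "[resource]\n"
    "background_mode = 2\n"
    "ssao_enabled = true\n"
    "glow_enabled = true\n"
)

print(strip_properties(sample, ["ssao_enabled"]))
```

Removing one property at a time and reopening the project after each change narrows down which entry triggers the freeze.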

Flavelius commented 1 year ago

I'll try that later when I'm at my PC again. I already edited the environment resource directly (in-editor inspector) earlier with the scene closed though, where it didn't seem to have any effect, but that could also have been the result of some caching

Flavelius commented 1 year ago

Ok, it happens when I add a Sky to the environment settings (I had removed all references to it from the .tres beforehand); it freezes even when I try to set it in the editor. I'm now also using the newest NVIDIA drivers for my card (528.02), but that doesn't make a difference.

Edit: it does not freeze when I edit the environment settings (sky) while no WorldEnvironment is in the scene. When I then add one to the scene, it works just fine. But when I add the sky while a WorldEnvironment with that .tres assigned is active, it reliably freezes (and keeps freezing after project reloads from then on, just like before).

Zireael07 commented 1 year ago

Have you changed any Sky settings from the default?

Flavelius commented 1 year ago

It happens right when I select "New Sky" in the dropdown for the sky field directly inside the environment settings. I can't even get to its sub-settings.

Zireael07 commented 1 year ago

Once you've added a sky (maybe in a copy of the scene/project so that it doesn't freeze your main project) you should IIRC be able to edit its values in the scene file in any text editor - I'm wondering whether it's a general "fail to work with the shader" situation or if one of the settings is to blame (maybe the radiance size)

Flavelius commented 1 year ago

I was able to create a sky by using the steps mentioned above: deleting the WorldEnvironment, editing the environment resource, then creating the corresponding node. It seems the freeze happens when the sky is created with no sky material. If I assign one (all of them work the same) and restart the project, it doesn't freeze, but when I delete the assigned material and restart the project, it freezes again. Interestingly, before restarting in this case, while a WorldEnvironment is still in the scene, I can freely delete and reassign the sky material without it freezing.

Flavelius commented 1 year ago

It also freezes reliably with PanoramaSkyMaterial (with or without texture) assigned to the sky (but not with physical or procedural sky, nor with a physical sky that has a night sky texture assigned)
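
To check a project for the "Sky with no sky material" case without opening it in the editor, a rough Python sketch like the following could scan the text of a .tres/.tscn file. It assumes the usual Godot 4 text layout, where a `[sub_resource type="Sky" id="..."]` header is followed by a `sky_material = ...` line when a material is assigned. The function name and the sample contents are hypothetical.

```python
import re

# A section header such as [resource] or [sub_resource type="Sky" id="Sky_1"].
SECTION = re.compile(r"^\[(\w+)(.*)\]\s*$")

def skies_without_material(text):
    """Return ids of Sky sub-resources that have no `sky_material` entry."""
    missing = []
    current_id = None      # id of the Sky section being scanned, if any
    has_material = False
    for line in text.splitlines():
        header = SECTION.match(line)
        if header:
            # A new section starts: close the previous Sky section, if any.
            if current_id is not None and not has_material:
                missing.append(current_id)
            current_id, has_material = None, False
            if header.group(1) == "sub_resource" and 'type="Sky"' in header.group(2):
                id_match = re.search(r'id="([^"]*)"', header.group(2))
                current_id = id_match.group(1) if id_match else "?"
        elif current_id is not None and re.match(r"\s*sky_material\s*=", line):
            has_material = True
    if current_id is not None and not has_material:
        missing.append(current_id)
    return missing

# Hypothetical file contents for illustration only.
sample = """[gd_resource type="Environment" load_steps=3 format=3]

[sub_resource type="Sky" id="Sky_empty"]

[sub_resource type="ProceduralSkyMaterial" id="Mat_1"]

[sub_resource type="Sky" id="Sky_ok"]
sky_material = SubResource("Mat_1")

[resource]
background_mode = 2
sky = SubResource("Sky_empty")
"""

print(skies_without_material(sample))
```

This only flags a missing material; it does not cover the PanoramaSkyMaterial case described above, which would need a separate check on the material type.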

Flavelius commented 1 year ago

My main project and the example do not crash anymore in beta16 under the same scenarios.

RenaKunisaki commented 1 year ago

Having the same problem with AMD on Artix Linux. Just having the editor open is enough; it likes to happen when I'm not even at the computer. It happens about once per day. Takes out the entire desktop session.

Apr 13 21:06:12 greymon kernel: traps: gdbus[10754] general protection fault ip:7fe1989db537 sp:7fe196ffc420 error:0 in libc.so.6[7fe198969000+15a000]
Apr 13 21:06:41 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!

Calinou commented 1 year ago

@RenaKunisaki Which graphics card model do you have, and which Vulkan driver are you using (RADV or AMDVLK)?

RenaKunisaki commented 1 year ago

My system is ASUS ROG Strix G513QY, which has two GPUs: Radeon integrated with 512MB (its name seems to be just "AMD Radeon Graphics"), and Radeon RX 6800M with 12GB. By running DRI_PRIME=1 godot I can force it to run on the latter and that's when it's been crashing. So far using it without that option it hasn't crashed, but I'll report back if it does. I'm using AMDVLK.

RenaKunisaki commented 1 year ago

Just had it happen again with the integrated GPU.

Apr 18 12:24:00 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Apr 18 12:24:00 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
Apr 18 12:24:08 greymon root[28496]: ACPI group/action undefined: button/up / UP
Apr 18 12:24:12 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Apr 18 12:24:12 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
Apr 18 12:24:12 greymon kernel: pagefault_out_of_memory: 1309 callbacks suppressed
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Apr 18 12:30:07 greymon root[29265]: ACPI group/action undefined: button/up / UP
Apr 18 12:35:11 greymon root[29717]: ACPI group/action undefined: button/up / UP
Apr 18 12:36:22 greymon root[29835]: ACPI group/action undefined: button/up / UP
Apr 18 12:36:38 greymon kernel: godot.linuxbsd.[29853]: segfault at 1ab000001f9 ip 000001ab000001f9 sp 00007ffd40029f18 error 14 in godot.linuxbsd.template_release.x86_64[55fccb97f000+2a3000] likely on CPU 9 (core 4, socket 0)
Apr 18 12:36:38 greymon kernel: Code: Unable to access opcode bytes at 0x1ab000001cf.

Since my RAM usage is normally around 50% and no log entries mention oom-killer, I assume it was either a VRAM allocation failure or Godot somehow leaking 32 GB within a few minutes. This crash actually seems to be accidentally exploiting CVE-2023-0047, at least in this instance. I guess this is probably a driver bug, but only Godot seems to trigger it; it never happened before I started using Godot.
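
The "no oom-killer entries" observation can be checked mechanically by tallying kernel log lines by kind. The Python sketch below does that. The category names and regular expressions are illustrative choices of mine, not an official classification of kernel messages, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Illustrative categories; extend or adjust the patterns as needed.
PATTERNS = {
    "amdgpu_error": re.compile(r"\[drm:amdgpu\w*.*\*ERROR\*"),
    "vm_fault_oom": re.compile(r"VM_FAULT_OOM"),
    "oom_killer": re.compile(r"oom-killer", re.IGNORECASE),
    "segfault": re.compile(r"segfault at"),
}

def summarize(lines):
    """Count kernel log lines per category; non-kernel lines are skipped."""
    counts = Counter()
    for line in lines:
        if " kernel: " not in line:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

sample = """Apr 18 12:24:00 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Apr 18 12:24:00 greymon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -22!
Apr 18 12:24:08 greymon root[28496]: ACPI group/action undefined: button/up / UP
Apr 18 12:24:12 greymon kernel: Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF"""

counts = summarize(sample.splitlines())
for name in PATTERNS:
    print(name, counts[name])
```

A zero count for `oom_killer` next to nonzero `amdgpu_error` and `vm_fault_oom` counts would match the pattern described above, though it does not by itself show whether system RAM or VRAM ran out.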

darksylinc commented 1 year ago

Hi!

Could you test this version to see if you can still repro the problem? Thanks.