godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

fence_wait failed in rendering_device_driver_vulkan.cpp #94177

Open MartinFretigne opened 2 months ago

MartinFretigne commented 2 months ago

Tested versions

System information

Android 14 - Godot Engine v4.3.beta2.official.b75f0485b - Forward Mobile

Issue description

With my project, on a Google Pixel 6a phone (Vulkan 1.3.269 - Forward Mobile - Using Device #0: ARM - Mali-G78), I continuously get the error USER ERROR: Unable to acquire framebuffer after a while. The timing is random: sometimes it happens after 1 minute, sometimes after 10. It works fine with the Compatibility renderer.

07-11 02:18:37.644  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.644  8470  8542 E godot   :    at: fence_wait (drivers/vulkan/rendering_device_driver_vulkan.cpp:2066)
07-11 02:18:37.644  8470  8542 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:211600000001,api:1,p:8470,c:8470) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-11 02:18:37.644  8470  8542 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Unable to acquire framebuffer.
07-11 02:18:37.645  8470  8542 E godot   :    at: screen_prepare_for_drawing (servers/rendering/rendering_device.cpp:3503)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.645  8470  8542 E godot   :    at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.645  8470  8542 E godot   :    at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.645  8470  8542 E godot   :    at: fence_wait (drivers/vulkan/rendering_device_driver_vulkan.cpp:2066)
07-11 02:18:37.645  8470  8542 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:211600000001,api:1,p:8470,c:8470) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-11 02:18:37.645  8470  8542 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Unable to acquire framebuffer.
07-11 02:18:37.645  8470  8542 E godot   :    at: screen_prepare_for_drawing (servers/rendering/rendering_device.cpp:3503)

It may be related to UI elements, since the game won't freeze when I delete some of them (Panels and RichTextLabels). It may also be a coincidence, since I was unable to reproduce it with a basic scene plus my UI. Sorry for the lack of info; I know it would be a miracle if someone understood the problem from the small amount of info I'm able to provide.

Anyway, I saw this bug about "screen_prepare_for_drawing Unable to acquire framebuffer" too, maybe it's related? https://github.com/godotengine/godot/issues/94104 Even though my issue happens on mobile + Mali and not on desktop + NVIDIA.

Steps to reproduce

N/A

Minimal reproduction project (MRP)

Run the project for a while. At some point the fire will stop moving and you should see the above errors in adb logcat. game1-mrp.zip

huwpascoe commented 2 months ago

If possible, please share the app's Android logcat events around the time of the crash. Might show why it's suddenly happening.

akien-mga commented 2 months ago

Anyway, I saw this bug about "screen_prepare_for_drawing Unable to acquire framebuffer" too, maybe it's related? #94104 Even though my issue happens on mobile + Mali and not on desktop + NVIDIA.

It does sound related to that issue, CC @DarioSamo.

@MartinFretigne Aside from the error spam, does the game/app work fine? On desktop + nvidia it was found to be a benign issue and we've just silenced the error, which might have fixed the issue for mobile + mali too.

DarioSamo commented 2 months ago

Anyway, I saw this bug about "screen_prepare_for_drawing Unable to acquire framebuffer" too, maybe it's related? #94104 Even though my issue happens on mobile + Mali and not on desktop + NVIDIA.

It does sound related to that issue, CC @DarioSamo.

That sounds unlikely if it keeps happening. In the other case it's benign because it only happens during resizing, when the swap chain is out of date. From the log, this one pretty much never recovers, and the internal error doesn't look promising.

07-10 14:44:00.003 13570 13622 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:350200000001,api:1,p:13570,c:13570) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-10 14:44:00.003 13570 13622 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)

That sounds like an internal driver error. I'd imagine this already happened before the PR that added the message, and it now shows the error message instead of silently freezing.
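
The distinction drawn above, between a benign out-of-date swap chain and an error that never recovers, can be sketched as a small classifier. This is only an illustration and not Godot's actual code: the `FrameAction` enum and `classify_present_result` are hypothetical names, and treating every unlisted result as fatal is an assumption. The numeric values are the VkResult codes from the Vulkan specification, copied in so the sketch compiles without the Vulkan headers.

```cpp
#include <cstdint>

// Raw VkResult values from the Vulkan specification.
constexpr int32_t kVkSuccess = 0;
constexpr int32_t kVkSuboptimalKhr = 1000001003;
constexpr int32_t kVkErrorDeviceLost = -4;
constexpr int32_t kVkErrorOutOfDateKhr = -1000001004;

// Hypothetical policy enum, not part of Godot or Vulkan.
enum class FrameAction { Continue, RecreateSwapchain, GiveUp };

// Decide what to do after an acquire/present call returned `result`.
FrameAction classify_present_result(int32_t result) {
    switch (result) {
        case kVkSuccess:
            return FrameAction::Continue;
        case kVkSuboptimalKhr:
        case kVkErrorOutOfDateKhr:
            // Typical during a window resize: rebuild the swap chain and retry.
            return FrameAction::RecreateSwapchain;
        case kVkErrorDeviceLost:
        default:
            // Assumption: anything else is treated as fatal. After a lost
            // device, retrying every frame only repeats the same error.
            return FrameAction::GiveUp;
    }
}
```

With this split, an out-of-date swap chain costs one rebuilt frame, while a lost device is reported once instead of being spammed every frame.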

MartinFretigne commented 2 months ago

l2.txt I attached the adb logcat logs. @huwpascoe

@akien-mga No, from the moment the errors start being spammed, the app is unusable, as if it were frozen.

DarioSamo commented 2 months ago

The very first error you get is:

07-11 02:18:37.644  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.644  8470  8542 E godot   :    at: fence_wait (drivers/vulkan/rendering_device_driver_vulkan.cpp:2066)

This shows screen prepare for drawing is clearly unrelated. You likely got a VK_DEVICE_LOST on that very first fence wait and are therefore getting a similar error every frame after. The GPU just froze at some point for whatever reason and it's just not recovering.

I'd clarify this in both the issue title and the initial post, as it's very important: the repeated error only appears because another error triggered first.

MartinFretigne commented 2 months ago

If the GPU was frozen, would I still be able to use other apps? Because only this app freezes when the problem happens, while Android and other apps remain usable. By the way, this problem occurs on both of the Pixel 6a phones that I have. I will clarify, @DarioSamo.

DarioSamo commented 2 months ago

If the GPU was frozen, would I still be able to use other apps?

Yes, by frozen I mean it's frozen from the Godot application's point of view. There are a few states where the application will never recover if the driver fails. If you investigate the error codes returned by the functions that failed, most likely vkWaitForFences, you'll likely encounter the DEVICE_LOST error.

I think you'll have to provide whatever project you're having trouble with here, as you can even get this error from content like custom shaders and such if they're not correct.

MartinFretigne commented 2 months ago

game1-mrp.zip I managed to reduce the size of my app by 99% and still got the same error (after about one hour). I added the zip of my app.

The main scene (mrp4.tscn) should be left running to reproduce the problem (freeze and errors).

I could try to reduce the size of my app even more, but I feel like the more I delete things, the less often the problem occurs.

huwpascoe commented 2 months ago

(There is something unusual: identical copies of a font are embedded in theme.tres and mrp4.tscn, making both very big. It'd be better for loading time and organization if the fonts were saved as a separate resource.)

One hour... what states did the app go through in that hour? Did it ever go into the background? Screen turn off? etc.

MartinFretigne commented 2 months ago

The app stayed in the foreground and the screen stayed on the whole time. I just started the app again, and this time the error occurred after 4 minutes. Logs attached: logcat2.txt

(I will look into the font embedded in the theme and scene, that's not deliberate, I don't care about the font at all at this point. Thank you. edit: I removed the font then tested again -> same issue. It was worth a try.)

huwpascoe commented 2 months ago

What we know

I think it's a race condition, not a driver thing.

huwpascoe commented 2 months ago

Fix Queue Synchronization

Looks like the work to resolve this might already be done.

darksylinc commented 2 months ago

Fix Queue Synchronization Looks like the work to resolve this might already be done.

I doubt that. That code seems to be a performance optimization.

The true problem is that the device appears to be lost. Even if it avoids the error there, it will fail two lines later in the call to vkResetFences.

Btw is it possible that you keep rendering more and more vertices?

ARM Mali has an upper bound of 180 MB of vertex data rendered per VkRenderPass. If this threshold is exceeded, the driver will emit VK_ERROR_DEVICE_LOST.

AFAIK newer Malis don't have this limit but I'm not 90% sure, and I also don't know if your device has the limit.
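
A rough way to check a scene against the limit mentioned above is to add up the vertex data submitted in one render pass. This is a back-of-the-envelope sketch only: the function names are hypothetical, the budget constant reads "180 MB" as 180 MiB (the thread does not state the exact unit), and how the driver actually counts the bytes is not known here.

```cpp
#include <cstdint>

// Assumption: 180 MiB. The exact unit and the driver's accounting are unknown.
constexpr uint64_t kAssumedMaliVertexBudgetBytes = 180ull * 1024ull * 1024ull;

// Estimate the vertex bytes one draw contributes to a render pass.
uint64_t estimate_vertex_bytes(uint64_t vertex_count, uint64_t stride_bytes,
                               uint64_t instance_count) {
    return vertex_count * stride_bytes * instance_count;
}

// True when the summed estimate for a pass is above the assumed budget.
bool exceeds_assumed_budget(uint64_t pass_vertex_bytes) {
    return pass_vertex_bytes > kAssumedMaliVertexBudgetBytes;
}
```

Summing `estimate_vertex_bytes` over every draw in a pass and watching whether the total grows over time would show whether the scene is approaching the assumed budget.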

darksylinc commented 2 months ago

07-10 14:44:00.003 13570 13622 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:350200000001,api:1,p:13570,c:13570) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-10 14:44:00.003 13570 13622 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)

That sounds like an internal driver error. I'd imagine this happens before the PR that added the message and it's just showing the error message instead of silently freezing.

Yes and no. The error is saying that Godot has queued up the maximum number of swapchain images (or anything else, like command submissions) and the GPU is not consuming them.

Whether this is caused by a deadlock or a GPU fault is anyone's guess, and it could be Godot's fault, the driver's, or even the GPU's.
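
The "max dequeued buffer count (2)" message can be pictured with a toy model of a bounded buffer queue. This is an illustration only and not Android's BufferQueue implementation: `ToyBufferQueue` and its methods are hypothetical, and the model assumes the producer may hold at most a fixed number of buffers that it has dequeued but not yet handed back.

```cpp
#include <cstddef>

// Toy model only: NOT Android's BufferQueue implementation.
class ToyBufferQueue {
public:
    explicit ToyBufferQueue(std::size_t max_dequeued)
        : max_dequeued_(max_dequeued) {}

    // The producer (the app) asks for a buffer to render into.
    bool dequeue() {
        if (dequeued_ >= max_dequeued_) {
            // Mirrors "attempting to exceed the max dequeued buffer count".
            return false;
        }
        ++dequeued_;
        return true;
    }

    // The producer hands a rendered buffer back for presentation.
    bool queue() {
        if (dequeued_ == 0) {
            return false; // Nothing is currently held by the producer.
        }
        --dequeued_;
        return true;
    }

    std::size_t dequeued() const { return dequeued_; }

private:
    std::size_t max_dequeued_;
    std::size_t dequeued_ = 0;
};
```

In this model, once work stops completing and buffers are no longer handed back, every further `dequeue()` fails, which matches the repeating pattern in the log regardless of which component caused the stall.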

MartinFretigne commented 2 months ago

Btw is it possible that you keep rendering more and more vertices?

No. The scene in the zip is static; it does not instantiate objects at runtime after the _ready function. I guess I could add a printf in fence_wait to make sure the device is indeed 'lost'. I don't believe I will have time to do that before I leave (this weekend, for about 4 weeks), but who knows. At worst I will do it when I'm back (along with any other suggestions I read here).

MartinFretigne commented 1 month ago

Using the latest code on master, I printed the error in fence_wait; it shows -4 (VK_ERROR_DEVICE_LOST). I don't know what to do from here.

vvvvvvitor commented 3 days ago

I just got this error in my game

E 0:00:02:0437   fence_wait: Condition "err != VK_SUCCESS" is true. Returning: FAILED
  <C++ Source>   drivers/vulkan/rendering_device_driver_vulkan.cpp:2066 @ fence_wait()