godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Android editor crash when interacting with some GUI elements #94741

Open matheusmdx opened 1 month ago

matheusmdx commented 1 month ago

Tested versions

Reproducible: Latest master v4.3.rc.custom_build.e343dbbcc, 4.3 betas 1, 2 and 3.

System information

Godot v4.3.rc (e343dbbcc) - Android - Vulkan (Mobile) - integrated Adreno (TM) 506 - (8 Threads)

Issue description

The Android editor crashes with signal 11 (SIGSEGV) after simple interactions. It only happens with the Vulkan Mobile renderer; the Compatibility renderer works normally:

Example 1 https://github.com/user-attachments/assets/19ae6608-1775-43d4-8f91-9ab2b86bcd5b
Example 2 https://github.com/user-attachments/assets/37c173c5-1701-4627-b223-f46c35cb60e2
Example 3 https://github.com/user-attachments/assets/7f4ec07a-30f9-4ba0-9b56-ee8af684932c
Backtrace

```
07-25 10:55:47.602 9029 9066 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 9066 (VkThread), pid 9029 (e.editor.v4.dev)
07-25 10:55:48.067 9124 9124 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-25 10:55:48.067 9124 9124 F DEBUG : Build fingerprint: 'xiaomi/onc/onc:10/QKQ1.191008.001/V11.0.2.0.QFLMIXM:user/release-keys'
07-25 10:55:48.067 9124 9124 F DEBUG : Revision: '0'
07-25 10:55:48.067 9124 9124 F DEBUG : ABI: 'arm64'
07-25 10:55:48.084 9124 9124 F DEBUG : Timestamp: 2024-07-25 10:55:48-0300
07-25 10:55:48.084 9124 9124 F DEBUG : pid: 9029, tid: 9066, name: VkThread >>> org.godotengine.editor.v4.dev <<<
07-25 10:55:48.084 9124 9124 F DEBUG : uid: 10420
07-25 10:55:48.085 9124 9124 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
07-25 10:55:48.085 9124 9124 F DEBUG : Cause: null pointer dereference
07-25 10:55:48.085 9124 9124 F DEBUG : x0 0000000000000000 x1 0000000000000003 x2 0000007bc22b8800 x3 0000007ba0f10320
07-25 10:55:48.085 9124 9124 F DEBUG : x4 0000007bc22f1c00 x5 0000000000000000 x6 0000000000000000 x7 0000000000000001
07-25 10:55:48.085 9124 9124 F DEBUG : x8 0000000000000002 x9 0000000000000001 x10 0000000000000001 x11 0000000000000004
07-25 10:55:48.085 9124 9124 F DEBUG : x12 0000007bc3125370 x13 0000000000000000 x14 0000000000000000 x15 0000000000000002
07-25 10:55:48.085 9124 9124 F DEBUG : x16 0000007bc2c61ef8 x17 0000007bc312b000 x18 0000007bc838a000 x19 0000007bc312b000
07-25 10:55:48.085 9124 9124 F DEBUG : x20 0000007bc22b8800 x21 0000007bdbe35710 x22 0000000000000000 x23 0000007bdbe35410
07-25 10:55:48.085 9124 9124 F DEBUG : x24 0000000000000001 x25 0000007bb7b50e00 x26 0000000000000028 x27 0000007bc304c020
07-25 10:55:48.085 9124 9124 F DEBUG : x28 0000000000000000 x29 0000007bdbe38570
07-25 10:55:48.085 9124 9124 F DEBUG : sp 0000007bdbe34af0 lr 0000007bdbe35410 pc 0000007bc2951004
07-25 10:55:48.463 9124 9124 F DEBUG :
07-25 10:55:48.463 9124 9124 F DEBUG : backtrace:
07-25 10:55:48.464 9124 9124 F DEBUG : #00 pc 00000000000f5004 /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464 9124 9124 F DEBUG : #01 pc 00000000000a0468 /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464 9124 9124 F DEBUG : #02 pc 0000000003d8bdfc /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #03 pc 0000000006b29d64 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #04 pc 0000000006b2b088 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #05 pc 0000000006b32058 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #06 pc 0000000006a79120 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #07 pc 0000000006a78f34 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #08 pc 0000000006c767b0 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #09 pc 0000000006b34c0c /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #10 pc 0000000006b36ef8 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #11 pc 0000000002dde450 /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #12 pc 0000000002d7bb0c /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #13 pc 0000000002d9d55c /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/lib/arm64/libgodot_android.so
07-25 10:55:48.464 9124 9124 F DEBUG : #14 pc 0000000000140350 /apex/com.android.runtime/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #15 pc 00000000020399dc /memfd:/jit-cache (deleted) (org.godotengine.godot.vulkan.VkRenderer.onVkDrawFrame+44)
07-25 10:55:48.464 9124 9124 F DEBUG : #16 pc 00000000020385f4 /memfd:/jit-cache (deleted) (org.godotengine.godot.vulkan.VkThread.run+948)
07-25 10:55:48.464 9124 9124 F DEBUG : #17 pc 000000000013763c /apex/com.android.runtime/lib64/libart.so (art_quick_osr_stub+60) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #18 pc 00000000003380f8 /apex/com.android.runtime/lib64/libart.so (art::jit::Jit::MaybeDoOnStackReplacement(art::Thread*, art::ArtMethod*, unsigned int, int, art::JValue*)+1688) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #19 pc 00000000005ac56c /apex/com.android.runtime/lib64/libart.so (MterpMaybeDoOnStackReplacement+212) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #20 pc 0000000000136350 /apex/com.android.runtime/lib64/libart.so (MterpHelpers+240) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #21 pc 0000000000029f46 [anon:dalvik-classes2.dex extracted in memory from /data/app/org.godotengine.editor.v4.dev-cKyDzGZzSAHZt6_m2kCbzw==/base.apk!classes2.dex] (org.godotengine.godot.vulkan.VkThread.run+282)
07-25 10:55:48.464 9124 9124 F DEBUG : #22 pc 00000000002b4b14 /apex/com.android.runtime/lib64/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEbb.llvm.16703252159117058578+240) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #23 pc 0000000000592d18 /apex/com.android.runtime/lib64/libart.so (artQuickToInterpreterBridge+1032) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #24 pc 0000000000140468 /apex/com.android.runtime/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #25 pc 0000000000137334 /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #26 pc 0000000000145fec /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+244) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #27 pc 00000000004b171c /apex/com.android.runtime/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #28 pc 00000000004b2830 /apex/com.android.runtime/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue const*)+416) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #29 pc 00000000004f31ec /apex/com.android.runtime/lib64/libart.so (art::Thread::CreateCallback(void*)+1176) (BuildId: 44dbc6a587cb484a8b272d1608feb17c)
07-25 10:55:48.464 9124 9124 F DEBUG : #30 pc 00000000000e6a00 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36) (BuildId: f58dc2e5c0832afee4aa38168e971c9d)
07-25 10:55:48.464 9124 9124 F DEBUG : #31 pc 0000000000084c6c /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: f58dc2e5c0832afee4aa38168e971c9d)
```


I bisected the issue to #84976:

Captura 2024-07-23 15-14-04-774883


Another detail is that the editor doesn't crash if you create a Node2D or a Control node:

https://github.com/user-attachments/assets/74d2a91e-8b03-4541-9056-4627d24b0aa1


Investigating further, I found that any action that creates the editor_layout.cfg file (creating a node and saving the scene, or creating a node and reloading the project from Project > Reload current project) stops the crash. But if I delete editor_layout.cfg again, or create the node without doing either of those two actions and then kill the editor, the crashes come back.

Steps to reproduce

See the videos above

Minimal reproduction project (MRP)

Not needed, any new project can reproduce it.

akien-mga commented 1 month ago

I'm tentatively putting this in the 4.3 milestone / blockers that we track as it's a regression from a 4.3 PR. But I don't consider it high priority at this stage so close to the release, as it's a crash in the Adreno GPU driver and thus probably a driver bug. Figuring out a workaround might take a while as it might be difficult to reproduce.

Still, CC @DarioSamo who wrote the ARG PR, and @Alex2782 @m4gr3d @clayjohn who are used to debugging Android driver bugs.

DarioSamo commented 1 month ago

We have a list of devices that were affected by a similar regression. Perhaps this device just needs to be added to the detection here.

I say try forcing this particular workaround to true and see if it fixes the issue for you first.

https://github.com/godotengine/godot/blob/e343dbbcc1030f04dc5833f1c19d267a17332ca9/drivers/vulkan/rendering_context_driver_vulkan.cpp#L530-L553

Alex2782 commented 1 month ago

I have a PR for the Adreno 5xx uniform crash: https://github.com/godotengine/godot/pull/92611

    //TODO: check 'driverVersion'?
    // no crash on Fujitsu F-01L - Adreno (TM) 506, Vulkan 1.0.61, driverVersion = 54185879
    r_device.workarounds.force_material_uniform_set =
            r_device.vendor == VENDOR_QUALCOMM &&
            p_device_properties.deviceID >= 0x5000000 && // Adreno 5xx
            p_device_properties.deviceID <= 0x5999999;

It took at least 3 weeks to isolate this crash via Firebase Test Lab and print_line. I don't own Adreno 5xx yet to debug it properly via Android Studio.
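For anyone wanting to try the range check in isolation, here is a minimal, self-contained sketch of the same test. `Vendor`, `DeviceInfo`, and `is_adreno_5xx` are hypothetical names used only for illustration and are not part of Godot's code; only the two hexadecimal bounds come from the snippet above.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the engine's device description.
enum class Vendor { Qualcomm, Other };

struct DeviceInfo {
    Vendor vendor;
    uint32_t device_id; // Plays the role of VkPhysicalDeviceProperties::deviceID.
};

// Same range test as the PR snippet: a Qualcomm device whose ID falls in
// the range the snippet labels "Adreno 5xx".
constexpr bool is_adreno_5xx(const DeviceInfo &device) {
    return device.vendor == Vendor::Qualcomm &&
            device.device_id >= 0x5000000u &&
            device.device_id <= 0x5999999u;
}
```

A device would get the workaround when `is_adreno_5xx` returns true for its reported vendor and `deviceID`; both bounds are inclusive.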

Alex2782 commented 1 month ago

@matheusmdx To me it looks like an emulator, or is it something like a 'RemoteViewer' ?

from Logs: Build fingerprint: 'xiaomi/onc/onc:10 (Mi A2 Lite? also has an Adreno 506 on Firebase Test Lab)

image

matheusmdx commented 1 month ago

> We have a list of devices that were affected by a similar regression. Perhaps this device just needs to be added to the detection here.
>
> I say try forcing this particular workaround to true and see if it fixes the issue for you first.
>
> https://github.com/godotengine/godot/blob/e343dbbcc1030f04dc5833f1c19d267a17332ca9/drivers/vulkan/rendering_context_driver_vulkan.cpp#L530-L553

I changed it to a hardcoded true, but the crash persists.



> I have a PR for the Adreno 5xx uniform crash #92611
>
>     //TODO: check 'driverVersion'?
>     // no crash on Fujitsu F-01L - Adreno (TM) 506, Vulkan 1.0.61, driverVersion = 54185879
>     r_device.workarounds.force_material_uniform_set =
>             r_device.vendor == VENDOR_QUALCOMM &&
>             p_device_properties.deviceID >= 0x5000000 && // Adreno 5xx
>             p_device_properties.deviceID <= 0x5999999;
>
> It took at least 3 weeks to isolate this crash via Firebase Test Lab and print_line. I don't own Adreno 5xx yet to debug it properly via Android Studio.

I cherry-picked this PR, but unfortunately it didn't work either.



> @matheusmdx To me it looks like an emulator, or is it something like a 'RemoteViewer' ?

This is a remote view using scrcpy: https://github.com/Genymobile/scrcpy. I used it to make testing faster, but I also tested without it to make sure it didn't interfere with the tests.

I can help test any possible solution, just give me the instructions/APK. I can also try to get a better backtrace if there is a way to get a more complete one.

akien-mga commented 1 month ago

You can get a backtrace with debug symbols by passing debug_symbols=yes to SCons (or dev_build=yes, not sure if it expects dev stuff) and building the apk with ./gradlew generateDevTemplate in platform/android/java.

Btw @m4gr3d we really need to document this on https://docs.godotengine.org/en/latest/contributing/development/compiling/compiling_for_android.html I see the page was updated to instruct using the new generate_apk=yes SCons option, but I suppose this only handles the stripped case and not dev templates?

We should probably still document the various Gradle tasks for advanced users.

DarioSamo commented 1 month ago

> I changed it to a hardcoded true, but the crash persists.

Seems we're pretty much in the situation of having to debug and find yet another workaround for a particular Adreno device because the Render Graph changed the order of operations then. Unfortunately the regression will point you to that particular commit, but it's just a situation of being lucky enough to not run into the driver bug before and now we are.

You can try messing around with these two macros and see if you get any different results, as these will make the render graph basically regress into the behavior of the previous version: https://github.com/godotengine/godot/blob/e343dbbcc1030f04dc5833f1c19d267a17332ca9/servers/rendering/rendering_device.cpp#L104-L117

Alex2782 commented 1 month ago

My PR probably doesn't cover all editor functions yet, but the crash looks identical.

Details

vkCmdDrawIndexed
----------------------------

`vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232)` from `uniform` crash

```
F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 19822 (VkThread), pid 19502 (lex.shader_test)
F DEBUG : backtrace:
F DEBUG : #00 pc 00000000000f5004 /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 4059f276877a7a61cc16b085624608be)
F DEBUG : #01 pc 00000000000a0468 /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 4059f276877a7a61cc16b085624608be)
```

**yours**

```
07-25 10:55:48.463 9124 9124 F DEBUG : backtrace:
07-25 10:55:48.464 9124 9124 F DEBUG : #00 pc 00000000000f5004 /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464 9124 9124 F DEBUG : #01 pc 00000000000a0468 /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
```

my old debug branch

void RenderingDeviceGraph::_run_draw_list_command

For me it was more helpful at that time to output everything via print_line in order to compare logs with other devices.


However, you could also set PRINT_RENDER_GRAPH to 1, in which case the function _print_render_commands is used.

#include "rendering_device_graph.h"

#define PRINT_RENDER_GRAPH 0
#define FORCE_FULL_ACCESS_BITS 0
#define PRINT_RESOURCE_TRACKER_TOTAL 0
#define PRINT_COMMAND_RECORDING 0

DrawListInstruction::TYPE_BIND_UNIFORM_SET

Issue description from my PR

An issue occurs when the vkCmdDrawIndexed function is called multiple times between vkCmdBeginRenderPass and vkCmdEndRenderPass, using a shader that includes a uniform variable (#define MATERIAL_UNIFORMS_USED enabled). The crash is caused if the subsequent render operations (Nodes) within the same render pass do not use material shaders with a uniform variable.

In other words, if a shader with uniform variables is used within a render pass, all following render operations up to the end of the render pass must also use shaders with uniform variables. Otherwise, this can lead to a crash. This workaround ensures that this does not happen by making sure that shaders with uniform variables are consistently used between vkCmdBeginRenderPass and vkCmdEndRenderPass.
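To make the condition above concrete, here is a minimal sketch that models one render pass as a list of draws and reports whether the pattern described as crashing occurs. `DrawCommand` and `has_uniform_hazard` are hypothetical names for illustration only; this is not the PR's implementation and does not touch any Vulkan API.

```cpp
#include <cstddef>

// Hypothetical, simplified model of one draw recorded between
// vkCmdBeginRenderPass and vkCmdEndRenderPass.
struct DrawCommand {
    bool uses_material_uniforms; // True if the bound shader declares material uniforms.
};

// Returns true if a draw that uses material uniforms is followed, within the
// same render pass, by a draw that does not. According to the description
// above, that is the sequence that leads to the driver crash.
constexpr bool has_uniform_hazard(const DrawCommand *draws, std::size_t count) {
    bool seen_uniform_draw = false;
    for (std::size_t i = 0; i < count; ++i) {
        if (draws[i].uses_material_uniforms) {
            seen_uniform_draw = true;
        } else if (seen_uniform_draw) {
            return true; // A non-uniform draw came after a uniform draw.
        }
    }
    return false;
}
```

Under this model, the workaround amounts to making sure a render pass never reaches the `return true` branch, for example by keeping shaders with uniform variables in use for every later draw in the pass.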


Maybe we can add a better global solution in there somewhere. You could perhaps try to always execute vkCmdEndRenderPass after vkCmdDrawIndexed. And if the list has not been completely processed then run vkCmdBeginRenderPass again to check if there are still crashes. I may have time at the weekend to check the effects on my devices and see if the uniform crash can be fixed.

matheusmdx commented 1 month ago

> You can get a backtrace with debug symbols by passing debug_symbols=yes to SCons (or dev_build=yes, not sure if it expects dev stuff) and building the apk with ./gradlew generateDevTemplate in platform/android/java.

I tried this, but ./gradlew generateDevTemplate just skips all the tasks and doesn't generate anything. The backtrace I put in this issue was generated using an APK built with dev_build=yes and ./gradlew generateGodotEditor. I also tried using debug_symbols=yes instead, but it didn't change the backtrace.

> You can try messing around with these two macros and see if you get any different results, as these will make the render graph basically regress into the behavior of the previous version:
>
> https://github.com/godotengine/godot/blob/e343dbbcc1030f04dc5833f1c19d267a17332ca9/servers/rendering/rendering_device.cpp#L104-L117

Changing RENDER_GRAPH_REORDER to 0 stops the crash; changing RENDER_GRAPH_FULL_BARRIERS to 1 doesn't change anything.

> However, you could also set PRINT_RENDER_GRAPH to 1, in which case the function _print_render_commands is used.

Here is what was printed. I just opened the editor and, once it had loaded, triggered the crash: print render graph.txt

Alex2782 commented 1 month ago

Thanks for testing!

> Changing RENDER_GRAPH_REORDER to 0 stops the crash

I will revise PR #92611. Firebase Test Lab has up to 7 Adreno 5xx devices; on one of them I could not reproduce the uniform crash.

    //TODO: check 'driverVersion'?
    // no crash on Fujitsu F-01L - Adreno (TM) 506, Vulkan 1.0.61, driverVersion = 54185879
    r_device.workarounds.force_material_uniform_set =
            r_device.vendor == VENDOR_QUALCOMM &&
            p_device_properties.deviceID >= 0x5000000 && // Adreno 5xx
            p_device_properties.deviceID <= 0x5999999;

I'm not sure exactly which driver versions they are.

            p_device_properties.deviceID >= 0x6000000 && // Adreno 6xx
            p_device_properties.driverVersion < VK_MAKE_VERSION(512, 503, 0) 
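
For reference, a minimal sketch of how a `VK_MAKE_VERSION`-style comparison behaves. `make_version` reproduces the bit layout of the `VK_MAKE_VERSION` macro from the Vulkan headers so the example compiles without them; `driver_older_than_512_503` is a hypothetical helper name. How a given vendor encodes `driverVersion` is vendor-specific, so treating it as a packed major/minor/patch value is an assumption here.

```cpp
#include <cstdint>

// Same bit layout as VK_MAKE_VERSION: major in bits 22 and up,
// minor in bits 12..21, patch in bits 0..11.
constexpr uint32_t make_version(uint32_t major, uint32_t minor, uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}

// Inverse helpers, assuming the value was packed with the layout above.
constexpr uint32_t version_major(uint32_t version) { return version >> 22; }
constexpr uint32_t version_minor(uint32_t version) { return (version >> 12) & 0x3FFu; }
constexpr uint32_t version_patch(uint32_t version) { return version & 0xFFFu; }

// Hypothetical helper mirroring the driverVersion comparison in the snippet above.
constexpr bool driver_older_than_512_503(uint32_t driver_version) {
    return driver_version < make_version(512, 503, 0);
}
```

Because the major number occupies the highest bits, a plain unsigned `<` on the packed values orders versions by major first, then minor, then patch.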

DarioSamo commented 1 month ago

> Changing RENDER_GRAPH_REORDER to 0 stops the crash; changing RENDER_GRAPH_FULL_BARRIERS to 1 doesn't change anything.

Pretty much fits the exact same situation of the other Adreno crash that the workaround was introduced for, where basically Godot was lucky enough to not crash on this particular hardware, but reordering the operations triggers the error in the driver.

The problem is that not reordering the graph on this particular hardware would just be hiding the issue, because as soon as the renderer changes its behavior, the error could come back. We're much better off investigating what exact sequence of events makes the driver crash here so the render graph can insert workarounds as needed, which would guarantee the renderer never breaks on this hardware in the future.

Without this particular hardware however, we're left pretty much guessing at this point. I'm afraid you'll have to dig deeper into it, probably by simplifying the project as much as possible and looking at the output of the render graph, and potentially modifying whatever looks like it could be the problem. When dealing with a driver bug, we don't really have much left to review on our side as it's basically dealing with a black box where some behavior that is known to be correct just doesn't work.

One possible hint I'll give is that the previous crash was related to the relation between compute and drawing, and the old version was guaranteed to dispatch compute first before doing any drawing on the frame. Reordering can cause drawing to happen before compute, and that's what triggered the crash. You said you verified the workaround didn't fix it for you, but so far it's sounding like the exact same issue. I think it's probably worth double checking.
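As a rough aid for reading render graph output with that hint in mind, the sketch below scans an ordered list of frame commands and returns the position of the first compute dispatch that comes after a draw. `CommandKind` and `first_compute_after_draw` are hypothetical names, and the two-kind command model is a deliberate simplification of what the render graph actually records.

```cpp
#include <cstddef>

// Hypothetical two-kind model of the commands recorded for one frame.
enum class CommandKind { Compute, Draw };

// Returns the index of the first compute dispatch recorded after any draw,
// or `count` if every compute dispatch precedes all draws (the ordering the
// previous renderer is said to have guaranteed).
constexpr std::size_t first_compute_after_draw(const CommandKind *commands, std::size_t count) {
    bool seen_draw = false;
    for (std::size_t i = 0; i < count; ++i) {
        if (commands[i] == CommandKind::Draw) {
            seen_draw = true;
        } else if (seen_draw) {
            return i; // Compute dispatched after drawing already started.
        }
    }
    return count;
}
```

If the returned index is smaller than the command count for a crashing frame but not for a working one, that would be consistent with the compute/draw ordering being involved.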

Alex2782 commented 1 month ago

> I'm afraid you'll have to dig deeper into it, probably by simplifying the project as much as possible and looking at the output of the render graph,

#79760 #82602 #85097 #86037

These may be the same issue. They have MRPs to reproduce it outside the editor. (my test project: ShaderTest.zip)

@matheusmdx: If you have time, please try it out. RENDER_GRAPH_REORDER = 0 -> no crash?


4.x Release Blockers and Status: Bad

My suggestion would be to simply apply it to all Adreno 5xx, slightly (?) worse performance is still better than crashes.

Some stats / PlayStore Device Catalog
-----------------------------------

`PlayStore Device Catalog` contains 1150 `Adreno 5xx` devices out of a total of 17339 (this is a share of approx. 6.63%)

| Android API | Count |
| ------------- | ------------- |
| Level 33 (2y old, Android 13) | 2 |
| Level 32 (2y old, Android 12L) | 1 |
| Level 31 (3y old, Android 12) | 7 |
| Level 30 (4y old, Android 11) | 75 |
| Level 29 (5y old, Android 10) | 288 |
| Level 28 (6y old, Android 9) | 275 |
| Level 27 + 26 (7y old, Android 8) | 427 |
| Level 25 + 24 (8y old, Android 7) | 191 |

The older the Android devices are the more unusable they become, older Android versions also have older drivers and Vulkan API (1.0.x on Android 8 and 7). I think only from Android 9 devices Vulkan API 1.1.x is it worthwhile to invest more effort to fix exotic bugs. Up to 650 devices with Adreno 5xx GPU (approx. 3.75%).

In Playstore, the installation figures would still have to be taken into account. For example, over 70% of our PlayStore customers already use Android 13+ devices, which is a "normal" app for ordering a cab. **Android 9 is even at only 2%!**

275 / 17339 * 100 = 1.59% * 2% * 100 = 0.0318% (?) of our customers could still have an Adreno 5xx with Android 9. 😃

Alex2782 commented 1 month ago

@zhmt: Xiaomi Redmi Note 11 Pro 5G, Snapdragon 695, Adreno 619 ?

We have driver (vulkan.msm8953.so) crashes on old Adreno 5xx devices:

07-25 10:55:48.463  9124  9124 F DEBUG   : backtrace:
07-25 10:55:48.464  9124  9124 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)

According to your logs, it looks like a normal Godot issue in libgodot_android.so (.NET / C#?). If no one has reported it yet, it would probably be more helpful to create a new issue.

2024-07-27 17:09:38.241  8698-8698  DEBUG       A  backtrace:
2024-07-27 17:09:38.241  8698-8698  DEBUG       A        #00 pc 0000000000000000  <unknown>
2024-07-27 17:09:38.241  8698-8698  DEBUG       A        #01 pc 000000000143a5a4  /data/app/~~mUlsDVbQH3XRclE-_me3Hw==/com.example.aaa-NxX8oExdQNHj9Oz-VbgQgw==/lib/arm64/libgodot_android.so

zhmt commented 1 month ago

@Alex2782 I searched the issues and I don't think anyone else has reported it. I opened a new issue.

matheusmdx commented 1 month ago

@DarioSamo @Alex2782 I'll test the other MRPs and try to find something. What should I look for in the print render graph results? For example, which output is normal and which indicates a bug?


> RENDER_GRAPH_REORDER = 0 -> no crash?

Yep, that stops the crash

matheusmdx commented 1 month ago

Also @akien-mga, any idea why the debug symbols don't work? Getting a full backtrace would help a lot.

Alex2782 commented 1 month ago

> What should I look for in the print render graph results?

My recommendation is to render fewer frames:

func _ready():
    Engine.max_fps = 5
    print("======== READY ========")

https://github.com/Alex2782/godot/blob/debug_vulkan_shader/servers/rendering/rendering_device_graph.cpp#L642

TYPE_DRAW_INDEXED = vkCmdDrawIndexed

07-25 10:55:48.463  9124  9124 F DEBUG   : backtrace:
07-25 10:55:48.464  9124  9124 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-25 10:55:48.464  9124  9124 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)

Analyze what happens before the crash, but also what happens between several TYPE_DRAW_INDEXED and _run_draw_list_command executions. The Uniform (MaterialShader) crash can already be reproduced with 2 CanvasNodes. I think it was like this:

https://github.com/godotengine/godot/issues/82602

image

akien-mga commented 1 month ago

> Also @akien-mga, any idea why the debug symbols don't work? Getting a full backtrace would help a lot.

Since you're building the editor, I think there's no pre-defined task for not-stripping it.

You could add the doNotStrip line from generateDevTemplate to https://github.com/godotengine/godot/blob/607b230ffe120b2757c56bd3d52a7a0d4e502cfe/platform/android/java/build.gradle#L284 Or copy it and make a generateDevEditor task for that.

CC @m4gr3d

m4gr3d commented 1 month ago

@matheusmdx you can generate a dev build of the editor without stripping using the following build command:

./gradlew generateGodotEditor -PgenerateNativeLibs=true -PdoNotStrip=true

For reference:

For debugging the Android editor, I'd recommend using Android Studio. There's support for setting breakpoints both in java and c++ allowing you to walk through the code line by line to identify the source of the crash.

m4gr3d commented 1 month ago

> You can get a backtrace with debug symbols by passing debug_symbols=yes to SCons (or dev_build=yes, not sure if it expects dev stuff) and building the apk with ./gradlew generateDevTemplate in platform/android/java.
>
> Btw @m4gr3d we really need to document this on https://docs.godotengine.org/en/latest/contributing/development/compiling/compiling_for_android.html I see the page was updated to instruct using the new generate_apk=yes SCons option, but I suppose this only handles the stripped case and not dev templates?
>
> We should probably still document the various Gradle tasks for advanced users.

Good idea, I'll update the documentation with instructions for advanced users.

Note that we'll need to merge https://github.com/godotengine/godot/pull/92859 to address a regression with how stripping is set in the latest versions of gradle.

matheusmdx commented 1 month ago

@m4gr3d I was able to build the APK with your instructions. Now I just need some help with debugging in Android Studio: I tried "attach debugger to Android process", but that didn't work.


Also here the backtrace with a dev build:

07-28 13:58:51.800  7298  7342 F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 7342 (VkThread), pid 7298 (e.editor.v4.dev)
07-28 13:58:52.240  8578  8578 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-28 13:58:52.240  8578  8578 F DEBUG   : Build fingerprint: 'xiaomi/onc/onc:10/QKQ1.191008.001/V11.0.2.0.QFLMIXM:user/release-keys'
07-28 13:58:52.240  8578  8578 F DEBUG   : Revision: '0'
07-28 13:58:52.240  8578  8578 F DEBUG   : ABI: 'arm64'
07-28 13:58:52.278  8578  8578 F DEBUG   : Timestamp: 2024-07-28 13:58:52-0300
07-28 13:58:52.278  8578  8578 F DEBUG   : pid: 7298, tid: 7342, name: VkThread  >>> org.godotengine.editor.v4.dev <<<
07-28 13:58:52.278  8578  8578 F DEBUG   : uid: 10420
07-28 13:58:52.278  8578  8578 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
07-28 13:58:52.278  8578  8578 F DEBUG   : Cause: null pointer dereference
07-28 13:58:52.278  8578  8578 F DEBUG   :     x0  0000000000000000  x1  0000000000000003  x2  0000006ffb2de000  x3  0000006fdb618220
07-28 13:58:52.278  8578  8578 F DEBUG   :     x4  0000006ffb2da400  x5  0000000000000000  x6  0000000000000000  x7  0000000000000001
07-28 13:58:52.278  8578  8578 F DEBUG   :     x8  0000000000000002  x9  0000000000000001  x10 0000000000000001  x11 0000000000000004
07-28 13:58:52.278  8578  8578 F DEBUG   :     x12 0000006ffd25c610  x13 0000000000000000  x14 0000000000000000  x15 0000000000000002
07-28 13:58:52.278  8578  8578 F DEBUG   :     x16 0000006ffbe8e6f8  x17 0000007018497800  x18 00000070024d8000  x19 0000007018497800
07-28 13:58:52.278  8578  8578 F DEBUG   :     x20 0000006ffb2de000  x21 00000070165f1640  x22 0000000000000000  x23 00000070165f1340
07-28 13:58:52.278  8578  8578 F DEBUG   :     x24 0000000000000001  x25 0000006fdb978d00  x26 0000000000000028  x27 0000006ffd183ce0
07-28 13:58:52.278  8578  8578 F DEBUG   :     x28 0000000000000000  x29 00000070165f44a0
07-28 13:58:52.278  8578  8578 F DEBUG   :     sp  00000070165f0a20  lr  00000070165f1340  pc  0000006ffc038004
07-28 13:58:52.433  8578  8578 F DEBUG   :
07-28 13:58:52.433  8578  8578 F DEBUG   : backtrace:
07-28 13:58:52.433  8578  8578 F DEBUG   :       #00 pc 00000000000f5004  /vendor/lib64/hw/vulkan.msm8953.so (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-28 13:58:52.433  8578  8578 F DEBUG   :       #01 pc 00000000000a0468  /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed(VkCommandBuffer_T*, unsigned int, unsigned int, unsigned int, int, unsigned int)+232) (BuildId: 405d74bc1d0a34bd6c1cf5bd55b5d33a)
07-28 13:58:52.433  8578  8578 F DEBUG   :       #02 pc 00000000046c13e4  /data/app/org.godotengine.editor.v4.dev-L_4bEWJcXagmyz1K8N05Rw==/lib/arm64/libgodot_android.so (RenderingDeviceDriverVulkan::render_pass_create(VectorView<RenderingDeviceDriver::Attachment>, VectorView, <RenderingDeviceDriver::Subpass>, VectorView, <RenderingDeviceDriver::SubpassDependency>, unsigned int)+3552)
07-28 13:58:52.433  8578  8578 F DEBUG   :       #03 pc 000000000743a554  /data/app/org.godotengine.editor.v4.dev-L_4bEWJcXagmyz1K8N05Rw==/lib/arm64/libgodot_android.so (_ZN20RenderingDeviceGraph30_add_buffer_barrier_to_commandEN21RenderingDeviceDriver8BufferIDE8BitFieldINS0_17BarrierAccessBitsEES4_RiS5_+4)

Alex2782 commented 1 month ago

Already configured as in the description? https://docs.godotengine.org/en/latest/contributing/development/configuring_an_ide/android_studio.html

I also had some crashes where the debugger did not work properly, no breakpoints were positioned. Android Studio / Editor - Dev Build consumes an incredible amount of memory. At least 16 GB necessary, better 32 GB.

Because the error occurs in the driver, debugging is less useful: /vendor/lib64/hw/vulkan.msm8953.so (qglinternal::vkCmdDrawIndexed

matheusmdx commented 1 month ago

> #79760 #82602 #85097 #86037
>
> Maybe same issues. There are MRP to reproduce it outside the editor. (my test project: ShaderTest.zip)

I tried the MRPs from these issues, but all of them crash for me in the editor while loading. Here are the logs they print before the crash:

#79760 CircleJumpGodot4.txt, #82602 UniformTestProject.txt, #85097 vk-uniform-a11.txt, #86037 Spritesheet.CPU.Particles.txt, Shader Test.txt

New project + click on renderer switch.txt


I'll do more tests this week, changing some code to see what happens. If anyone wants me to test something else, feel free to tell me.

I was able to use Android Studio, so I can get better backtraces and check parameters if necessary:

Captura 2024-07-28 21-01-10-161068 Captura 2024-07-28 21-02-26-473925 Captura 2024-07-28 21-03-45-226407

Alex2782 commented 1 month ago

@matheusmdx thanks! please try https://github.com/godotengine/godot/pull/92611 again.

PR should be prepared, RENDER_GRAPH_REORDER = 0 if it is an Adreno 5xx device.

Outdated

Some information should appear in the logs as to whether the workaround has been activated and which driver version. Example on MacOS, which `driverVersion` is displayed on your **Redmi 7**?

```
======== Workarounds ========
avoid_compute_after_draw: false
avoid_render_graph_reorder: false
-----------------------------
name: Apple M1
vendor: 4203
deviceID: 235209711
driverVersion: 0.0.2.2012
```

-----------------------

> big impact on performance

I have not yet been able to confirm this with a 2D benchmark, sometimes less, sometimes more +/- 3%, on MacOS M1 Dev. build: https://github.com/godotengine/godot/pull/92611#issuecomment-2259423522

------------------

The crash happens in the graphics driver, which is like a black box if no sources have been released for it.

![image](https://github.com/user-attachments/assets/5db05ae7-a3f4-4309-b9ce-128f8c3043c4)

uniform ShaderTest.zip

I have tested on Firebase Test Lab: the 'uniform' shader crash is not fixed with RENDER_GRAPH_REORDER = 0.

matheusmdx commented 1 month ago

@Alex2782 Sorry for the late reply, I was a bit busy last week. This PR doesn't stop the crash. I took a look with Android Studio, and it seems that render_graph_reorder is still true after RenderingDevice initialization:

image

image

Alex2782 commented 1 month ago

Thank you! I'll try to revise it in the next few days.


@matheusmdx: render_graph_reorder initialization should now be correct: compare
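
For readers following the thread, a minimal sketch of the idea being discussed: deriving the reorder flag at runtime from a detected workaround instead of a compile-time macro. The field names `avoid_compute_after_draw` and `avoid_render_graph_reorder` are taken from the log output earlier in this thread; `DetectedDevice`, `detect_workarounds`, and `render_graph_reorder_enabled` are hypothetical, and the actual change lives in the PR.

```cpp
// Simplified mirror of the workaround flags printed in the PR's log output.
struct Workarounds {
    bool avoid_compute_after_draw;
    bool avoid_render_graph_reorder;
};

// Hypothetical device description; in practice this would come from a
// vendor/deviceID check on the physical device.
struct DetectedDevice {
    bool is_adreno_5xx;
};

// Assumption for this sketch: only the reorder workaround is tied to the
// Adreno 5xx check; the other flag is left off.
constexpr Workarounds detect_workarounds(const DetectedDevice &device) {
    return Workarounds{ false, device.is_adreno_5xx };
}

// The flag must be computed after the workarounds are known; reading it
// before detection runs would leave reordering enabled.
constexpr bool render_graph_reorder_enabled(const Workarounds &workarounds) {
    return !workarounds.avoid_render_graph_reorder;
}
```

In this model, an Adreno 5xx device ends up with reordering disabled, which corresponds to the RENDER_GRAPH_REORDER = 0 case tested above.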

matheusmdx commented 3 weeks ago

I didn't receive a notification for your comment edit, but anyway @Alex2782, I can confirm that it now fixes the crash.