godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.21k stars 21.22k forks source link

Skinned Skeleton3D calling `add_child()` with specific classes cause crash on Android with directional shadow in forward+/mobile #90459

Closed TCROC closed 6 months ago

TCROC commented 7 months ago

NOTE:

Significant research has gone into this issue. The initial report is a bit dated. See Summary of current findings below the Original Issue Report for more up to date details.

Original Issue Report

Tested versions

System information

Godot v4.3.dev.mono (a7b860250) - Pop!_OS 22.04 LTS - Wayland - Vulkan (Forward+) - dedicated Intel(R) Arc(tm) A750 Graphics (DG2) () - AMD Ryzen 7 5800X 8-Core Processor (16 Threads)

Issue description

When running on Android, if shadows are enabled on the directional light while my model is in the scene, it will crash. I tested with a simple CsgBox3d and it was fine, but my Blocky_Guy.blend model causes it to crash. Disabling shadows works fine tho.

Previous to the current master, it was actually working with shadows, but crashing when I eqipped additional items to the player's skeleton at runtime. So I am unsure if its specifically an issue with shadows, or an issue with how Godot handles my model on Android.

Steps to reproduce

  1. Download the MRP
  2. ~Build godot with double and mono support~ We reproduced this without double and mono. Just standard godot causes it.
  3. Export to Android
  4. Run the scene on Android and see that it crashes

This was specifically tested against Galaxy S10e model number: SM-G970U1

Edit: Also reproduced on Google Pixel

Minimal reproduction project (MRP)

example-crash.zip

Edit:

Here is an MRP using a standard t-pose from mixamo. So it does not appear to be specific to my model: https://github.com/godotengine/godot/issues/90459#issuecomment-2049998710

Edit 2:

Here is a current summary of the findings for easy reading

Summary of current findings

Renderers Tested:

forward+ = crash ❌ mobile = crash ❌ gl_compatibility = works ✅

Bug introducing PRs:

  1. https://github.com/godotengine/godot/pull/87888
  2. https://github.com/godotengine/godot/pull/84976

MRP test summaries

  1. Skeleton mesh + directional light + shadows = crash ❌
  2. Skeleton mesh + directional light + shadows + changing skeleton + add_child in _process = crash ❌
  3. Skeleton mesh + directional light + shadows + bone attachment + add_child gpu particles 3d = crash ❌
  4. Skeleton mesh + directional light + shadows + add_child + bone attachment + child mesh with skeleton = crash ❌
  5. Skeleton mesh + directional light + shadows + add_child + bone attachment + child mesh without skeleton = works ✅

MRP Details

Skeleton mesh + directional light + shadows

Pr that introduced the bug: https://github.com/godotengine/godot/pull/87888

Steps to reproduce:

  1. Export any of the below MRPs to Android.
  2. Open the application
  3. Watch it instantly crash when it tries to render the character

Known workarounds:

  1. Build with DISABLED_DEPRECATED as suggested by @TokageItLab here: https://github.com/godotengine/godot/issues/90459#issuecomment-2052238329
  2. Disable shadows on the directional light
  3. use gl_compatibility renderer

MRP with Blocky Ball model: https://github.com/godotengine/godot/files/14925776/example-crash.zip

MRP with Mixamo T Pose model: https://github.com/godotengine/godot/files/14948327/example-crash2.zip

MRP with Godot Character Model from examples repo: https://github.com/godotengine/godot/files/14951312/example-crash3.zip

IMPORTANT:

The following crash examples require https://github.com/godotengine/godot/pull/87888 to be reverted, fixed, or a commit checked out before it was in the repo. Otherwise, the following examples will crash on startup due to the fact they all require a skeleton and mesh present.

Skeleton mesh + directional light + shadows + changing skeleton + add_child in _process

Pr that introduced the bug: https://github.com/godotengine/godot/pull/84976 Steps to reproduce:

  1. Export any of the below MRPs to Android.
  2. Open the application
  3. Watch it crash after 1 second when it attempts to equip the "crown" to the Blocky Ball character

Known Workarounds

  1. Disable shadows on directional light
  2. use gl_compatibility renderer

MRP with Blocky Ball model: https://github.com/godotengine/godot/files/14967690/example-crash-change-skeleton.zip

And here is the code for equipping:

extends Node

var secs: float = 0
var is_equipped: bool = false

# # This works
# func _ready() -> void:
#   var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
#   var crown = (load("res://Cosmetics_Hat_Crown.blend") as PackedScene).instantiate()
#   var mesh = crown.get_child(0).get_child(0).get_child(0) as MeshInstance3D
#   mesh.get_parent().remove_child(mesh)
#   crown.queue_free()
#   skeleton.add_child(mesh)

# This crashes
func _process(delta: float) -> void:
    if secs < 1:
        secs += delta
        return

    if is_equipped:
        return

    is_equipped = true
    var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
    var crown = (load("res://Cosmetics_Hat_Crown.blend") as PackedScene).instantiate()
    var mesh = crown.get_child(0).get_child(0).get_child(0) as MeshInstance3D
    mesh.get_parent().remove_child(mesh)
    crown.queue_free()
    skeleton.add_child(mesh)

Note that equipping in _ready works. Equipping in _process after waiting for a second crashes. I tried equipping in _process right away without a delay and that also worked.

Skeleton mesh + directional light + shadows + bone attachment + add_child gpu particles 3d

Pr that introduced the bug: https://github.com/godotengine/godot/pull/84976

Steps to reproduce:

  1. Export any of the below MRPs to Android.
  2. Open the application
  3. Watch it crash after 1 second when it attempts to equip the "crown" to the Blocky Ball character

MRP: example-crash-particles-3d.zip

Code for equipping:

extends Node

var secs: float = 0
var is_equipped: bool = false

# This crashes
#func _ready() -> void:
    #var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
    #var particles = load("res://gpu_particles_3d.tscn") as PackedScene
    #skeleton.get_node("Foot_Front_R").add_child(particles.instantiate())
    #skeleton.get_node("Foot_Front_L").add_child(particles.instantiate())
    #skeleton.get_node("Foot_Back_R").add_child(particles.instantiate())
    #skeleton.get_node("Foot_Back_L").add_child(particles.instantiate())

# This crashes
func _process(delta: float) -> void:
    if secs < 1:
        secs += delta
        return

    if is_equipped:
        return

    is_equipped = true
    var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
    var particles = load("res://gpu_particles_3d.tscn") as PackedScene
    skeleton.get_node("Foot_Front_R").add_child(particles.instantiate())
    skeleton.get_node("Foot_Front_L").add_child(particles.instantiate())
    skeleton.get_node("Foot_Back_R").add_child(particles.instantiate())
    skeleton.get_node("Foot_Back_L").add_child(particles.instantiate())

Note: In this situation, equipping in _ready and _process both cause a crash

Known Workarounds

  1. Disable shadows on directional light
  2. use gl_compatibility renderer

Skeleton mesh + directional light + shadows + add_child + bone attachment + child mesh with skeleton

Pr that introduced the bug: https://github.com/godotengine/godot/pull/84976

Steps to reproduce:

  1. Export any of the below MRPs to Android.
  2. Open the application
  3. Watch it crash after 1 second when it attempts to equip another Blocky Ball character as a hat on the main Blocky Ball character

MRP: example-crash-equip-mesh.zip

Code for equipping:

extends Node

var secs: float = 0
var is_equipped: bool = false

# This works
# func _ready() -> void:
#   var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
#   var hat = load("res://Cube_Guy_Hat.tscn") as PackedScene
#   skeleton.add_child(hat.instantiate())

# This crashes
func _process(delta: float) -> void:
    if secs < 1:
        secs += delta
        return

    if is_equipped:
        return

    is_equipped = true
    var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
    var hat = load("res://Cube_Guy_Hat.tscn") as PackedScene
    skeleton.add_child(hat.instantiate())

Note: In this situation, equipping in _ready works

Known Workarounds

  1. Disable shadows on directional light
  2. use gl_compatibility renderer

Skeleton mesh + directional light + shadows + add_child + bone attachment + child mesh without skeleton

This situation actually works! ✅ Here is an MRP demonstrating it working: example-working-mesh.zip

So if the model being added as a child of a bone attachment doesn't itself have a skeleton, it works fine. In our game, we allow users to equip pets as "hats" for fun. We want to play cool animations with them. Hence the reason we are equipping items with skeletons.

TCROC commented 7 months ago

Note that I'm not sure if it's mono and double related. That's just what I tested with.

TCROC commented 7 months ago

Here is a stack trace captured from debugging in Android Studio:

art_sigsegv_fault 0x0000006ca7f12544
art::FaultManager::HandleFault(int, siginfo *, void *) 0x0000006ca7f12aa4
art::SignalChain::Handler(int, siginfo *, void *) 0x0000006faa1dd340
__kernel_rt_sigreturn 0x0000006fca6c5668
!!!0000!7ecf83797c9f36db9a6d042c7e4763!3dad7f8ed7! 0x0000006c12b02250
!!!0000!84c95bb772bf7dd08fc240b7dd7f78!3dad7f8ed7! 0x0000006c12af5648
!!!0000!2a418ddbe06bac73e12cf0f41836dd!3dad7f8ed7! 0x0000006c12ab9420
vulkan::api::CmdDispatch(VkCommandBuffer_T *, unsigned int, unsigned int, unsigned int) 0x0000006f9eae27c0
RenderingDeviceGraph::_run_compute_list_command(RenderingDeviceDriver::CommandBufferID, const unsigned char *, unsigned int) rendering_device_graph.cpp:615
RenderingDeviceGraph::_run_render_commands(RenderingDeviceDriver::CommandBufferID, int, const RenderingDeviceGraph::RecordedCommandSort *, unsigned int, int &, int &) rendering_device_graph.cpp:784
RenderingDeviceGraph::end(RenderingDeviceDriver::CommandBufferID, bool, bool) rendering_device_graph.cpp:1949
RenderingDevice::_end_frame() rendering_device.cpp:4875
RenderingDevice::swap_buffers() rendering_device.cpp:4707
RenderingServerDefault::_draw(bool, double) rendering_server_default.cpp:95
Main::iteration() main.cpp:4023
OS_Android::main_loop_iterate(bool *) os_android.cpp:303
Java_org_godotengine_godot_GodotLib_step(JNIEnv *, jclass) java_godot_lib_jni.cpp:269
art_quick_generic_jni_trampoline 0x0000006ca7e22248
art_quick_invoke_static_stub 0x0000006ca7e18bec
art::ArtMethod::Invoke(art::Thread *, unsigned int *, unsigned int, art::JValue *, const char *) 0x0000006ca7e86010
art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread *, art::ArtMethod *, art::ShadowFrame *, unsigned short, art::JValue *) 0x0000006ca7fea4cc
art::interpreter::DoCall<…>(art::ArtMethod *, art::Thread *, art::ShadowFrame &, const art::Instruction *, unsigned short, art::JValue *) 0x0000006ca7fe5068
art::interpreter::ExecuteSwitchImplCpp<…>(art::interpreter::SwitchImplContext *) 0x0000006ca7e2ce70
ExecuteSwitchImplAsm 0x0000006ca7e24bdc
art::interpreter::ExecuteSwitch(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool) 0x0000006ca7fe4a14
art::interpreter::Execute(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool, bool) 0x0000006ca7fdcdb4
art::interpreter::ArtInterpreterToInterpreterBridge(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame *, art::JValue *) 0x0000006ca7fe4588
art::interpreter::DoCall<…>(art::ArtMethod *, art::Thread *, art::ShadowFrame &, const art::Instruction *, unsigned short, art::JValue *) 0x0000006ca7fe5044
art::interpreter::ExecuteSwitchImplCpp<…>(art::interpreter::SwitchImplContext *) 0x0000006ca7e2dc70
ExecuteSwitchImplAsm 0x0000006ca7e24bdc
art::interpreter::ExecuteSwitch(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool) 0x0000006ca7fe4a14
art::interpreter::Execute(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool, bool) 0x0000006ca7fdcdb4
artQuickToInterpreterBridge 0x0000006ca834ed08
art_quick_to_interpreter_bridge 0x0000006ca7e2237c
art_quick_invoke_stub 0x0000006ca7e18968
art::ArtMethod::Invoke(art::Thread *, unsigned int *, unsigned int, art::JValue *, const char *) 0x0000006ca7e85ff4
art::InvokeVirtualOrInterfaceWithJValues<…>(const art::ScopedObjectAccessAlreadyRunnable &, _jobject *, art::ArtMethod *, const jvalue *) 0x0000006ca822e430
art::Thread::CreateCallback(void *) 0x0000006ca827e2ec
__pthread_start(void *) 0x0000006fad65ee48
__start_thread 0x0000006fad5fb458
clayjohn commented 7 months ago

As we discussed on RocketChat. It seems like the actual crash is happening within the Vulkan drivers on the device.

Unfortunately, you can't really debug drivers on Android devices. So we need to figure out the cause of the crash from outside the driver. We can do that in two ways:

  1. Put a breakpoint at vkCmdDispatch() and check that the values being passed into the function make sense
  2. Run the application with validation layers enabled and check for any errors that arise.

I think between the two, we should get a pretty good idea of if we are doing something wrong. If we aren't doing something wrong, then it should at least help point us towards a workaround.

TCROC commented 7 months ago

Sounds good. I'm currently working on building the validation layers from source so we can also step through those and look for interesting info. But I can right now check what is being passed to vkCmdDispatch. I'll set that break point real quick

TCROC commented 7 months ago

Here are the values:

This is the last line of godot code before entering the driver stack:

https://github.com/godotengine/godot/blob/a7b860250f305f6cbaf61c30f232ff3bbdfdda0b/servers/rendering/rendering_device_graph.cpp#L615

driver->command_compute_dispatch(p_command_buffer, dispatch_instruction->x_groups, dispatch_instruction->y_groups, dispatch_instruction->z_groups);

And then these are the values of the parameters:

p_command_buffer image 12970367393844968752

dispatch_instruction->x_groups image 74

dispatch_instruction->y_groups image 1

dispatch_instruction->z_groups image 1

And then here is the entire dispatch_instruction struct:

image

And then here is the driver struct:

image image

I wonder if Android Studio has a means of exporting structs as text. Someting akin to:

SomeStruct
|
- SomeField = "some val"

Would be nice to take a "snapshot" so to speak instead of sending snips

TCROC commented 7 months ago

Here is the stacktrace with vulkan validation enabled:

T-CROC - Wed Apr 10 2024 18 58 25 GMT-0400 (Eastern Daylight Time).txt

And here are the key points of interest (copied in from the chat in the godot dev server):

Very first error is just indicating that I built validation layers in debug mode. So we can just ignore that. Then we get to the first error of interest:

USER WARNING: Vulkan Debug Report: object - -5476376679864662384
Validation Performance Warning: [ UNASSIGNED-CoreValidation-SwapchainPreTransform ] Object 0: handle = 0xb400006cbc6f0690, type = VK_OBJECT_TYPE_PHYSICAL_DEVICE; | MessageID = 0x214b6dd0 | vkCreateSwapchainKHR(): pCreateInfo->preTransform (VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR) doesn't match the currentTransform (VK_SURFACE_TRANSFORM_ROTATE_90_BIT_KHR) returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR, the presentation engine will transform the image content as part of the presentation operation.
2024-04-10 18:56:15.691  3247-3353  godot                   com.example.examplecrash             E     at: _debug_report_callback (drivers/vulkan/rendering_context_driver_vulkan.cpp:324)

And then not too long after that, this repeats over and over again until it crashes:

USER ERROR: Vulkan Debug Report: object - -6869287954972407602
Validation Error: [ VUID-vkCmdDrawIndexed-magFilter-04553 ] Object 0: handle = 0xa0ab6300000008ce, type = VK_OBJECT_TYPE_DESCRIPTOR_SET; Object 1: handle = 0x47610f00000007e1, type = VK_OBJECT_TYPE_SAMPLER; Object 2: handle = 0x980b0000000002e, type = VK_OBJECT_TYPE_IMAGE_VIEW; | MessageID = 0x9c7248ee | vkCmdDrawIndexed: Descriptor set VkDescriptorSet 0xa0ab6300000008ce[] Sampler (VkSampler 0x47610f00000007e1[]) is set to use VK_FILTER_LINEAR with compareEnable is set to VK_FALSE, but image view's (VkImageView 0x980b0000000002e[]) format (VK_FORMAT_D16_UNORM) does not contain VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT in its format features. The Vulkan spec states: If a VkSampler created with magFilter or minFilter equal to VK_FILTER_LINEAR and compareEnable equal to VK_FALSE is used to sample a VkImageView as a result of this command, then the image view's format features must contain VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT (ht
2024-04-10 18:56:24.204  3247-3353  godot                   com.example.examplecrash             E     at: _debug_report_callback (drivers/vulkan/rendering_context_driver_vulkan.cpp:327)
clayjohn commented 7 months ago

The Vulkan profile for the affected device is here: https://vulkan.gpuinfo.org/displayreport.php?id=16301

I can reproduce the validation layer error when using this profile.

It appears that the crash is coming from the skeleton compute shader. I think there is a good chance that the crash is happening due to an issue with this specific skeleton. But we can't rule out that the Vulkan error is somehow related. So let's try to fix the validation layer error first and then re-evaluate and see if the crash goes away.

For the validation error, it seems that it comes from not supporting linear samplers with VK_FORMAT_D16_UNORM. On my device the validation layer is slightly different, but I think it is because it is falling back to another code path it thinks will be better support. I get the following warning print: format VK_FORMAT_D16_UNORM is simulating unsupported features! and then the validation layer error I get is:

ERROR: VALIDATION - Message Id Number: 1397510317 | Message Id Name: VUID-vkCmdDrawIndexed-None-06479
    Validation Error: [ VUID-vkCmdDrawIndexed-None-06479 ] Object 0: handle = 0x587167d18ee1, name = RID:644395418256182, type = VK_OBJECT_TYPE_DESCRIPTOR_SET; Object 1: handle = 0x58714d4c5890, name = RID:30064771077 View, type = VK_OBJECT_TYPE_IMAGE_VIEW; | MessageID = 0x534c50ad | vkCmdDrawIndexed: Descriptor set VkDescriptorSet 0x587167d18ee1[RID:644395418256182] in binding #5 index 0, VkImageView 0x58714d4c5890[RID:30064771077 View], image view format VK_FORMAT_D16_UNORM feature flags (VK_FORMAT_FEATURE_2_SAMPLED_IMAGE_BIT|VK_FORMAT_FEATURE_2_DEPTH_STENCIL_ATTACHMENT_BIT|VK_FORMAT_FEATURE_2_BLIT_SRC_BIT|VK_FORMAT_FEATURE_2_TRANSFER_SRC_BIT|VK_FORMAT_FEATURE_2_TRANSFER_DST_BIT) doesn't contain VK_FORMAT_FEATURE_2_SAMPLED_IMAGE_DEPTH_COMPARISON_BIT The Vulkan spec states: If a VkImageView is sampled with depth comparison, the image view's format features must contain VK_FORMAT_FEATURE_2_SAMPLED_IMAGE_DEPTH_COMPARISON_BIT (https://vulkan.lunarg.com/doc/view/1.3.250.1/linux/1.3-extensions/vkspec.html#VUID-vkCmdDrawIndexed-None-06479)
    Objects - 2
        Object[0] - VK_OBJECT_TYPE_DESCRIPTOR_SET, Handle 97244096335585, Name "RID:644395418256182"
        Object[1] - VK_OBJECT_TYPE_IMAGE_VIEW, Handle 97243651397776, Name "RID:30064771077 View"
clayjohn commented 7 months ago

For further testing, can you try with this diff:


diff --git a/servers/rendering/renderer_rd/environment/gi.cpp b/servers/rendering/renderer_rd/environment/gi.cpp
index c7752f8a86..80a83b761b 100644
--- a/servers/rendering/renderer_rd/environment/gi.cpp
+++ b/servers/rendering/renderer_rd/environment/gi.cpp
@@ -2672,7 +2672,7 @@ void GI::VoxelGIInstance::update(bool p_update_light_instances, const Vector<RID
                                        if (dynamic_maps.size() == 0) {
                                                // Render depth for first one.
                                                // Use 16-bit depth when supported to improve performance.
-                                               dtf.format = RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_D16_UNORM, RD::TEXTURE_USAGE_DEPTH_STENCIL_A
TTACHMENT_BIT) ? RD::DATA_FORMAT_D16_UNORM : RD::DATA_FORMAT_X8_D24_UNORM_PACK32;
+                                               dtf.format = RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_D16_UNORM, RD::TEXTURE_USAGE_DEPTH_STENCIL_A
TTACHMENT_BIT) ? RD::DATA_FORMAT_X8_D24_UNORM_PACK32 : RD::DATA_FORMAT_X8_D24_UNORM_PACK32;
                                                dtf.usage_bits = RD::TEXTURE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
                                                dmap.fb_depth = RD::get_singleton()->texture_create(dtf, RD::TextureView());
                                                RD::get_singleton()->set_resource_name(dmap.fb_depth, "VoxelGI Instance DMap FB Depth");
diff --git a/servers/rendering/renderer_rd/shaders/forward_clustered/scene_forward_clustered.glsl b/servers/rendering/renderer_rd/shaders/forward_clustered/scene_forward_clustered.
glsl
index 359d7799e5..372b63bb83 100644
--- a/servers/rendering/renderer_rd/shaders/forward_clustered/scene_forward_clustered.glsl
+++ b/servers/rendering/renderer_rd/shaders/forward_clustered/scene_forward_clustered.glsl
@@ -1975,7 +1975,7 @@ void fragment_shader(in SceneData scene_data) {
                                        vec4 trans_coord = directional_lights.data[i].shadow_matrix1 * trans_vertex;
                                        trans_coord /= trans_coord.w;

-                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_LINEAR_CLAMP), trans_coord.xy, 0.0).r;
+                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_NEAREST_CLAMP), trans_coord.xy, 0.0).r;
                                        shadow_z *= directional_lights.data[i].shadow_z_range.x;
                                        float z = trans_coord.z * directional_lights.data[i].shadow_z_range.x;

@@ -1985,7 +1985,7 @@ void fragment_shader(in SceneData scene_data) {
                                        vec4 trans_coord = directional_lights.data[i].shadow_matrix2 * trans_vertex;
                                        trans_coord /= trans_coord.w;

-                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_LINEAR_CLAMP), trans_coord.xy, 0.0).r;
+                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_NEAREST_CLAMP), trans_coord.xy, 0.0).r;
                                        shadow_z *= directional_lights.data[i].shadow_z_range.y;
                                        float z = trans_coord.z * directional_lights.data[i].shadow_z_range.y;

@@ -1995,7 +1995,7 @@ void fragment_shader(in SceneData scene_data) {
                                        vec4 trans_coord = directional_lights.data[i].shadow_matrix3 * trans_vertex;
                                        trans_coord /= trans_coord.w;

-                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_LINEAR_CLAMP), trans_coord.xy, 0.0).r;
+                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_NEAREST_CLAMP), trans_coord.xy, 0.0).r;
                                        shadow_z *= directional_lights.data[i].shadow_z_range.z;
                                        float z = trans_coord.z * directional_lights.data[i].shadow_z_range.z;

@@ -2006,7 +2006,7 @@ void fragment_shader(in SceneData scene_data) {
                                        vec4 trans_coord = directional_lights.data[i].shadow_matrix4 * trans_vertex;
                                        trans_coord /= trans_coord.w;

-                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_LINEAR_CLAMP), trans_coord.xy, 0.0).r;
+                                       float shadow_z = textureLod(sampler2D(directional_shadow_atlas, SAMPLER_NEAREST_CLAMP), trans_coord.xy, 0.0).r;
                                        shadow_z *= directional_lights.data[i].shadow_z_range.w;
                                        float z = trans_coord.z * directional_lights.data[i].shadow_z_range.w;

diff --git a/servers/rendering/renderer_rd/storage_rd/light_storage.cpp b/servers/rendering/renderer_rd/storage_rd/light_storage.cpp
index d1ff9fc362..666a1ace06 100644
--- a/servers/rendering/renderer_rd/storage_rd/light_storage.cpp
+++ b/servers/rendering/renderer_rd/storage_rd/light_storage.cpp
@@ -1995,7 +1995,7 @@ void LightStorage::shadow_atlas_free(RID p_atlas) {
 void LightStorage::_update_shadow_atlas(ShadowAtlas *shadow_atlas) {
        if (shadow_atlas->size > 0 && shadow_atlas->depth.is_null()) {
                RD::TextureFormat tf;
-               tf.format = shadow_atlas->use_16_bits ? RD::DATA_FORMAT_D16_UNORM : RD::DATA_FORMAT_D32_SFLOAT;
+               tf.format = shadow_atlas->use_16_bits ? RD::DATA_FORMAT_D32_SFLOAT : RD::DATA_FORMAT_D32_SFLOAT;
                tf.width = shadow_atlas->size;
                tf.height = shadow_atlas->size;
                tf.usage_bits = RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
@@ -2388,7 +2388,7 @@ void LightStorage::shadow_atlas_update(RID p_atlas) {
 void LightStorage::update_directional_shadow_atlas() {
        if (directional_shadow.depth.is_null() && directional_shadow.size > 0) {
                RD::TextureFormat tf;
-               tf.format = directional_shadow.use_16_bits ? RD::DATA_FORMAT_D16_UNORM : RD::DATA_FORMAT_D32_SFLOAT;
+               tf.format = directional_shadow.use_16_bits ? RD::DATA_FORMAT_D32_SFLOAT : RD::DATA_FORMAT_D32_SFLOAT;
                tf.width = directional_shadow.size;
                tf.height = directional_shadow.size;
                tf.usage_bits = RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
TCROC commented 7 months ago

I just tested with those changes. I still get the same error leading up to a crash:

Validation Error: [ VUID-vkCmdDrawIndexed-magFilter-04553 ] Object 0: handle = 0xa0ab6300000008ce, type = VK_OBJECT_TYPE_DESCRIPTOR_SET; Object 1: handle = 0x47610f00000007e1, type = VK_OBJECT_TYPE_SAMPLER; Object 2: handle = 0x980b0000000002e, type = VK_OBJECT_TYPE_IMAGE_VIEW; | MessageID = 0x9c7248ee | vkCmdDrawIndexed: Descriptor set VkDescriptorSet 0xa0ab6300000008ce[] Sampler (VkSampler 0x47610f00000007e1[]) is set to use VK_FILTER_LINEAR with compareEnable is set to VK_FALSE, but image view's (VkImageView 0x980b0000000002e[]) format (VK_FORMAT_D16_UNORM) does not contain VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT in its format features. The Vulkan spec states: If a VkSampler created with magFilter or minFilter equal to VK_FILTER_LINEAR and compareEnable equal to VK_FALSE is used to sample a VkImageView as a result of this command, then the image view's format features must contain VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT (ht
2024-04-11 08:40:39.710 25574-25614 godot                   com.example.examplecrash             E     at: _debug_report_callback (drivers/vulkan/rendering_context_driver_vulkan.cpp:327)
TCROC commented 7 months ago

Ok so I found that in drivers/vulkan/render_device_vulkan.cpp lines 1799 and 1800, if I manually set the magFilter to VK_FILTER_NEAREST, the validation errors go away. But it still crashes. Lets see if we can get anything interesting out of these crash logs now.

rendering_device_driver_vulkan.cpp

    sampler_create_info.magFilter = p_state.mag_filter == SAMPLER_FILTER_LINEAR ? VK_FILTER_NEAREST : VK_FILTER_NEAREST;
    sampler_create_info.minFilter = p_state.min_filter == SAMPLER_FILTER_LINEAR ? VK_FILTER_NEAREST : VK_FILTER_NEAREST;

Here is the stack trace:

art_sigsegv_fault 0x0000006ca7f12544
art::FaultManager::HandleFault(int, siginfo *, void *) 0x0000006ca7f12aa4
art::SignalChain::Handler(int, siginfo *, void *) 0x0000006faa1dd340
__kernel_rt_sigreturn 0x0000006fca6c5668
!!!0000!7ecf83797c9f36db9a6d042c7e4763!3dad7f8ed7! 0x0000006c1090b250
!!!0000!84c95bb772bf7dd08fc240b7dd7f78!3dad7f8ed7! 0x0000006c108fe648
!!!0000!2a418ddbe06bac73e12cf0f41836dd!3dad7f8ed7! 0x0000006c108c2420
DispatchCmdDispatch(VkCommandBuffer_T *, unsigned int, unsigned int, unsigned int) layer_chassis_dispatch.cpp:3491
vulkan_layer_chassis::CmdDispatch(VkCommandBuffer_T *, unsigned int, unsigned int, unsigned int) chassis.cpp:3147
vulkan::api::CmdDispatch(VkCommandBuffer_T *, unsigned int, unsigned int, unsigned int) 0x0000006f9eae27c0
<unknown> 0x0000006c24e89108
<unknown> 0x0000006c24e89910
<unknown> 0x0000006c24e8dd00
<unknown> 0x0000006c24e0f55c
<unknown> 0x0000006c24e0f460
<unknown> 0x0000006c24e8f2f8
<unknown> 0x0000006c22efae68
<unknown> 0x0000006c22eb1010
Java_org_godotengine_godot_GodotLib_step 0x0000006c22ec5038
art_quick_generic_jni_trampoline 0x0000006ca7e22248
art_quick_invoke_static_stub 0x0000006ca7e18bec
art::ArtMethod::Invoke(art::Thread *, unsigned int *, unsigned int, art::JValue *, const char *) 0x0000006ca7e86010
art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread *, art::ArtMethod *, art::ShadowFrame *, unsigned short, art::JValue *) 0x0000006ca7fea4cc
art::interpreter::DoCall<…>(art::ArtMethod *, art::Thread *, art::ShadowFrame &, const art::Instruction *, unsigned short, art::JValue *) 0x0000006ca7fe5068
art::interpreter::ExecuteSwitchImplCpp<…>(art::interpreter::SwitchImplContext *) 0x0000006ca7e2ce70
ExecuteSwitchImplAsm 0x0000006ca7e24bdc
art::interpreter::ExecuteSwitch(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool) 0x0000006ca7fe4a14
art::interpreter::Execute(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool, bool) 0x0000006ca7fdcdb4
artQuickToInterpreterBridge 0x0000006ca834ed08
art_quick_to_interpreter_bridge 0x0000006ca7e2237c
<unknown> 0x000000009d0309e8
art::interpreter::ExecuteSwitchImplCpp<…>(art::interpreter::SwitchImplContext *) 0x0000006ca7e2c380
ExecuteSwitchImplAsm 0x0000006ca7e24bdc
art::interpreter::ExecuteSwitch(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool) 0x0000006ca7fe4a14
art::interpreter::Execute(art::Thread *, const art::CodeItemDataAccessor &, art::ShadowFrame &, art::JValue, bool, bool) 0x0000006ca7fdcdb4
artQuickToInterpreterBridge 0x0000006ca834ed08
art_quick_to_interpreter_bridge 0x0000006ca7e2237c
art_quick_invoke_stub 0x0000006ca7e18968
art::ArtMethod::Invoke(art::Thread *, unsigned int *, unsigned int, art::JValue *, const char *) 0x0000006ca7e85ff4
art::InvokeVirtualOrInterfaceWithJValues<…>(const art::ScopedObjectAccessAlreadyRunnable &, _jobject *, art::ArtMethod *, const jvalue *) 0x0000006ca822e430
art::Thread::CreateCallback(void *) 0x0000006ca827e2ec
__pthread_start(void *) 0x0000006fad65ee48
__start_thread 0x0000006fad5fb458
TCROC commented 7 months ago

I just tested with a t-pose model from mixamo and it also crashes: https://www.mixamo.com

So it appears that skeletons in general are causing godot to crash on Android. Should we update the title of the issue to reflect our findings?

TCROC commented 7 months ago

Here is the MRP with the humanoid from mixamo that also causes the crash:

example-crash2.zip

TCROC commented 7 months ago

And this is the logcat trace:

12:42:52.915  I  Late-enabling -Xcheck:jni
12:42:52.935  E  USNET: appName: com.example.examplecrash
12:42:52.937  D  Binder ioctl to enable oneway spam detection failed: Invalid argument
12:42:52.939  D  setConscryptValidator
12:42:52.939  D  setConscryptValidator - put
12:42:52.975  W  DexFile /data/data/com.example.examplecrash/code_cache/.studio/instruments-54177d1b.jar is in boot class path but is not in a known location
12:42:53.092  W  Current dex file has more than one class in it. Calling RetransformClasses on this class might fail if no transformations are applied to it!
12:42:53.154  W  Current dex file has more than one class in it. Calling RetransformClasses on this class might fail if no transformations are applied to it!
12:42:53.155  W  Redefining intrinsic method java.lang.Thread java.lang.Thread.currentThread(). This may cause the unexpected use of the original definition of java.lang.Thread java.lang.Thread.currentThread()in methods that have already been compiled.
12:42:53.155  W  Redefining intrinsic method boolean java.lang.Thread.interrupted(). This may cause the unexpected use of the original definition of boolean java.lang.Thread.interrupted()in methods that have already been compiled.
12:42:53.166  W  Current dex file has more than one class in it. Calling RetransformClasses on this class might fail if no transformations are applied to it!
12:42:53.295  W  Current dex file has more than one class in it. Calling RetransformClasses on this class might fail if no transformations are applied to it!
12:42:53.446  W  Current dex file has more than one class in it. Calling RetransformClasses on this class might fail if no transformations are applied to it!
12:42:53.451  D  handleBindApplication()++ app=com.example.examplecrash
12:42:53.452  D  Compat change id reported: 171979766; UID 10375; state: ENABLED
12:42:53.458  W  Application com.example.examplecrash is waiting for the debugger on port 8100...
12:42:53.458  I  Sending WAIT chunk
12:42:54.942  W  A resource failed to call close. 
12:42:57.854  I  Debugger has connected
12:42:57.855  I  waiting for debugger to settle...
12:42:58.057  I  waiting for debugger to settle...
12:42:58.259  I  waiting for debugger to settle...
12:42:58.461  I  waiting for debugger to settle...
12:42:58.665  I  waiting for debugger to settle...
12:42:58.868  I  waiting for debugger to settle...
12:42:59.070  I  waiting for debugger to settle...
12:42:59.274  I  waiting for debugger to settle...
12:42:59.478  I  debugger has settled (1332)
12:42:59.488  W  Slow operation: 6036ms so far, now at handleBindApplication: Before HardwareRenderer
12:42:59.514  W  Slow operation: 6063ms so far, now at handleBindApplication: After HardwareRenderer
12:43:00.593  V  ANGLE Developer option for 'com.example.examplecrash' set to: 'default'
12:43:00.594  V  App is not on the allowlist for updatable production driver.
12:43:00.615  D  LoadedApk::makeApplication() appContext.mOpPackageName=com.example.examplecrash appContext.mBasePackageName=com.example.examplecrash
12:43:00.615  D  No Network Security Config specified, using platform default
12:43:00.753  D  No Network Security Config specified, using platform default
12:43:00.776  D  handleBindApplication() --
12:43:00.798  D  RenderThread::requireGlContext()
12:43:00.800  I  QUALCOMM build                   : 3dad7f8ed7, I593c16c433
                 Build Date                       : 10/01/21
                 OpenGL ES Shader Compiler Version: EV031.32.02.02
                 Local Branch                     : 
                 Remote Branch                    : refs/tags/AU_LINUX_ANDROID_LA.UM.9.1.R1.11.00.00.604.073
                 Remote Branch                    : NONE
                 Reconstruct Branch               : NOTHING
12:43:00.800  I  Build Config                     : S P 10.0.7 AArch64
12:43:00.800  I  Driver Path                      : /vendor/lib64/egl/libGLESv2_adreno.so
12:43:00.949  I  PFP: 0x016ee190, ME: 0x00000000
12:43:00.958  D  RenderThread::setGrContext()
12:43:01.032  I  [INFO] isPopOver=false, config=true
12:43:01.032  I  updateCaptionType >> DecorView@ca403e0[], isFloating=false, isApplication=true, hasWindowControllerCallback=true, hasWindowDecorCaption=false
12:43:01.033  D  setCaptionType = 0, this = DecorView@ca403e0[]
12:43:01.044  I  getCurrentDensityDpi: from real metrics. densityDpi=480 msg=resources_loaded
12:43:01.050  V  Creating new Godot fragment instance.
12:43:03.738  D  registerListener :: 11, LSM6DSO Accelerometer, 20000, 0, 
12:43:03.750  D  registerListener :: 91, gravity  Non-wakeup, 20000, 0, org.godotengine.godot.Godot@45e16cc
12:43:03.770  D  registerListener :: 21, AK09918 Magnetometer, 20000, 0, org.godotengine.godot.Godot@45e16cc
12:43:03.803  D  registerListener :: 41, LSM6DSO Gyroscope, 20000, 0, org.godotengine.godot.Godot@45e16cc
12:43:03.846  I  notifyKeepScreenOnChanged: keepScreenOn=false
12:43:03.851  D  setRequestedVisible: visible=false, type=21, host=com.example.examplecrash/com.godot.game.GodotApp, from=android.view.InsetsSourceConsumer.hide:242 android.view.InsetsController.collectSourceControls:1215 android.view.InsetsController.controlAnimationUnchecked:1077 android.view.InsetsController.applyAnimation:1456 android.view.InsetsController.applyAnimation:1437 android.view.InsetsController.hide:1006 android.view.InsetsController.hide:981 android.view.ViewRootImpl.controlInsetsForCompatibility:3142 android.view.ViewRootImpl.setView:1555 android.view.WindowManagerGlobal.addView:509 
12:43:03.853  D  setRequestedVisible: visible=false, type=20, host=com.example.examplecrash/com.godot.game.GodotApp, from=android.view.InsetsSourceConsumer.hide:242 android.view.InsetsController.collectSourceControls:1215 android.view.InsetsController.controlAnimationUnchecked:1077 android.view.InsetsController.applyAnimation:1456 android.view.InsetsController.applyAnimation:1437 android.view.InsetsController.hide:1006 android.view.InsetsController.hide:981 android.view.ViewRootImpl.controlInsetsForCompatibility:3142 android.view.ViewRootImpl.setView:1555 android.view.WindowManagerGlobal.addView:509 
12:43:03.855  D  setRequestedVisible: visible=false, type=1, host=com.example.examplecrash/com.godot.game.GodotApp, from=android.view.InsetsSourceConsumer.hide:242 android.view.InsetsController.collectSourceControls:1215 android.view.InsetsController.controlAnimationUnchecked:1077 android.view.InsetsController.applyAnimation:1456 android.view.InsetsController.applyAnimation:1437 android.view.InsetsController.hide:1006 android.view.InsetsController.hide:981 android.view.ViewRootImpl.controlInsetsForCompatibility:3142 android.view.ViewRootImpl.setView:1555 android.view.WindowManagerGlobal.addView:509 
12:43:03.857  D  setRequestedVisible: visible=false, type=0, host=com.example.examplecrash/com.godot.game.GodotApp, from=android.view.InsetsSourceConsumer.hide:242 android.view.InsetsController.collectSourceControls:1215 android.view.InsetsController.controlAnimationUnchecked:1077 android.view.InsetsController.applyAnimation:1456 android.view.InsetsController.applyAnimation:1437 android.view.InsetsController.hide:1006 android.view.InsetsController.hide:981 android.view.ViewRootImpl.controlInsetsForCompatibility:3142 android.view.ViewRootImpl.setView:1555 android.view.WindowManagerGlobal.addView:509 
12:43:03.876  I  setView = com.android.internal.policy.DecorView@ca403e0 TM=true
12:43:03.917  I  onWindowVisibilityChanged(0) true org.godotengine.godot.GodotVulkanRenderView{3176227 VFE...C.. .F....I. 0,0-0,0} of ViewRootImpl@ebbd3b8[GodotApp]
12:43:03.966  I  Relayout returned: old=(0,0,2280,1080) new=(0,0,2280,1080) req=(2280,1080)0 dur=20 res=0x7 s={true -5476376673690511552} ch=true fn=-1
12:43:03.993  D  Binder ioctl to enable oneway spam detection failed: Invalid argument
12:43:03.994  D  eglCreateWindowSurface
12:43:03.997  I  windowStopped(false) true org.godotengine.godot.GodotVulkanRenderView{3176227 VFE...C.. .F....ID 0,0-2280,1080} of ViewRootImpl@ebbd3b8[GodotApp]
12:43:03.999  I  [DP] dp(1) 1 android.view.ViewRootImpl.reportNextDraw:11420 android.view.ViewRootImpl.performTraversals:4193 android.view.ViewRootImpl.doTraversal:2919 
12:43:04.011  I  pST: sr = Rect(0, 0 - 2280, 1080) sw = 2280 sh = 1080
12:43:04.011  I  onSSPAndSRT: pl = 0 pt = 0 sx = 1.0 sy = 1.0
12:43:04.012  I  pST: mTmpTransaction.apply, mTmpTransaction = android.view.SurfaceControl$Transaction@effd501
12:43:04.012  I  updateSurface: mVisible = true mSurface.isValid() = true
12:43:04.012  I  updateSurface: mSurfaceCreated = false surfaceChanged = true visibleChanged = true
12:43:04.013  I  surfaceCreated 1 #8 org.godotengine.godot.GodotVulkanRenderView{3176227 VFE...C.. .F....ID 0,0-2280,1080}
12:43:04.014  I  surfaceChanged (2280,1080) 1 #8 org.godotengine.godot.GodotVulkanRenderView{3176227 VFE...C.. .F....ID 0,0-2280,1080}
12:43:04.015  I  [DP] dp(2) 1 android.view.SurfaceView.updateSurface:1375 android.view.SurfaceView.lambda$new$1$SurfaceView:254 android.view.SurfaceView$$ExternalSyntheticLambda2.onPreDraw:2 
12:43:04.016  I  [DP] pdf(1) 1 android.view.SurfaceView.notifyDrawFinished:599 android.view.SurfaceView.performDrawFinished:586 android.view.SurfaceView.$r8$lambda$st27mCkd9jfJkTrN_P3qIGKX6NY:0 
12:43:04.016  I  Godot Engine v4.3.dev.mono.custom_build.a7b860250 (2024-04-09 08:42:45 UTC) - https://godotengine.org
12:43:04.030  I  uSP: rtp = Rect(0, 0 - 2280, 1080) rtsw = 2280 rtsh = 1080
12:43:04.030  I  onSSPAndSRT: pl = 0 pt = 0 sx = 1.0 sy = 1.0
12:43:04.031  I  aOrMT: uB = true t = android.view.SurfaceControl$Transaction@3cf7d94 fN = 1 android.view.SurfaceView.access$500:124 android.view.SurfaceView$SurfaceViewPositionUpdateListener.positionChanged:1728 android.graphics.RenderNode$CompositePositionUpdateListener.positionChanged:319 
12:43:04.031  I  aOrMT: vR.mWNT, vR = ViewRootImpl@ebbd3b8[GodotApp]
12:43:04.031  I  mWNT: t = android.view.SurfaceControl$Transaction@3cf7d94 fN = 1 android.view.SurfaceView.applyOrMergeTransaction:1628 android.view.SurfaceView.access$500:124 android.view.SurfaceView$SurfaceViewPositionUpdateListener.positionChanged:1728 
12:43:04.031  I  mWNT: merge t to BBQ
12:43:04.038  I  [ViewRootImpl@ebbd3b8[GodotApp]#0(BLAST Consumer)0](id:dbe00000000,api:1,p:3518,c:3518) queueBuffer: queued for the first time.
12:43:04.068  I  [DP] pdf(0) 1 android.view.ViewRootImpl.lambda$addFrameCompleteCallbackIfNeeded$3$ViewRootImpl:4995 android.view.ViewRootImpl$$ExternalSyntheticLambda16.run:6 android.os.Handler.handleCallback:938 
12:43:04.068  I  [DP] rdf()
12:43:04.068  D  reportDrawFinished (fn: -1) 
12:43:04.070  I  notifyKeepScreenOnChanged: keepScreenOn=true
12:43:04.284  D  searching for layers in '/data/app/~~lIfLcd4UqnnumILWcmjqMg==/com.example.examplecrash-Jt1xXKzin5CdtIGFYHTsfQ==/lib/arm64'
12:43:04.285  D  searching for layers in '/data/app/~~lIfLcd4UqnnumILWcmjqMg==/com.example.examplecrash-Jt1xXKzin5CdtIGFYHTsfQ==/base.apk!/lib/arm64-v8a'
12:43:05.519  I  Relayout returned: old=(0,0,2280,1080) new=(0,0,2280,1080) req=(2280,1080)0 dur=198 res=0x1 s={true -5476376673690511552} ch=false fn=2
12:43:05.520  I  updateBoundsLayer: t = android.view.SurfaceControl$Transaction@a34a183 sc = Surface(name=Bounds for - com.example.examplecrash/com.godot.game.GodotApp@0)/@0xff0df00 frame = 2
12:43:05.521  I  mWNT: t = android.view.SurfaceControl$Transaction@a34a183 fN = 2 android.view.ViewRootImpl.prepareSurfaces:2778 android.view.ViewRootImpl.performTraversals:4024 android.view.ViewRootImpl.doTraversal:2919 
12:43:05.521  I  mWNT: merge t to BBQ
12:43:05.537  I  mWNT: t = android.view.SurfaceControl$Transaction@7812339 fN = 2 android.view.SyncRtSurfaceTransactionApplier.applyTransaction:94 android.view.SyncRtSurfaceTransactionApplier.lambda$scheduleApply$0$SyncRtSurfaceTransactionApplier:71 android.view.SyncRtSurfaceTransactionApplier$$ExternalSyntheticLambda0.onFrameDraw:4 
12:43:05.537  I  mWNT: merge t to BBQ
12:43:05.554  I  Davey! duration=1634ms; Flags=0, FrameTimelineVsyncId=769797, IntendedVsync=205966706523332, Vsync=205966856523326, InputEventId=0, HandleInputStart=205966867697053, AnimationStart=205966867716480, PerformTraversalsStart=205966867747001, DrawStart=205968329020646, FrameDeadline=205966739856664, FrameInterval=205966867608876, FrameStartTime=16666666, SyncQueued=205968330364292, SyncStart=205968330777574, IssueDrawCommandsStart=205968330857417, SwapBuffers=205968333785803, FrameCompleted=205968341058667, DequeueBufferDuration=2365260, QueueBufferDuration=995261, GpuCompleted=205968341058667, SwapBuffersCompleted=205968334961376, DisplayPresentTime=0, 
12:43:05.611  I  mWNT: t = android.view.SurfaceControl$Transaction@315d2df fN = 3 android.view.SyncRtSurfaceTransactionApplier.applyTransaction:94 android.view.SyncRtSurfaceTransactionApplier.lambda$scheduleApply$0$SyncRtSurfaceTransactionApplier:71 android.view.SyncRtSurfaceTransactionApplier$$ExternalSyntheticLambda0.onFrameDraw:4 
12:43:05.611  I  mWNT: merge t to BBQ
12:43:05.620  D  added global layer 'VK_LAYER_KHRONOS_validation' from library '/data/app/~~lIfLcd4UqnnumILWcmjqMg==/com.example.examplecrash-Jt1xXKzin5CdtIGFYHTsfQ==/base.apk!/lib/arm64-v8a/libVkLayer_khronos_validation.so'
12:43:05.999  I  mWNT: t = android.view.SurfaceControl$Transaction@e04eb2c fN = 4 android.view.SyncRtSurfaceTransactionApplier.applyTransaction:94 android.view.SyncRtSurfaceTransactionApplier.lambda$scheduleApply$0$SyncRtSurfaceTransactionApplier:71 android.view.SyncRtSurfaceTransactionApplier$$ExternalSyntheticLambda0.onFrameDraw:4 
12:43:05.999  I  mWNT: merge t to BBQ
12:43:06.005  I  MSG_WINDOW_FOCUS_CHANGED 1 1
12:43:06.010  D  startInputInner - Id : 0
12:43:06.010  I  startInputInner - mService.startInputOrWindowGainedFocus
12:43:06.018  D  startInputInner - Id : 0
12:43:06.083  I  Loaded layer VK_LAYER_KHRONOS_validation
12:43:06.087  I  ===== BEGIN DUMP OF OVERRIDDEN SETTINGS =====
12:43:06.087  I  ===== END DUMP OF OVERRIDDEN SETTINGS =====
12:43:06.087  I  QUALCOMM build          : 3dad7f8ed7, I593c16c433
                 Build Date              : 10/01/21
                 Shader Compiler Version : EV031.32.02.02
                 Local Branch            : 
                 Remote Branch           : refs/tags/AU_LINUX_ANDROID_LA.UM.9.1.R1.11.00.00.604.073
                 Remote Branch           : NONE
                 Reconstruct Branch      : NOTHING
12:43:06.087  I  Build Config            : S P 10.0.7 AArch64
12:43:06.088  I  Driver Path             : /vendor/lib64/hw/vulkan.adreno.so
12:43:06.090  I  Vulkan Debug Report: object - -5476376679864662576
                 Validation Information: [ UNASSIGNED-CreateInstance-status-message ] Object 0: handle = 0xb400006cbc6f05d0, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0x87d47f57 | Khronos Validation Layer Active:
                     Settings File: None. Default location is /\vk_layer_settings.txt.
                     Current Enables: None.
                     Current Disables: None.
12:43:06.090  E  USER WARNING: Vulkan Debug Report: object - -5476376679864662576
                 Validation Performance Warning: [ UNASSIGNED-CreateInstance-debug-warning ] Object 0: handle = 0xb400006cbc6f05d0, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0xe8d1a9fe | VALIDATION LAYERS WARNING: Using debug builds of the validation layers *will* adversely affect performance.
12:43:06.090  E     at: _debug_report_callback (drivers/vulkan/rendering_context_driver_vulkan.cpp:324)
12:43:06.098  I  Vulkan Debug Report: object - -5476376679864653008
                 Validation Information: [ UNASSIGNED-cache-file-error ] Object 0: handle = 0xb400006cbc6f2b30, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xf0bb3995 | Cannot open shader validation cache at /tmp/shader_validation_cache-10375.bin for reading (it may not exist yet)
12:43:06.100  I  Vulkan 1.1.128 - Forward Mobile - Using Device #0: Qualcomm - Adreno (TM) 640
12:43:06.104  I  [SurfaceView - com.example.examplecrash/com.godot.game.GodotApp@3176227@0#1(BLAST Consumer)1](id:dbe00000001,api:1,p:3518,c:3518) FrameBooster: VULKAN surface was catched
12:43:06.105  D  FrameBooster: InterpolationGui: UID 10375 detected as using Vulkan
12:43:06.105  E  USER WARNING: Vulkan Debug Report: object - -5476376679864662800
                 Validation Performance Warning: [ UNASSIGNED-CoreValidation-SwapchainPreTransform ] Object 0: handle = 0xb400006cbc6f04f0, type = VK_OBJECT_TYPE_PHYSICAL_DEVICE; | MessageID = 0x214b6dd0 | vkCreateSwapchainKHR(): pCreateInfo->preTransform (VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR) doesn't match the currentTransform (VK_SURFACE_TRANSFORM_ROTATE_90_BIT_KHR) returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR, the presentation engine will transform the image content as part of the presentation operation.
12:43:06.105  E     at: _debug_report_callback (drivers/vulkan/rendering_context_driver_vulkan.cpp:324)
12:43:09.271  D  Installing profile for com.example.examplecrash
12:43:09.886  D  PlayerBase::PlayerBase()
12:43:09.889  D  TrackPlayerBase::TrackPlayerBase()
12:43:09.889  I  Emulating old channel mask behavior (ignoring positional mask 0x3, using default mask 0x3 based on channel count of 2)
12:43:09.902  D  createTrack_l(0): AUDIO_OUTPUT_FLAG_FAST denied by server; frameCount 0 -> 1764
12:43:09.902  I  Need throttle time for OpenSLES player
12:43:09.911  I  
12:43:13.538  D  OnGodotSetupCompleted
12:43:13.541  D  OnGodotMainLoopStarted
12:43:14.925  I  [SurfaceView - com.example.examplecrash/com.godot.game.GodotApp@3176227@0#1(BLAST Consumer)1](id:dbe00000001,api:1,p:3518,c:3518) queueBuffer: queued for the first time.
12:46:13.097  W  restartIfDisabled(325): releaseBuffer() track 0xb400006dbc881040 disabled due to previous underrun, restarting
12:46:13.098  A  Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xf0 in tid 3677 (VkThread), pid 3518 (le.examplecrash)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: quarter_texture
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: half_texture
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: sky_buffers
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: blur_0
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: back_color
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: back_depth
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: Fog
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: VRS
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.100  E  USER ERROR: BUG: Unreferenced static string to 0: _compression
12:46:13.100  E     at: unref (core/string/string_name.cpp:127)
12:46:13.230  A  FORTIFY: pthread_mutex_lock called on a destroyed mutex (0x6c1f5dde24)
TCROC commented 7 months ago

I just tested this with godot's player model here: https://github.com/godotengine/godot-demo-projects/blob/master/3d/platformer/player/player.glb

And it also crashed

Here's the new MRP using it example-crash3.zip

TCROC commented 7 months ago

After A LOT of debugging and tireless help from @clayjohn , I am reasonably confident to have bisected and determined this PR to be the root cause of the crash: https://github.com/godotengine/godot/pull/87888

Yay!!! :) Now I can sleep peacefully!

Edit:

Literally, its almost midnight and I'm very tired 😅 Excited to see what new discoveries await tomorrow!

akien-mga commented 7 months ago

CC @TokageItLab @reduz

TokageItLab commented 7 months ago

The only change related to the https://github.com/godotengine/godot/pull/87888 renderer was to reduce the duplicated Skin drawings, so perhaps a change in the BoneAttachment process is the problem. BoneAttachment has an abnormal process with overuse of signals compared to all other modules, and it may not be able to fully handle it on a particular platform.

@TCROC If there is no BoneAttachment, does it work correctly?

TokageItLab commented 7 months ago

I can come up with several possible fixes to BoneAttachment, but I have sent #90575 for now. Please test to see if it works.

Other possible fixes:

  1. Leave the existing implementation as it is and connect all signals in the BoneAttachment with deferred
  2. If #90575 has no effect, further #90575 is connected with deferred
TCROC commented 7 months ago

Just woke up. I'm gonna grab some coffee and check these out. Thanks @TokageItLab :)

TCROC commented 7 months ago

@TokageItLab

@TCROC If there is no BoneAttachment, does it work correctly?

It does not work even without BoneAttachment. In fact, this MRP here is using the godot player model and doesn't have any bone attachments on it: https://github.com/godotengine/godot/issues/90459#issuecomment-2050539348. I reproduced with both my Samsung and Google Pixel.

BUT

That isn't to say there isn't also an issue with bone attachment. In fact, before I opened this issue, I was working with @clayjohn in rocket chat to figure out why my game was crashing on Android when equipping items to my character. I found that changing an item's skeleton to bind to my character's skeleton or adding particle 3ds as a child of a bone attachment would induce a crash. So I decided to sync my fork with master and see if the skeleton changing and bone attachment bugs had been fixed. That is when I encountered this bug and my game just started crashing entirely before I got a chance to equip anything.

I'm hoping that they are all related and when we fix 1, we will fix all. But since this was so easily reproducible, I opened an issue with this as an MRP.

Now with all this said, I'm going to test out your PRs on my fork! :)

TL;DR; of above

  1. Discovered that changing skeletons at runtime caused crash
  2. Discovered that equipping particle 3d's to bone attachments caused crash
  3. Synced with master hoping 1 and 2 would fix
  4. Discovered that Skeleton + Directional Light + Shadow is now inducing a crash
  5. Going to test your PRs on my fork and report back! 🫡
TCROC commented 7 months ago

I just tested: https://github.com/godotengine/godot/pull/90575 and that did not fix this mrp: https://github.com/godotengine/godot/issues/90459#issuecomment-2050539348. Anything you would like me to try next?

And in the mean time, I'm going to revert the problematic PR out of my fork and test equipping items again. This will help narrow down whether we have 1 issue or multiple issues on our hands.

TokageItLab commented 7 months ago

Even after https://github.com/godotengine/godot/pull/87888, the scene should be working fine in most environments, and changed the related renderer part is really a little, so I'm wondering if https://github.com/godotengine/godot/pull/87888 is the cause truly.

The following experiment may reveal something:

TCROC commented 7 months ago

These are good experiments! I just tested the MRP: https://github.com/godotengine/godot/issues/90459#issuecomment-2050539348 with the revert in my fork here: https://github.com/Lange-Studios/godot/tree/test-revert-87888. The MRP is fixed with this revert. I will try these following experiments:

✅ = tested and works ❌ = tested and broken ❔ = untested

  1. My fork with revert ✅ mrp: https://github.com/godotengine/godot/issues/90459#issuecomment-2050539348 ❌ Changing skeletons at runtime: Details https://github.com/godotengine/godot/issues/90459#issuecomment-2052071636 ❌ Equipping particle gpu 3ds to bone attachment at runtime. Details: https://github.com/godotengine/godot/issues/90459#issuecomment-2052084494

  2. Godot master ✅ skeleton/bones without skins ✅ Procedurally add a Skeleton and bones on Godot by script with add_bone(): Note that this was without skin since skin causes crash. ❌ Localize the imported Skeleton and delete the imported file

TokageItLab commented 7 months ago

@TCROC I don't know if it has any effect, can you test the fix below?

https://github.com/TokageItLab/godot/tree/skeleton-skin-on-enter-tree

(Diff: https://github.com/TokageItLab/godot/commit/ca031f1e2bb6fd1205eb3ae01599ecf15096210f)

It apply pose to the skin immediately after enter tree instead of as a deferred process as the same with old skeleton.

TCROC commented 7 months ago

I'm currently compiling godot to test:

  • Equipping particle gpu 3ds to bone attachment at runtime
  • Changing skeletons at runtime

And as soon as that finishes compiling + I test it, I'll try your suggested diff above! :) Thanks!

TCROC commented 7 months ago

Changing skeletons at runtime with the revert crashes with the following logcat:

04-12 12:12:13.070 18950 19116 I godot   : Add child start: Cosmetics_Hat_Crown
04-12 12:12:13.070 18950 19116 E godot   : USER WARNING: Adding 'Cosmetics_Hat_Crown' as child to 'Skeleton3D' will make owner 'Crown' inconsistent. Consider unsetting the owner beforehand.
04-12 12:12:13.070 18950 19116 E godot   :    at: add_child (scene/main/node.cpp:1575)
04-12 12:12:13.070 18950 19116 I godot   : Add child end: Cosmetics_Hat_Crown
04-12 12:12:16.703 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: ShaderCompilation
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: Physics2DConstraintSolveIslands
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: Physics2DConstraintSetup
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: quarter_texture
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: half_texture
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: sky_buffers
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: blur_0
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: back_color
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: back_depth
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: Fog
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.704 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: VRS
04-12 12:12:16.704 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.705 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: name_changed
04-12 12:12:16.705 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.705 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: Variant
04-12 12:12:16.705 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:12:16.705 18950 19116 E godot   : USER ERROR: BUG: Unreferenced static string to 0: _compression
04-12 12:12:16.705 18950 19116 E godot   :    at: unref (core/string/string_name.cpp:127)
--------- beginning of crash
04-12 12:12:16.720 18950 19116 F libc    : FORTIFY: pthread_mutex_lock called on a destroyed mutex (0x6c2d1fe564)

This time we get a "USER WARNING" when changing the parent. I wonder if this has something to do with the crash. Now to test the particles and then going to test your suggested diff

TCROC commented 7 months ago

3D Gpu Particles parented to bone attachement:

04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: quarter_texture
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: half_texture
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: sky_buffers
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: blur_0
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: back_color
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: back_depth
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.704 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: Fog
04-12 12:26:54.704 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.705 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: VRS
04-12 12:26:54.705 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.705 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: name_changed
04-12 12:26:54.705 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.705 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: Variant
04-12 12:26:54.705 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
04-12 12:26:54.705 24331 24371 E godot   : USER ERROR: BUG: Unreferenced static string to 0: _compression
04-12 12:26:54.705 24331 24371 E godot   :    at: unref (core/string/string_name.cpp:127)
--------- beginning of crash
04-12 12:26:54.707 24331 24371 F libc    : FORTIFY: pthread_mutex_lock called on a destroyed mutex (0x6c2c9fe564)

Nothing really useful out of that stacktrace. I'll enable validation layers and run both of those tests again to see if we get anything

TCROC commented 7 months ago

This is the vulkan validation error we get over and over again leading up to the crash:

04-12 12:43:52.225 26441 26483 E godot   : USER ERROR: Vulkan Debug Report: object - -2660799447520242049
04-12 12:43:52.225 26441 26483 E godot   : Validation Error: [ VUID-vkCmdDrawIndexed-magFilter-04553 ] Object 0: handle = 0xdb12f1000000167f, type = VK_OBJECT_TYPE_DESCRIPTOR_SET; Object 1: handle = 0xe025be0000001524, type = VK_OBJECT_TYPE_SAMPLER; Object 2: handle = 0x980b0000000002e, type = VK_OBJECT_TYPE_IMAGE_VIEW; | MessageID = 0x9c7248ee | vkCmdDrawIndexed: Descriptor set VkDescriptorSet 0xdb12f1000000167f[] Sampler (VkSampler 0xe025be0000001524[]) is set to use VK_FILTER_LINEAR with compareEnable is set to VK_FALSE, but image view's (VkImageView 0x980b0000000002e[]) format (VK_FORMAT_D16_UNORM) does not contain VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT in its format features. The Vulkan spec states: If a VkSampler created with magFilter or minFilter equal to VK_FILTER_LINEAR and compareEnable equal to VK_FALSE is used to sample a VkImageView as a result of this command, then the image view's format features must contain VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT (ht
04-12 12:43:52.225 26441 26483 E godot   :    at: _debug_report_callback (drivers/vulkan/rendering_context_driver_vulkan.cpp:327)
04-12 12:43:52.235 26441 26483 I BufferQueueProducer: [SurfaceView - com.langestudios.blockyballot/com.godot.game.GodotApp@86b8628@0#1(BLAST Consumer)1](id:674900000001,api:1,p:26441,c:26441) queueBuffer: queued for the first time.

I'll go try hardcoding the magfilter to make the error go away like I did earlier in rocket chat with @clayjohn . Then I'll also try building with single precision instead of double precision just to check the low hanging fruit off the list. Then I'll be back to test those diffs above.

TCROC commented 7 months ago

Okay I determined that my main project too heavily relies on double precision (due to our deterministic physics engine / network rollback). So instead I'm going to now try your diffs above. And if those don't fix it, then I'll get started on an MRP demonstrating the issues that still exist after the revert. Hopefully getting us closer to the root cause! :)

TCROC commented 7 months ago

@TCROC I don't know if it has any effect, can you test the fix below?

https://github.com/TokageItLab/godot/tree/skeleton-skin-on-enter-tree

(Diff: TokageItLab@ca031f1)

It apply pose to the skin immediately after enter tree instead of as a deferred process as the same with old skeleton.

@TokageItLab I just tried this diff on the MRP with the godot player model. Still crashes. I'm going to try these 3 questions you had on the MRP now:

Godot master ✅ skeleton/bones without skins ✅ Procedurally add a Skeleton and bones on Godot by script with add_bone() ❌ Localize the imported Skeleton and delete the imported file

TCROC commented 7 months ago

These are good experiments! I just tested the MRP: #90459 (comment) with the revert in my fork here: https://github.com/Lange-Studios/godot/tree/test-revert-87888. The MRP is fixed with this revert. I will try these following experiments:

✅ = tested and works ❌ = tested and broken ❔ = untested

1. My fork with revert
   ✅ mrp: [3D scene with Skeletons crashes on Android after SkeletonModifier3D refactoring #90459 (comment)](https://github.com/godotengine/godot/issues/90459#issuecomment-2050539348)
   ❌ Changing skeletons at runtime: Details [3D scene with Skeletons crashes on Android after SkeletonModifier3D refactoring #90459 (comment)](https://github.com/godotengine/godot/issues/90459#issuecomment-2052071636)
   ❌ Equipping particle gpu 3ds to bone attachment at runtime. Details: [3D scene with Skeletons crashes on Android after SkeletonModifier3D refactoring #90459 (comment)](https://github.com/godotengine/godot/issues/90459#issuecomment-2052084494)

2. Godot master
   ✅ skeleton/bones without skins
   ✅ Procedurally add a Skeleton and bones on Godot by script with add_bone(): Note that this was without skin since skin causes crash.
   ❌ Localize the imported Skeleton and delete the imported file

Hey @TokageItLab. Just finished running all these tests and these were the results. This was the script that I used for procedural bones:

extends Node

# Called when the node enters the scene tree for the first time.
func _ready() -> void:
    get_node("player/Skeleton/Skeleton3D").add_bone("test")
    var test_skel = Skeleton3D.new()
    add_child(test_skel)
    test_skel.add_bone("test2")

Also another thing to keep in mind is that as far as I can tell, you need a Directional light + Shadows enabled to cause the crash. At least with the master source code. I haven't tried disabling shadows on the revert.

What would you like me to try next if anything?

Edit:

The quote reply rendered kinda strange. Here's a link to the original comment where I updated things: https://github.com/godotengine/godot/issues/90459#issuecomment-2051783289

TokageItLab commented 7 months ago

@TCROC Thank you! It's interesting that it works correctly without Skin... Maybe it is important factor. Another thing that may be related is that #87888 had a fix of PhysicalBone. Skeleton3D currently has a PhysicalBoneSimulator as a internal child, but if you build with the DISABLE_DEPRECATED option enabled, it won't include it, so I'd like you to test that.

TCROC commented 7 months ago

I'll go test that now! :)

TCROC commented 7 months ago

@TokageItLab I have some interesting discoveries! :) Here is the list:

  1. Building with DISABLE_DEPRECATED breaks csharp / mono. Fortunately its an easy fix. We simply need to add some compiler definitions / flags to the Compat.cs files to not compile these when DISABLE_DEPRECATED is passed:

I simply commented those out on my local for now while we debug this.

  1. It fixes the MRP. Yay!!! 🎉🎉🎉. It behaves the same as when I revert the PR.

  2. It does not fix equipping items to a character by changing that item's skeleton or adding particle3ds as a child of a bone attachment. Both of these scenarios still result in a crash.

Is there anything you would like me to try next?

TCROC commented 7 months ago

@TokageItLab And one more really interesting thing that I just tested.

If I disabled shadows on the directional light, EVERYTHING works fine. Including changing skeletons of items and making particle gpu 3ds children of bone attachments.

This was the case for the MRP. I had not tested it on the Blocky Ball OT project. I just did and can confirm disabling shadows fixes things.

Maybe shadows are the issue? Maybe not and shadows simply push it over the edge due to a different mistake? Not sure. Is there anything you would like me to explore in this area (lighting and shadows)?

TokageItLab commented 7 months ago

@TCROC If it (DISABLE_DEPRECATED) fix MRP, it becomes possible that what was thought to be two causes is actually one.

So if the PhysicalBoneSimulator added to Skeleton cause crashing due to accessing the PhysicalBody, the precision error mentioned above may be causing everything. Regarding PhysicalBoneSimulator, recent crashes may be prevented by not accessing the PhysicalBody if the Skeleton does not activate it.

I may create a patch to avoid the crash for Skeleton later, but if you can do bisecting again of the other two crashes in more old commit, it might be able to find the cause.

TCROC commented 7 months ago

Sounds good! I will start the bisecting process! :)

TCROC commented 7 months ago

Just wanted to celebrate here quick before I go get dinner! So I started the bisecting process on my main project since I hadn't figured out how to reproduce any of the other crashes in an MRP. But, that turned out to take WAY too long to recompile over and over again bisecting. So I decided I do indeed need an MRP... AND I SUCCEEDED! :) I created an MRP to reproduce changing the skeleton of an item at runtime. Its a little messy at the moment, so I'll clean it up and share after later tonight :)

But basically:

I initially tried equipping the item to the player on _Ready, but that didn't crash. It worked just fine. Waiting 5 seconds and doing it in _Process caused the crash.

TCROC commented 7 months ago

This will make bisecting MUCH easier! :)

TokageItLab commented 7 months ago

@TCROC

If I disabled shadows on the directional light, EVERYTHING works fine. Including changing skeletons of items and making particle gpu 3ds children of bone attachments.

Will skeletons with skins work on the latest master without revert SkeletonModifier nor DISABLE_DEPRECATED in the case of disabled shadows? If that doesn't work, I would like you to test what happens when you place a PhysicalBody that does not use Skeleton.

TCROC commented 7 months ago

Just woke up. Going to grab some coffee. Then I'll test this and share the new MRP

TCROC commented 7 months ago

@TokageItLab I ran your tests above you asked as well as some additional things I thought of and have a bunch of new information! :)

First off:

Will skeletons with skins work on the latest master without revert SkeletonModifier nor DISABLE_DEPRECATED in the case of disabled shadows?

✅ Yes. When I disable shadows, everything works. Even on latest master without revert nor DISABLE_DEPRECATED.

If that doesn't work, I would like you to test what happens when you place a PhysicalBody that does not use Skeleton.

❔ The above worked so I couldn't test this. I actually initially tried and was getting confused and then realized I'm not going to get far enough to test because it will crash 😅. I missed the if in your question 😅

And here is the new MRP demonstrating changing skeletons in _process and causing crash:

example-crash-change-skeleton.zip

And this is the script that does the equipping:

extends Node

var secs: float = 0
var is_equipped: bool = false

# # This works
# func _ready() -> void:
#   var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
#   var crown = (load("res://Cosmetics_Hat_Crown.blend") as PackedScene).instantiate()
#   var mesh = crown.get_child(0).get_child(0).get_child(0) as MeshInstance3D
#   mesh.get_parent().remove_child(mesh)
#   crown.queue_free()
#   skeleton.add_child(mesh)

# This crashes
func _process(delta: float) -> void:
    if secs < 1:
        secs += delta
        return

    if is_equipped:
        return

    is_equipped = true
    var skeleton = get_node("Cube_Guy/Cube_Guy_Armature/Skeleton3D")
    var crown = (load("res://Cosmetics_Hat_Crown.blend") as PackedScene).instantiate()
    var mesh = crown.get_child(0).get_child(0).get_child(0) as MeshInstance3D
    mesh.get_parent().remove_child(mesh)
    crown.queue_free()
    skeleton.add_child(mesh)

The _ready() method that is commented out actually works fine and doesn't cause a crash. So if I equip and item right away on scene load, no crash. However, if I wait 1 second and do it in _process, it will crash.

I have been bisecting with this project and have not yet found a working version of godot. I started with the latest Godot_v4.3-dev5 and reproduced the crash. I checked out code a ways back in history (January I think? I need to confirm) and still reproduced the crash. I'm beginning to think we have 2 possible situations:

  1. We have 2 bugs. 1 that has been in the repo for a while and one that was newly introduced here: https://github.com/godotengine/godot/issues/90459#issuecomment-2050911264

  2. We have 1 bug that has been in the repo for a while and this commit made it appear more easily: https://github.com/godotengine/godot/issues/90459#issuecomment-2050911264

I'm personally hoping its option 2 so we can fix multiple bugs with 1 code fix.

I'm going to continue bisecting and hopefully find a working commit. Is there anything else you would like me to try while I do that? Also please let me know if you want any help reproducing that MRP. I will gladly hop on a call with you if you are unable to reproduce it and would like to.

TCROC commented 7 months ago

Yay!! Good news! I found a version that works: https://godotengine.org/download/archive/4.0-stable/

Granted it is almost a year back... BUT at least I have a window to work with when bisecting! :) I'll keep working off official releases until I find which official release broke it to narrow my window

TCROC commented 7 months ago

Ok making progress on the bisecting. Good news it appears to be isolated to 4.3. I tested 4.2.2-rc3 and it worked fine

TCROC commented 7 months ago

AYO!!!! New big find! :) The bug was introduced in: https://godotengine.org/download/archive/4.3-dev2/

So now I just need to bisect the commits between 4.3-dev1 and 4.3-dev2 :)

TCROC commented 7 months ago

@TokageItLab I found the problematic commit!

https://github.com/godotengine/godot/commit/e9695d9fa22113f41348a0d2c7edb1138efc2b22

And here is the PR with it:

https://github.com/godotengine/godot/pull/84976

I tried reverting it in my fork but it wasn't as simple as just "taking their changes" on everything when reverting. So I couldn't quickly test if the revert fixes it off master.

But we found it! :) This is when the bug was introduced! At least in regards for this MRP: https://github.com/godotengine/godot/issues/90459#issuecomment-2053669239

Checking out the PR just before this PR, the MRP works. Then checking out that PR, it crashes.

TokageItLab commented 7 months ago

@TCROC Thanks for the effort! I think we have focus on the cause of the problem quite well. Finally, it would be great if you could identify which environment have crashes with MRP between MONO and GDScript, and precision for double and float, respectively.

TCROC commented 7 months ago

@TokageItLab You are very welcome! :)

I tested tested the MRP without mono and with single precision. This was the scons command:

scons compiledb=false deprecated=false generate_apk=yes arch=arm64 vulkan=yes platform=android debug_symbols=true target=template_debug

I also reproduced using the "standard" Linux editor and "standard" export templates here: https://godotengine.org/download/archive/4.3-dev2

Editor: https://github.com/godotengine/godot-builds/releases/download/4.3-dev2/Godot_v4.3-dev2_linux.x86_64.zip

Export Templates: https://github.com/godotengine/godot-builds/releases/download/4.3-dev2/Godot_v4.3-dev2_export_templates.tpz

TCROC commented 7 months ago

And my main project uses both mono and double precision. I also reproduce the crashes in there. So I have tested in both kinds and the crash occurred in both.

TCROC commented 7 months ago

So I guess to summarize:

"Single Precision + Mono Disabled" = crash "Double Precision + Mono Enabled" = crash

Would you like me to test any other variations of this matrix?