godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.97k stars 21.16k forks source link

C# Game crashes randomly on changing scenes with USER ERROR: FATAL: Condition "gchandle.is_released()" is true #83762

Open MattSayer opened 1 year ago

MattSayer commented 1 year ago

Godot version

v4.1.1.stable.mono.official.bd6af8e0e

System information

Godot v4.1.1.stable.mono - Windows 10.0.19045 - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 3080 (NVIDIA; 31.0.15.3623) - Intel(R) Core(TM) i7-10700KF CPU @ 3.80GHz (16 Threads)

Issue description

When switching scenes in my C# project, randomly the game will crash with the following stack trace:

crashlog

USER ERROR: FATAL: Condition "gchandle.is_released()" is true.
   at: mono_object_disposed_baseref (modules/mono/csharp_script.cpp:1835)

In the console output, this occurs as well:

Cannot open file '/root/godot/modules/mono/glue/GodotSharp/GodotSharp/Godot.SourceGenerators.Internal/Godot.SourceGenerators.Internal.UnmanagedCallbacksGenerator/Godot.NativeInterop.NativeFuncs.generated.cs'.
  Failed to read file: '/root/godot/modules/mono/glue/GodotSharp/GodotSharp/Godot.SourceGenerators.Internal/Godot.SourceGenerators.Internal.UnmanagedCallbacksGenerator/Godot.NativeInterop.NativeFuncs.generated.cs'.
  Cannot load C# script file '/root/godot/modules/mono/glue/GodotSharp/GodotSharp/Godot.SourceGenerators.Internal/Godot.SourceGenerators.Internal.UnmanagedCallbacksGenerator/Godot.NativeInterop.NativeFuncs.generated.cs'.
  Failed loading resource: /root/godot/modules/mono/glue/GodotSharp/GodotSharp/Godot.SourceGenerators.Internal/Godot.SourceGenerators.Internal.UnmanagedCallbacksGenerator/Godot.NativeInterop.NativeFuncs.generated.cs. Make sure resources have been imported by opening the project in the editor at least once.

The issue only occurs sporadically, which is making it very difficult to debug. Scene-switching is handled by an Autoload singleton, which loads the next scene in the background via ResourceLoader.LoadThreadedRequest while loading immediately (i.e. blocked, not threaded) a loading-screen scene and disposing of the previous scene.

I would really appreciate any help in debugging this!

Steps to reproduce

It randomly occurs when switching scenes, with no clear indication of why most of the time it works fine but other times sporadically crashes.

Minimal reproduction project

I've included the autoload script responsible for switching between scenes, as it's my best guess as to what is causing this crash. I can upload the full project if necessary, but with how sporadic the crash is creating an MRP is incredibly difficult.

GameManager.zip

AThousandShips commented 1 year ago

Can you please try this with a supported version, like 4.1.2 (only the latest patch version is supported, and it might already be fixed)

MattSayer commented 1 year ago

Thanks! I'll update to 4.1.2 and report back.

MattSayer commented 1 year ago

So I've updated to 4.1.2 and have simplified my scene-switching code to just GetTree().Root.RemoveChild(_currentScene); _currentScene.QueueFree();

Followed by a deferred call to load the next scene

PackedScene nextScene = (PackedScene)GD.Load(_nextLevelPath); _currentScene = nextScene.Instantiate(); GetTree().Root.AddChild(_currentScene); GetTree().CurrentScene = _currentScene;

I haven't hit the original issue, but now I've encountered a new crash condition. Like the original, it is occurring seemingly randomly with no clear cause.

USER ERROR: Condition "!sdata.is_valid()" is true. Returning nullptr at: instantiate (scene/resources/packed_scene.cpp:171) USER ERROR: Parameter "p_child" is null. at: add_child (scene/main/node.cpp:1397)

Followed by a null reference exception when my code tries to reference the new scene. I've added extra logging to determine whether it's the GD.Load method returning null, or the Instantiate call, but haven't been able to replicate as of yet.

MattSayer commented 1 year ago

In trying to further debug this error with try catch blocks and null checks, I'm now getting yet another fresh crash error:

Fatal error: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

The first line of the stack trace is: at Godot.NativeCalls.godot_icall_1_377(IntPtr, IntPtr, Int32)

followed by the load level method in my singleton class.

Faradax commented 10 months ago

I'm seeing the original problem in v4.2.1.stable.mono.official [b09f793f5] as well:

ERROR: FATAL: Condition "gchandle.is_released()" is true. at: mono_object_disposed_baseref (modules/mono/csharp_script.cpp:1857)

For me, this also happens randomly as well (~half of the time?). Differing from the original report, I also randomly get this error if I start the specific scene directly. So no background loading or scene switching code of mine is executed - I just launch the Scene via "F6" in the editor. Interestingly enough, another similar scene does not have this problem. Not sure if that's of any help, but: The crashing scene has 2096 nodes than and references 62 resources. The working scene has around 3000 nodes and references 33 Resources.

hippogyz commented 10 months ago

Have you met this problem in exported project?

I also meet this problem in godot 4.2.1. In my case, here's some points for triggering this error (not sure):

1. Call QueueFree() to a Node which takes complicated references. (for example, CustomCharacterBody3D extends from CharacterBody3D) 2. This node should be the last instance in current SceneTree. 3. In editor mode.

However, in the exported game, the crash never happens under same operation.

------------------ Update ------------------ It seems my supposition about exported project is wrong. Previously, my uncrashed version takes Native-AOT for exporting. However, the exported project still crashes randomly if I cancel the AOT option when exporting.

In addition, the reason I cancel AOT is that the AOT exported game performs badly when Node.GetChild() is called. This method takes huge cost on c# reflection for some reason.

------------------ Update ------------------

The crash should be caused by disposing custom Resource when C# gc works.

As the doc mentioned, https://docs.godotengine.org/en/stable/classes/class_resource.html

In C#, resources will not be freed instantly after they are no longer in use. Instead, garbage collection will run periodically and will free resources that are no longer in use. This means that unused resources will linger on for a while before being removed.

I found a temporary method for avoiding this crash, which just handle the loaded resource by a static reference like this:

public static T LoadResource<T>(string resPath) where T : Resource
{
    if(!loadedResources.TryGetValue(resPath, out var loadedRes))
    {
        if(!ResourceLoader.Exists(resPath))
            return null;
        loadedRes = ResourceLoader.Load<Resource>(resPath);
        loadedResources[resPath] = loadedRes;
    }
    return loadedRes as T;
}
readonly private static Dictionary<string, Resource> loadedResources = [];

Use the static method to replace ResourceLoader.Load().

hippogyz commented 10 months ago

Also mentioned in #75131

DarkKowalski commented 10 months ago

I got some clues.

  1. In CSharpInstance::refcount_decremented() , gchandle.handle.type has been set to WEAK_HANDLE, https://github.com/godotengine/godot/blob/4.2.1-stable/modules/mono/csharp_script.cpp#L1976
  2. Then CLR released the object referenced by the gchandle
  3. And then CSharpInstance::refcount_incremented() reset gchandle.handle to nullptr and got !target_alive, https://github.com/godotengine/godot/blob/4.2.1-stable/modules/mono/csharp_script.cpp#L1941
  4. Finally we went into Dispose with gchandle.value == nullptr && gchandle.type == WEAK_HANDLE, https://github.com/godotengine/godot/blob/4.2.1-stable/modules/mono/csharp_script.cpp#L1861

This code would easily fix it but I am not sure whether it will introduce any other bugs or not.

// modules/mono/csharp_script.cpp
void CSharpInstance::mono_object_disposed_baseref(GCHandleIntPtr p_gchandle_to_free, bool p_is_finalizer, bool &r_delete_owner, bool &r_remove_script_instance) {
#ifdef DEBUG_ENABLED
    CRASH_COND(!base_ref_counted);
    CRASH_COND(!gchandle.is_weak() && gchandle.is_released());
#endif
dsh0416 commented 10 months ago
CRASH_COND(!gchandle.is_weak() && gchandle.is_released());

Interesting investigation, which perfectly matches the crash dump that we are currently investigating. I have made a PR https://github.com/godotengine/godot/pull/87045 on this, and I hope this should fix, or at least partially fixed the problem mentioned in this issue.

paulloz commented 7 months ago

@DarkKowalski are you by any chance able to share the code you used to make those observations?

MrBrax commented 5 months ago

Encountered the same issue in my project, also using threaded loading. Good thing there's a workaround at least.

orochii commented 3 months ago

Had this error showing up, didn't consider it being an issue regarding the ResourceLoader, but hippogyz's solution actually helped.

Before, the game would run for anywhere between 1 and 2 minutes then hard crash with only the errors in Godot's Debugger. I also checked that on my case whatever I was doing was crashing unattended (loading some stuff on startup and so on). Tried at least 5 times or so just for better statistics (?) and none went over the 3 minutes mark of runtime.

Now it's been running for about 1 hour and a half, and it hasn't crashed so far.

Hopefully this "report" helps in some way.

bs-mwoerner commented 1 month ago

I had never encountered this problem before, but I just released a demo to Steam and immediately got several reports that the game would not progress past the loading screen. This is on Godot 4.2.2. A log file I got had

USER ERROR: Condition "!sdata.is_valid()" is true. Returning: nullptr
   at: instantiate (scene/resources/packed_scene.cpp:220)

Replacing the ResourceLoader.LoadThreadedRequest() approach with one based on GD.Load()[1] fixed the problem for those users.

I think what happens here is that LoadThreadedRequest() produces a corrupted PackedScene object if a .NET garbage collection happens while the loading is in progress. This will affect only those systems that actually do a garbage collection during this period. For loads that happen early in the game, this is probably pretty deterministic: If it fails on a system, it will (almost) always fail on that system. If it doesn't, then it is unlikely to fail on that system in the future. Loads during the game will have more varying circumstances, so I would expect the problem to be more random there.

GD.Load() may not entirely prevent the problem, but because it completes the entire load on the main thread and during a single frame, it's probably much less likely for a background garbage collection to occur during this time. It shouldn't make a difference whether the result is then stored in a static dictionary, because at this point, the problematic part should already be over and Load()'s return value itself is a reference to the resource.

If I continuously force garbage collections to occur on a separate thread during a Load(), I typically get an access violation in the engine when the destructor of some Variant object tries to dereference an invalid pointer. I think with .NET's background garbage collection, a Dispose() method may be called at any time and off the main thread as soon as a managed object is not actively referenced. Apparently, this can cause this thread to come in contact with code that is not thread-safe.

I suppose we'll need a deeper investigation if we want to truly get to the bottom of this. For the time being, a sensible recommentation would be Don't use RessourceLoader.LoadThreadedRequest() in production code with C#, use RessourceLoader.Load() or GD.Load() instead.

[1] To avoid confusion: GD.Load() is a shorthand for ResourceLoader.Load(). Using either is fine.