godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Performance: Cube with an empty _Process method is slow in debugger but fast in exported binary #93232

Open warappa opened 3 months ago

warappa commented 3 months ago

Tested versions

System information

Windows 11 - Godot 4.3-beta1

Issue description

When instantiating a simple cube mesh 10000-20000 times, I noticed a difference in performance between "Starting from Editor" and "Starting Exported Binary".

Initially, both execution variants are performant. Then I add a C# script and the performance stays the same. But then I add an empty(!) _Process method and the performance tanks.

Steps to reproduce

Setup Project

  1. Create new project
  2. Create a root of type Node3D
  3. Inside it, create a new CsgBox3D node
  4. Convert it to a mesh with addon CSGToMesh
  5. Remove the CsgBox3D and rename the mesh to SimpleCube
  6. Save it as its own scene (Save Branch as Scene)
  7. Attach an empty C# script to SimpleCube
    public partial class SimpleCube : MeshInstance3D
    {
    }
  8. In the root node import the simple_cube.tscn

Instantiate 10000 Times - Good Performance

  1. In the root node's _Ready method, create 20000 instances of SimpleCube
  2. See that it performs well (>60FPS on my system (Ryzen 5800X, RX 6800XT))
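
For reference, the instantiation step could look roughly like the sketch below. This is not the code from the MRP: the class name `Root`, the scene path `res://simple_cube.tscn`, and the grid layout are all assumptions made for illustration.

```csharp
using Godot;

// Hypothetical root-node script. The class name, scene path, and grid
// layout are assumptions and are not taken from the MRP.
public partial class Root : Node3D
{
    private const int InstanceCount = 20000;
    private const int GridWidth = 100;   // cubes per row (assumed)
    private const float Spacing = 1.5f;  // distance between cubes (assumed)

    public override void _Ready()
    {
        // Load the scene that was saved during project setup.
        var cubeScene = GD.Load<PackedScene>("res://simple_cube.tscn");

        for (int i = 0; i < InstanceCount; i++)
        {
            var cube = cubeScene.Instantiate<MeshInstance3D>();

            // Lay the cubes out on a flat grid so they do not overlap.
            cube.Position = new Vector3(
                (i % GridWidth) * Spacing, 0f, (i / GridWidth) * Spacing);

            AddChild(cube);
        }
    }
}
```

Because every instance carries the SimpleCube script, the engine ends up with 20000 scripted nodes, which is what makes the per-node cost of calling _Process visible.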

Add Empty _Process Method - Bad Performance

  1. Edit SimpleCube.cs to have an empty _Process method
    public partial class SimpleCube : MeshInstance3D
    {
        public override void _Process(double delta)
        {
            base._Process(delta);
        }
    }
  2. Start again and see the performance is much slower (~ 30FPS on my system)
  3. See in the profiler that most time is spent on Process Time

Export a Release - Good Performance

  1. Export it as a binary
  2. Run the exported binary and see that the performance is good again

You can add something like a counter to _Process just to ensure it is not optimized away by the compiler, e.g.:

public partial class SimpleCube : MeshInstance3D
{
    public double Counter = 0;
    public override void _Process(double delta)
    {
        base._Process(delta);

        Counter++;
    }
}

Minimal reproduction project (MRP)

SlowProcessRepo.zip

AThousandShips commented 3 months ago

Does it happen if you remove the base._Process? I don't think it's needed here as _Process is handled by the engine itself and doesn't rely on virtual methods (none of the code templates or official demos use the parent call)

warappa commented 3 months ago

Yes, even with a completely empty method it has this performance issue.

public partial class SimpleCube : MeshInstance3D
{
    public override void _Process(double delta)
    {
    }
}

Hilderin commented 3 months ago

I investigated a similar problem for issue #89217.

The problem seems to be the way the engine calls C# methods. The generated C# partial class performs a lot of string comparisons to work out which method to call.

Example:

[global::System.ComponentModel.EditorBrowsable(global::System.ComponentModel.EditorBrowsableState.Never)]
protected override bool InvokeGodotClassMethod(in godot_string_name method, NativeVariantPtrArgs args, out godot_variant ret)
{
    if (method == MethodName._Process && args.Count == 1) {
        _Process(global::Godot.NativeInterop.VariantUtils.ConvertTo<double>(args[0]));
        ret = default;
        return true;
    }
    return base.InvokeGodotClassMethod(method, args, out ret);
}

I did not create a PR because executing calls through delegates or function pointers requires quite a lot of modifications, but I made it work. I still need to do some testing and adjustments. I'll continue to work on it and keep you updated.
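
To make the two dispatch styles concrete, here is a minimal standalone sketch. It is not Godot's generated code and not the unpublished prototype: `SimpleCubeModel`, `InvokeByComparison`, and `InvokeByTable` are hypothetical names, and plain `string` stands in for `godot_string_name`. The first method has the same shape as the generated method above; the second resolves the name with one dictionary lookup and calls a stored delegate.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for a script class. Not Godot's real generated code.
public class SimpleCubeModel
{
    public int ProcessCalls;

    public void Ready() { }
    public void Process(double delta) => ProcessCalls++;

    // Style 1: a chain of name comparisons, one branch per script method.
    public bool InvokeByComparison(string method, object[] args)
    {
        if (method == "_Ready" && args.Length == 0) { Ready(); return true; }
        if (method == "_Process" && args.Length == 1) { Process((double)args[0]); return true; }
        return false;
    }

    // Style 2: a lookup table from method name to a delegate.
    // For brevity this table does not check the argument count.
    private static readonly Dictionary<string, Action<SimpleCubeModel, object[]>> Table = new()
    {
        ["_Ready"] = (self, args) => self.Ready(),
        ["_Process"] = (self, args) => self.Process((double)args[0]),
    };

    public bool InvokeByTable(string method, object[] args)
    {
        if (!Table.TryGetValue(method, out var handler)) return false;
        handler(this, args);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var cube = new SimpleCubeModel();
        var args = new object[] { 0.016 };

        cube.InvokeByComparison("_Process", args);
        cube.InvokeByTable("_Process", args);

        Console.WriteLine(cube.ProcessCalls); // prints 2
    }
}
```

With only two methods the difference is negligible; the table only pays off when a script declares many methods, since the lookup cost no longer depends on how many branches precede the one being called.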

tatudev commented 3 weeks ago

Has anyone found any good workarounds to this in the meantime? At the moment C# seems unusable if you want to have over 100 nodes with scripts in a given scene (which is likely over half of the games using C# in the first place). My current workaround is to substitute an Enemy node, with an Enemy.cs script with:

Doing this, I was able to go from 120 enemies in a scene to about 500 enemies on screen without multi-threading. Although this works performance-wise, it interfaces very poorly with existing Godot APIs, and I'm very quickly accruing technical debt. I might be overblowing it, but this seems like a blocking issue for most C# projects.

Hilderin commented 3 weeks ago

I'm surprised you have problems with only 50 CharacterBody3D nodes. Usually, even with a thousand nodes and even in Debug, C# is still fast. Not as fast as GDScript on empty objects, but still very fast. You can see some more benchmarks here: #89217. Maybe something else is wrong. Are you able to upload a minimal reproduction project? I suggest you create a new issue with more details.

tatudev commented 3 weeks ago

I have not been able to repro the issue in a fresh project. The most I've been able to confirm is that performance decreases with the number of functions in a file. If I'm just spawning Node3Ds with a single PhysicsProcess function, I saturate at 16k nodes; with about 20 empty functions added, I saturate at 8k. This is not even close to the numbers I saw when profiling my project. I'll try reworking the code in my current project and check whether performance drops. If that's the case, I will create a new issue to report my findings.

Sorry about cluttering the thread with this and thanks for the response, Hilderin.

Hilderin commented 3 weeks ago

You have a really good point about the number of methods in a C# script. That affects performance drastically with the current implementation. Each time Godot Engine tries to call a method in your C# script, it iterates over all the method names in your script. That is what I was working to optimize a couple of months back, but I never finished it. I'll try to take another look.
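
As a rough illustration of why the method count matters, the standalone sketch below counts the name comparisons a linear scan needs before it reaches the target method. It is only a model of the behaviour described above: `DispatchCostModel`, `BuildScript`, and the `Helper` names are hypothetical, and it assumes the target method is checked last, which this thread does not establish.

```csharp
using System;
using System.Linq;

public static class DispatchCostModel
{
    // Counts how many name comparisons a linear scan performs
    // before it finds `target` (or runs out of names).
    public static int ComparisonsToFind(string[] methodNames, string target)
    {
        int comparisons = 0;
        foreach (var name in methodNames)
        {
            comparisons++;
            if (name == target) return comparisons;
        }
        return comparisons; // not found: every name was compared
    }

    // Builds a hypothetical script with `extra` empty methods that are
    // checked before _PhysicsProcess (an assumed worst-case ordering).
    public static string[] BuildScript(int extra) =>
        Enumerable.Range(0, extra)
            .Select(i => $"Helper{i}")
            .Append("_PhysicsProcess")
            .ToArray();

    public static void Main()
    {
        Console.WriteLine(ComparisonsToFind(BuildScript(0), "_PhysicsProcess"));  // prints 1
        Console.WriteLine(ComparisonsToFind(BuildScript(20), "_PhysicsProcess")); // prints 21
    }
}
```

The model says nothing about the absolute cost of each comparison, so it should not be expected to reproduce the 16k versus 8k node figures reported above; it only shows that the work per call grows with the number of methods in the script.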