godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Godot 4.x is significantly slower than 3.5.1 in creating nodes #71182

Open Overvault-64 opened 1 year ago

Overvault-64 commented 1 year ago

Godot version

4.0.beta10

System information

Windows 10 and Android

Issue description

In my projects I often need to create UI at runtime, instantiating a lot of elements. This brought a significant performance issue in Godot 4 to my attention: creating or instantiating nodes is about 4 times slower than in Godot 3.

I repeated the tests several times with my test projects, having both versions instantiate a scene (a button containing 3 child nodes) between 1000 and 8000 times. Depending on the count, Godot 4 ranged from more than 4 times to almost 4 times slower.

Godot 4 crashed when instantiating more than 8000 of those scenes, while Godot 3 handled more than 10000 comfortably. My hardware stayed below 30% load the whole time.

I've also tried creating the button at runtime and it makes no significant difference compared to instantiating a pre-made scene.
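
The measurement described above (instantiate the same scene N times, for N between 1000 and 8000, and compare totals across engine versions) can be sketched as a small, engine-agnostic timing harness. This is a hypothetical Python sketch, not the attached test project: `make_node` stands in for the engine call (in Godot 4 GDScript, something along the lines of `scene.instantiate()` plus `add_child()`, timed with `Time.get_ticks_usec()`), and `benchmark_instantiation` and `slowdown_ratios` are illustrative names.

```python
import time
from typing import Callable, Dict, Iterable, List


def benchmark_instantiation(
    make_node: Callable[[], object],
    counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Best-of-`repeats` wall-clock time, in ms, to create n nodes for each n in `counts`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[int, float] = {}
    for n in counts:
        best = float("inf")
        for _ in range(repeats):
            created: List[object] = []  # keep every node alive, as a scene tree would
            start = time.perf_counter()
            for _ in range(n):
                created.append(make_node())
            best = min(best, (time.perf_counter() - start) * 1000.0)
        results[n] = best
    return results


def slowdown_ratios(new_ms: Dict[int, float], old_ms: Dict[int, float]) -> Dict[int, float]:
    """new/old time ratio for every count measured in both runs (e.g. 4.x vs 3.x)."""
    return {n: new_ms[n] / old_ms[n] for n in new_ms if old_ms.get(n, 0.0) > 0.0}


if __name__ == "__main__":
    # Hypothetical stand-in for "a button containing 3 child nodes".
    def make_button() -> dict:
        return {"type": "Button", "children": [{"type": "Node"} for _ in range(3)]}

    timings = benchmark_instantiation(make_button, counts=[1000, 2000, 4000, 8000])
    for n, ms in timings.items():
        print(f"{n:>5} nodes: {ms:.2f} ms")
```

Taking the best of a few repeats reduces noise from garbage collection and background processes, and comparing ratios rather than absolute times matches how the report frames the 3.x vs 4.x gap.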

[screenshots]

This wouldn't be a real-world problem for most developers (except me, I guess) if it weren't greatly amplified on mobile. My mobile test device took 6-7 times longer than my PC to complete the task in both versions, reaching an unhealthy 21 seconds for 8000 scenes in the Godot 4 build. The Godot 4/Godot 3 slowdown ratio on mobile is about the same as on PC.

Minimal projects attached

Steps to reproduce

Start the projects

Minimal reproduction project

node.test.zip

Calinou commented 1 year ago

Related to https://github.com/godotengine/godot/issues/61929.

Are you using an official engine build in both 3.x and 4.0.beta (i.e. with the same optimizations)?

wardPlaced commented 1 year ago

Doesn't that mean that the engine is capable of handling more nodes with less resource uptake, or fewer spikes thereof? Isn't it possible to adjust these things in the profiler? I think Godot has one already. So this needs more thorough observation and testing. Plus, running them side by side is not indicative. I've heard some CPUs are capable of repeating workloads once they've understood them... they don't seem to be in circulation. Can people change that? Sorry for being so unfactual. Yes, optimization does not matter if you can adjust the thread size, but that could limit mobile phone performance, I mean Android specifically, because you are locking yourself out of certain devices that don't have the RAM to deal with the workload, even with blast processing and spiked CPUs, which is what they all are in the mobile hemisphere. So clock times must be an issue for the new release, and if so, why are they not? If it's all software anyway and we are reading from self-updating text files, then that's Linux fixing that. I feel like this is way more interesting if you think about it more closely than with an open eye, BUT here's the deal... it just takes longer because people want to have more nodes in their projects, created by hand or while running, right? So that takes more power on average to handle, because that's how streams and data work. Once you turn it into something closer to a machine, you can rearrange it and it will still present the same way to the user with less data taken up in all regards. It sounds crazy, but it's the truth. It's not advised for smaller data sizes, like a thousand buttons... but you have a keen eye... good job! I'm not an expert! But that seems to be the general thing when it comes to handling LOTS of things at THE SAME time. Everything else is just an illusion of that and is handled one by one, which doesn't create the necessary behavior anyway.

Overvault-64 commented 1 year ago

> Related to #61929.

> Are you using an official engine build in both 3.x and 4.0.beta (i.e. with the same optimizations)?

Yes

Overvault-64 commented 1 year ago

Still slow in beta13

[screenshot]

Overvault-64 commented 1 year ago

4.0.1 (stable): [screenshot]

Calinou commented 1 year ago

Can you reproduce this with a self-compiled editor build with the production=yes module_text_server_advanced_enabled=no module_text_server_fb_enabled=yes SCons options? This uses a simpler and faster TextServer that has advanced features disabled (no right-to-left or complex scripts).

Riteo commented 1 year ago

@Calinou good point. While the issue title describes "nodes" in general, this benchmark uses UI nodes, which aren't exactly simple.

Overvault-64 commented 1 year ago

@Calinou I don't have a compile environment set up, but @Riteo's comment made me think that I can benchmark different kinds of nodes and look at the results. This way I could see which node types are harder on the engine and maybe identify a common cause. Makes sense?

Calinou commented 1 year ago

> but @Riteo's comment made me think that I can benchmark different kinds of nodes and look at the results. This way I could see which node types are harder on the engine and maybe identify a common cause. Makes sense?

You can try to do that, but the best way to isolate the bottleneck is to switch TextServers as I mentioned. I get a strong feeling the slowness is due to text shaping, not node creation. Text shaping in 4.0 regularly comes up as one of the most demanding operations when I look at results in a C++ profiler (the editor profiler won't show it).

You can also use a C++ profiler on a debug build of the engine.

Overvault-64 commented 1 year ago

@Calinou I hope I did it right. Here are the results, but I don't know how to read them.

I used the godot-4.0-editor-debug-windows-msvc2022 build.

Overvault-64 commented 1 year ago

4.1-beta1: [screenshot]

Overvault-64 commented 1 year ago

4.1-stable (still the exact same hardware and configuration): [screenshot]

Overvault-64 commented 1 year ago

4.2.beta1: [screenshot]

Zireael07 commented 1 year ago

A hunch: What if you disable advanced text server when compiling?

Overvault-64 commented 1 year ago

> A hunch: What if you disable advanced text server when compiling?

I can't compile :(

Overvault-64 commented 10 months ago

v4.3.dev1.official [9d1cbab1c]: [screenshot]

duarteroso commented 7 months ago

I am seeing something similar, but unlike @Overvault-64, I don't have a 3.x version to compare with.

Calling Node.Instantiate<Control>() 20 times takes a considerable amount of time on Android. The game does not freeze, but it can clearly be seen that it takes a couple of seconds for the UI to render.

Running 4.3-dev5.mono

https://github.com/godotengine/godot/assets/11413364/d06f1135-744f-4f9c-8ca6-796480e447a8

In the video, after pressing the button, 20 other buttons are instantiated (I kept them as simple as possible by instantiating plain Controls instead, so none of those buttons are actually visible). Notice that the title of the next menu (5x5) takes a couple of seconds to appear.

Veradictus commented 7 months ago

Still present in 4.2.2. To mitigate this, whenever I need to load a bunch of elements at once, I put the instantiations in a queue and process one per frame.
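
The mitigation described above, queueing pending instantiations and running one per frame, can be sketched in an engine-agnostic way. This is a hypothetical Python sketch: `FrameBudgetSpawner`, `enqueue`, and `process_frame` are illustrative names, and in Godot the per-frame call would be driven from something like a node's `_process(delta)` callback. The `per_frame` budget above 1 is an assumption added here as a knob between smoothness and total load time.

```python
from collections import deque
from typing import Callable, Deque, List


class FrameBudgetSpawner:
    """Spreads expensive object creation over several frames (hypothetical helper)."""

    def __init__(self, per_frame: int = 1) -> None:
        if per_frame < 1:
            raise ValueError("per_frame must be at least 1")
        self.per_frame = per_frame
        self._pending: Deque[Callable[[], object]] = deque()
        self.spawned: List[object] = []

    def enqueue(self, factory: Callable[[], object]) -> None:
        """Schedule one instantiation instead of performing it immediately."""
        self._pending.append(factory)

    def process_frame(self) -> int:
        """Run at most `per_frame` queued factories, oldest first; return how many ran."""
        done = 0
        while self._pending and done < self.per_frame:
            factory = self._pending.popleft()
            self.spawned.append(factory())
            done += 1
        return done

    @property
    def pending(self) -> int:
        return len(self._pending)


# Usage: queue 5 items; with the default budget, one is created per frame.
spawner = FrameBudgetSpawner()
for i in range(5):
    spawner.enqueue(lambda i=i: f"button {i}")

frames = 0
while spawner.pending:
    spawner.process_frame()
    frames += 1
print(frames, spawner.spawned[0])  # 5 button 0
```

The total work is unchanged; it is only spread out so that no single frame pays for the whole batch.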

luckyabsoluter commented 6 months ago

(scene as PackedScene).instantiate() and (node as Node).add_child() are slow (in my case the scene also contains Label nodes), but the editor profiler can't catch it. Keep that in mind.

merksk8 commented 6 months ago

Adding more information in case it's useful.

On versions 4.2.2 (stable) and 4.3 (dev5), using C#, instantiating a PackedScene seems to slow down if the scene contains either Shader Materials or a Particle Process Material with "Resource Local To Scene" toggled on.

In my tests, it's ~1.5x slower with Shader Materials and ~10x slower with Particle Process Materials.

Timings cover only instantiating the PackedScene, without adding the instances as children:

Node2D + Sprite2D, without any Shader Material:
Finished instantiating 10000 nodes: res://scene_no_materials.tscn Total time 99 ms. 992842 ticks

Node2D + Sprite2D, with a Shader Material, Resource Local To Scene turned OFF:
Finished instantiating 10000 nodes: res://scene_with_material.tscn Total time 73 ms. 739936 ticks

Node2D + Sprite2D, with a Shader Material, Resource Local To Scene turned ON:
Finished instantiating 10000 nodes: res://scene_with_instantiated_material.tscn Total time 133 ms. 1335912 ticks

Node2D + GPUParticles2D, with a ParticleProcessMaterial, Resource Local To Scene turned OFF:
Finished instantiating 10000 nodes: res://scene_with_particles.tscn Total time 78 ms. 787835 ticks

Node2D + GPUParticles2D, with a ParticleProcessMaterial, Resource Local To Scene turned ON:
Finished instantiating 10000 nodes: res://scene_with_instantiated_particles.tscn Total time 782 ms. 7829485 ticks
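
The roughly 1.5x and 10x figures can be recomputed from the reported tick counts. A small worked calculation in Python; it assumes the ticks are 100 ns units (as with .NET `TimeSpan` ticks), which is consistent with the printed milliseconds (992842 ticks is about 99 ms). The choice of baseline changes the Shader Material ratio noticeably.

```python
# Tick counts reported above, 10000 instantiations each.
TICKS = {
    "no_materials": 992842,
    "shader_local_off": 739936,
    "shader_local_on": 1335912,
    "particles_local_off": 787835,
    "particles_local_on": 7829485,
}

TICKS_PER_MS = 10_000  # assumption: 100 ns ticks, as in .NET TimeSpan


def ratio(slow: str, fast: str) -> float:
    """How many times longer the `slow` scene took than the `fast` one."""
    return TICKS[slow] / TICKS[fast]


comparisons = {
    "Shader, local ON vs OFF": ("shader_local_on", "shader_local_off"),
    "Shader, local ON vs no material": ("shader_local_on", "no_materials"),
    "Particles, local ON vs OFF": ("particles_local_on", "particles_local_off"),
}
for label, (slow, fast) in comparisons.items():
    print(f"{label}: {ratio(slow, fast):.2f}x")
# Shader, local ON vs OFF: 1.81x
# Shader, local ON vs no material: 1.35x
# Particles, local ON vs OFF: 9.94x
```

The local-to-scene Shader Material case lands between about 1.35x and 1.8x depending on the baseline, bracketing the ~1.5x estimate, while the ParticleProcessMaterial case is close to 10x.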

Calinou commented 6 months ago

> On versions 4.2.2 (stable) and 4.3 (dev5), using C#, instantiating a PackedScene seems to slow down if the scene contains either Shader Materials or a Particle Process Material with "Resource Local To Scene" toggled on.

This is unrelated to the issue mentioned here, as the cause is entirely different.

In this situation, a shader needs to be compiled every time the PackedScene is instanced, because the Shader instance is unique. You need to ensure the shader resource is shared across instances somehow. Also, ParticleProcessMaterial needs more time to compile than a bare ShaderMaterial as it's much more complex (it's a premade ShaderMaterial with dozens of uniforms and potentially hundreds of lines of code).

Excessive shader amounts will also slow down drawing because of the high number of state changes/draw calls required.

merksk8 commented 6 months ago

> This is unrelated to the issue mentioned here, as the cause is entirely different.

I see, sorry for mixing up the topics then!

Also, thanks for the clear explanation. I'll keep that in mind and use it only when it's really, really needed.

duarteroso commented 6 months ago

@Calinou Any updates on this? Do you need help testing/debugging it? I'm not familiar with the Godot development process but can spend some time trying (before the sun comes back 🌞 )

Calinou commented 6 months ago

> Any updates on this? Do you need help testing/debugging it?

I suggest testing what I mentioned here: https://github.com/godotengine/godot/issues/71182#issuecomment-1483147660

Make sure to compile with release optimizations as well (production=yes), so that the result is more comparable with official builds, and use MinGW instead of MSVC if targeting Windows (as that's what official binaries use).

duarteroso commented 6 months ago

>> Any updates on this? Do you need help testing/debugging it?
>
> I suggest testing what I mentioned here: #71182 (comment)
>
> Make sure to compile with release optimizations as well (production=yes), so that the result is more comparable with official builds, and use MinGW instead of MSVC if targeting Windows (as that's what official binaries use).

In my use case there are no labels/texts involved. Simply instantiating a bunch of scenes containing only a Control gives me this significant delay.

Will still try to run from source and take it from there 🚀

alexandre-langlais commented 4 months ago

I'm in the same boat as OP: I'm making a sudoku game with a level selection screen, where each level is represented by a button. On Android it takes about 2 seconds to instantiate 1000 buttons. So for now I must limit my game to 3000 levels to keep an acceptable load time.

Calinou commented 4 months ago

> On Android it takes about 2 seconds to instantiate 1000 buttons. So for now I must limit my game to 3000 levels to keep an acceptable load time.

You should look into pagination, so that you don't need to create thousands of buttons at the same time.
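
Pagination keeps only one page of buttons in the scene tree at a time: when the player changes page, the previous buttons are freed (or reused) and the next slice is instantiated. A minimal, engine-agnostic sketch of the index math in Python; `page_count` and `page_items` are hypothetical names, and the page size of 50 is an arbitrary example. With it, 3000 levels become 60 pages of at most 50 buttons each instead of 3000 buttons created up front.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` items, `page_size` at a time."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return max(1, -(-total // page_size))  # ceiling division; at least one (possibly empty) page


def page_items(items: Sequence[T], page: int, page_size: int) -> List[T]:
    """Items on zero-based `page`; only these need buttons in the scene tree."""
    pages = page_count(len(items), page_size)
    if not 0 <= page < pages:
        raise IndexError(f"page {page} out of range 0..{pages - 1}")
    start = page * page_size
    return list(items[start:start + page_size])


levels = list(range(1, 3001))          # e.g. 3000 level IDs
print(page_count(len(levels), 50))     # 60
print(page_items(levels, 59, 50)[-1])  # 3000
```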

alexandre-langlais commented 4 months ago

>> On Android it takes about 2 seconds to instantiate 1000 buttons. So for now I must limit my game to 3000 levels to keep an acceptable load time.
>
> You should look into pagination, so that you don't need to create thousands of buttons at the same time.

That would certainly be a workaround, but it is more code for me and less convenient for the end user to have to click through the pages than just scrolling.

Calinou commented 4 months ago

> That would certainly be a workaround, but it is more code for me and less convenient for the end user to have to click through the pages than just scrolling.

In this case, you need something like https://github.com/godotengine/godot-proposals/issues/9678. It's just not technically feasible to have thousands of active buttons at once in the SceneTree and expecting it to perform well. Even web browsers struggle when faced with similar situations, despite having decades of optimizations baked in.
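
For the scrolling case, a common technique is list virtualization: keep a fixed pool of buttons just large enough to fill the viewport and rebind them to different levels as the user scrolls, so the scene tree never holds thousands of buttons. Whether this matches the linked proposal's design is an assumption; the sketch below only shows the index math, in engine-agnostic Python, with a hypothetical `VirtualList` class. How the scroll offset is read from the engine is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VirtualList:
    """Index math for a recycled pool of row widgets (hypothetical sketch)."""

    item_count: int         # e.g. number of levels
    row_height: float       # height of one button row, in pixels
    viewport_height: float  # visible height of the scroll area, in pixels

    @property
    def pool_size(self) -> int:
        # Rows that can be at least partially visible at once: a full viewport
        # of rows plus one extra when the offset is not row-aligned.
        rows = math.ceil(self.viewport_height / self.row_height) + 1
        return min(self.item_count, rows)

    def clamp_offset(self, scroll_offset: float) -> float:
        max_offset = max(0.0, self.item_count * self.row_height - self.viewport_height)
        return min(max(scroll_offset, 0.0), max_offset)

    def bindings(self, scroll_offset: float) -> List[Tuple[int, int, float]]:
        """(pool slot, item index, y relative to viewport top) for each row to show."""
        offset = self.clamp_offset(scroll_offset)
        first = int(offset // self.row_height)
        last = min(self.item_count, first + self.pool_size)
        # index % pool_size gives each visible item a distinct slot; scrolling by
        # one row rebinds only the slot that left the view.
        return [
            (index % self.pool_size, index, index * self.row_height - offset)
            for index in range(first, last)
        ]


levels = VirtualList(item_count=3000, row_height=60.0, viewport_height=600.0)
print(levels.pool_size)          # 11
print(levels.bindings(90.0)[0])  # (1, 1, -30.0)
```

Here 11 button widgets cover all 3000 levels; only their text and position change while scrolling.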

Overvault-64 commented 4 months ago

> It's just not technically feasible to have thousands of active buttons at once in the SceneTree and expecting it to perform well.

Certainly, this is not a common scenario, and I think it should be avoided whenever possible. In fact, when I detected this issue, I immediately changed my approach, as the performance drop in Godot 4 was noticeably affecting my project. Nonetheless, I thought it was worth investigating the reasons behind the gap with Godot 3, so I kept monitoring the situation. The issue here is not about Godot 4's absolute performance, but about its performance relative to Godot 3.