Open Overvault-64 opened 1 year ago
Related to https://github.com/godotengine/godot/issues/61929.
Are you using an official engine build in both 3.x and 4.0.beta (i.e. with the same optimizations)?
Doesn't that mean that the engine is capable of handling more nodes with less of a resource uptake or spike thereof? Isn't it possible to adjust these things in the profiler? I think Godot has one already. So that needs more thorough observation and testing. Plus, running them alongside each other is not indicative. Some CPUs are capable of repeating workloads once they have understood them, I heard... they don't seem to be in circulation. Can people change that? Sorry for being so unfactual. Yes, optimization does not matter if you can adjust the thread size, but that could limit mobile phone performance, I mean Android specifically, because you are locking yourself out of certain devices that don't have the RAM to deal with the workload, even with blast processing and spiked CPUs, which they all are in the mobile hemisphere, so clock times must be an issue for the new release, and if so, why are they not? If it's all software anyway and we are reading from self-updating text files, then that's Linux fixing that. I feel like this is way more interesting if you think closer than with an open eye, BUT here's the deal... it just takes longer because people want to have more nodes in their projects, created by hand or while running, right? So that takes more power on average to handle because that's how streams and data work. Once you turn it into something closer to a machine, you can rearrange it and it will still present the same way to the user with less data taken up in all regards. Sounds crazy, but it's the truth. Not advised for smaller data sizes, like a thousand buttons... but you have a keen eye... good job! I'm not an expert, but that seems to be the general thing when it comes to handling LOTS of things at THE SAME time. Everything else is just an illusion of that and it's handled one by one, which doesn't create the necessary behavior anyway.
Related to #61929.
Are you using an official engine build in both 3.x and 4.0.beta (i.e. with the same optimizations)?
Yes
Still slow in beta13
4.0.1 (stable)
Can you reproduce this with a self-compiled editor build with the production=yes module_text_server_advanced_enabled=no module_text_server_fb_enabled=yes SCons options? This uses a simpler and faster TextServer that has advanced features disabled (no right-to-left or complex scripts).
@Calinou good point. While the issue title describes "nodes" in general, this benchmark uses UI nodes, which aren't exactly simple.
@Calinou I don't have a compile environment set up, but @Riteo's comment made me think that I can benchmark different kinds of nodes and look at the results. This way I could see which node types are harder on the engine and maybe identify a common cause. Makes sense?
You can try to do that, but the best way to isolate the bottleneck is to switch TextServers as I mentioned. I get a strong feeling the slowness is due to text shaping, not node creation. Text shaping in 4.0 regularly comes up as one of the most demanding operations when I look at results in a C++ profiler (the editor profiler won't show it).
You can also use a C++ profiler on a debug build of the engine.
@Calinou I hope I did it right. Here are the results, but I don't know how to read them.
I've used the godot-4.0-editor-debug-windows-msvc2022 build.
4.1-beta1
4.1-stable (still the same exact hardware and configuration)
4.2.beta1
A hunch: What if you disable advanced text server when compiling?
I can't compile :(
v4.3.dev1.official [9d1cbab1c]
I am seeing something similar but unlike @Overvault-64 I don't have a 3.x version to compare with.
Calling Node.Instantiate<Control>() 20x takes a considerable amount of time on Android. The game does not freeze, but it can clearly be seen that it takes a couple of seconds for the UI to render.
Running 4.3-dev5.mono
https://github.com/godotengine/godot/assets/11413364/d06f1135-744f-4f9c-8ca6-796480e447a8
In the video, after pressing the button, 20 other buttons will be instantiated (here I left them as simple as possible and instantiate Controls instead, so none of those buttons are actually visible). Notice that the title of the next menu (5x5) takes a couple of seconds to appear.
Still present even in 4.2.2. To mitigate this, I have created a queue for each case where I need to load a bunch of elements at once, and I process one instantiation per frame.
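For readers who want to try the same mitigation, here is a minimal GDScript sketch of spreading instantiation across frames. It is not the commenter's actual code: the scene path `res://item.tscn`, the `VBoxContainer` child node, and the `request_items()` helper are all hypothetical names chosen for illustration.

```gdscript
extends Control

# Hypothetical scene path; replace with the scene you need to instantiate.
const ITEM_SCENE: PackedScene = preload("res://item.tscn")

# How many scenes to instantiate per frame (the comment above uses one).
@export var per_frame: int = 1

# Number of instances still waiting to be created.
var _pending: int = 0

# Hypothetical container that receives the new nodes.
@onready var _container: Container = $VBoxContainer


func _ready() -> void:
	# Nothing is queued yet, so skip _process() until work arrives.
	set_process(false)


# Queue `count` instances instead of creating them all at once.
func request_items(count: int) -> void:
	_pending += count
	set_process(true)


func _process(_delta: float) -> void:
	# Create at most `per_frame` instances, then yield to the next frame.
	for _i in mini(per_frame, _pending):
		_container.add_child(ITEM_SCENE.instantiate())
		_pending -= 1
	if _pending == 0:
		set_process(false)
```

This does not reduce the total work; it only trades one long stall for many short ones, so the UI stays responsive while the elements appear progressively. Raising `per_frame` shortens the total load time at the cost of longer frames.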
(scene as PackedScene).instantiate() and (node as Node).add_child() (the scenes also contain Label nodes) are slow, but the editor profiler can't catch it. Keep it in mind.
Adding more information in case it's useful.
On version 4.2.2 (stable) and 4.3 (dev5), using C#, instantiating a PackedScene seems to slow down if it contains either Shader Materials or Particle Process Materials with "Resource Local To Scene" toggled on.
In my tests, it is ~1.5 times slower with Shader Materials and ~10 times slower with Particle Process Materials.
Timings are without adding as child, only instantiating the PackedScene:
Node2D + Sprite2D, without any Shader Material:
Finished instantiating 10000 nodes: res://scene_no_materials.tscn Total time 99 ms. 992842 ticks
Node2D + Sprite2D, with a Shader Material, Resource Local To Scene turned OFF:
Finished instantiating 10000 nodes: res://scene_with_material.tscn Total time 73 ms. 739936 ticks
Node2D + Sprite2D, with a Shader Material, Resource Local To Scene turned ON:
Finished instantiating 10000 nodes: res://scene_with_instantiated_material.tscn Total time 133 ms. 1335912 ticks
Node2D + GPUParticles2D, with a ParticleProcessMaterial, Resource Local To Scene turned OFF:
Finished instantiating 10000 nodes: res://scene_with_particles.tscn Total time 78 ms. 787835 ticks
Node2D + GPUParticles2D, with a ParticleProcessMaterial, Resource Local To Scene turned ON:
Finished instantiating 10000 nodes: res://scene_with_instantiated_particles.tscn Total time 782 ms. 7829485 ticks
This is unrelated to the issue mentioned here, as the cause is entirely different.
In this situation, a shader needs to be compiled every time the PackedScene is instanced, because the Shader instance is unique. You need to ensure the shader resource is shared across instances somehow. Also, ParticleProcessMaterial needs more time to compile than a bare ShaderMaterial as it's much more complex (it's a premade ShaderMaterial with dozens of uniforms and potentially hundreds of lines of code).
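One possible way to keep the material shared is to load it once and assign the same resource to every instance. The GDScript sketch below assumes the material is saved as its own file with "Resource Local To Scene" turned off; the paths `res://sprite_scene.tscn` and `res://shared_material.tres`, the `Sprite2D` child name, and the `spawn()` helper are hypothetical.

```gdscript
extends Node2D

# Hypothetical resource paths; adjust to your project.
const SPRITE_SCENE: PackedScene = preload("res://sprite_scene.tscn")
const SHARED_MATERIAL: ShaderMaterial = preload("res://shared_material.tres")


# Instantiate `count` scenes that all reference the same material resource.
func spawn(count: int) -> void:
	for _i in count:
		var instance := SPRITE_SCENE.instantiate()
		# Assumes the scene root has a child node named "Sprite2D".
		var sprite: Sprite2D = instance.get_node("Sprite2D")
		# Every sprite points at one ShaderMaterial, so no per-instance copy is made.
		sprite.material = SHARED_MATERIAL
		add_child(instance)
```

The trade-off is that changing a shader parameter on the shared material affects every instance at once, so this only fits cases where the instances do not need individual parameter values.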
Excessive shader amounts will also slow down drawing because of the high number of state changes/draw calls required.
I see, sorry then for mixing the topic!
And also thanks for the clear explanation, I'll keep that in mind and will use it only when it's really, really needed.
@Calinou Any updates on this? Do you need help testing/debugging it? I'm not familiar with the Godot development process but can spend some time trying (before the sun comes back 🌞 )
I suggest testing what I mentioned here: https://github.com/godotengine/godot/issues/71182#issuecomment-1483147660
Make sure to compile with release optimizations as well (production=yes), so that the result is more comparable with official builds, and use MinGW instead of MSVC if targeting Windows (as that's what official binaries use).
In my use case there are no labels/texts involved. Simply by instantiating a bunch of scenes with only a Control gives me this significant delay.
Will still try to run from source and take it from there 🚀
I'm in the same boat as OP. I'm making a sudoku game with a level selection screen, where each level is represented by a button. On Android it takes about 2 seconds to instantiate 1000 buttons. So for now I must limit my game to 3000 levels to keep an acceptable load time.
You should look into pagination, so that you don't need to create thousands of buttons at the same time.
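As a rough illustration of the pagination idea, the GDScript sketch below only ever keeps one page of buttons in the SceneTree. The page size of 50, the `GridContainer` child node, and the function names are assumptions made for this example; the level count of 3000 is taken from the comment above.

```gdscript
extends Control

const PAGE_SIZE: int = 50      # Assumed page size, tune for your layout.
const LEVEL_COUNT: int = 3000  # Level count mentioned in the comment above.

var _page: int = 0

# Hypothetical container that holds the buttons of the current page.
@onready var _grid: GridContainer = $GridContainer


func _ready() -> void:
	show_page(0)


func page_count() -> int:
	return ceili(float(LEVEL_COUNT) / PAGE_SIZE)


# Rebuild the grid so it contains only the buttons of the requested page.
func show_page(page: int) -> void:
	_page = clampi(page, 0, page_count() - 1)
	# Free the previous page's buttons before building the new ones.
	for child in _grid.get_children():
		child.queue_free()
	var first := _page * PAGE_SIZE
	var last := mini(first + PAGE_SIZE, LEVEL_COUNT)
	for level in range(first, last):
		var button := Button.new()
		button.text = str(level + 1)
		button.pressed.connect(_on_level_pressed.bind(level))
		_grid.add_child(button)


func _on_level_pressed(level: int) -> void:
	print("Selected level ", level + 1)
```

Hooking `show_page(_page + 1)` and `show_page(_page - 1)` up to "next" and "previous" buttons would complete the navigation. With this approach the cost of opening the screen depends on `PAGE_SIZE` rather than on the total number of levels.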
That would certainly be a workaround, but it is more code for me and less convenient for the end user to have to click through the pages than just scrolling.
In this case, you need something like https://github.com/godotengine/godot-proposals/issues/9678. It's just not technically feasible to have thousands of active buttons at once in the SceneTree and expect it to perform well. Even web browsers struggle when faced with similar situations, despite having decades of optimizations baked in.
Certainly this is not a common scenario, and I think it should always be avoided if possible. In fact, when I detected this issue I immediately changed my approach, as the performance drop in Godot 4 was noticeably affecting my project. Nonetheless, I thought it was interesting to investigate the reasons behind the gap with Godot 3, so I continued to monitor the situation. Indeed, the issue here is not about Godot 4's absolute performance, but about its performance relative to Godot 3.
Godot version
4.0.beta10
System information
Windows 10 and Android
Issue description
In my projects I often need to create UI at runtime, instantiating a lot of elements. This brought to my attention a relevant performance issue in Godot 4: when creating nodes, or instantiating them, it's about 4 times slower than Godot 3.
I've repeated the tests several times with my test projects, making both versions instantiate a scene (a button containing 3 child nodes) a different number of times, from 1000 to 8000, and Godot 4 is consistently around 4 times slower (from slightly more than 4 times to almost 4 times, depending on the count).
Godot 4 crashed when instantiating more than 8000 of those scenes, while Godot 3 handled more than 10000 comfortably. My hardware was below 30% workload all the time.
I've also tried creating the button at runtime and it makes no significant difference compared to instantiating a pre-made scene.
This wouldn't be a real-world problem for most developers (except me, I guess) if it weren't greatly amplified on mobile. My mobile test device required 6-7 times more time to complete the task on both versions, leading to an unhealthy 21 seconds with 8000 scenes on the Godot 4 build. The Godot 4/Godot 3 lag ratio on mobile is about the same as on PC.
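For anyone who wants to repeat the measurement without the attached projects, a benchmark along these lines could look like the GDScript sketch below (Godot 4 syntax). It is not the code from the attached projects: the scene path `res://button_scene.tscn` and the `benchmark()` helper are hypothetical, and the instance counts are example values within the 1000 to 8000 range described above.

```gdscript
extends Control

# Hypothetical scene: a Button with 3 child nodes, as described above.
const BUTTON_SCENE: PackedScene = preload("res://button_scene.tscn")


func _ready() -> void:
	for count in [1000, 2000, 4000, 8000]:
		var elapsed := benchmark(count)
		print("%d instances: %d ms" % [count, elapsed])


# Instantiate `count` scenes under a temporary parent and return the elapsed milliseconds.
func benchmark(count: int) -> int:
	var holder := Control.new()
	add_child(holder)
	var start := Time.get_ticks_msec()
	for _i in count:
		holder.add_child(BUTTON_SCENE.instantiate())
	var elapsed := Time.get_ticks_msec() - start
	# Clean up so the next run starts from the same state.
	holder.queue_free()
	return elapsed
```

A Godot 3 version of the same test would use `instance()` instead of `instantiate()` and `OS.get_ticks_msec()` instead of `Time.get_ticks_msec()`. Note that this only times the calls made from the script; work the engine defers to later frames is not included in the figure.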
Minimal projects attached
Steps to reproduce
Start the projects
Minimal reproduction project
node.test.zip