godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.21k stars 21.22k forks source link

[Bullet] Setting 3D body transform after `add_child` makes it very slow in procedural generation use cases #45598

Open Zylann opened 3 years ago

Zylann commented 3 years ago

Godot 3.2.4 279f4fbb5f83c2df6e8499799a2aa0c14f85aa2d Windows 10 64 bits Bullet

Issue description: After someone on Discord had an issue generating a small forest of trees, I investigated the reason in depth. This is more complicated than it sounds, so take some time to read the full story ;)

In a test project, I created a tree scene. It is a StaticBody, with a sphere as MeshInstance and a CollisionShape with a CapsuleShape.

The issue begins in the main scene. The following script takes 15 seconds:

const TreeScene = preload("res://tree.tscn")

func _ready():
    var before = OS.get_ticks_msec()
    for i in range(3000):
        var x = rand_range(-100, 100)
        var z = rand_range(-100, 100)
        var tree = TreeScene.instance()
        add_child(tree)
        tree.transform = Transform(Basis(), Vector3(x, 0, z))

In a web build it took almost a minute.

However, this modified version of _ready runs in less than one second:

func _ready():
    var before = OS.get_ticks_msec()
    for i in range(3000):
        var x = rand_range(-100, 100)
        var z = rand_range(-100, 100)
        var tree = TreeScene.instance()
        tree.transform = Transform(Basis(), Vector3(x, 0, z)) # <-- Moved up
        add_child(tree)

Setting the transform before add_child fixes it.

But there is more. I didn't stop there, I went profiling to find exactly what is the cause of the slowdown. And it's not what I thought it was:

This is a profiling scope stack of add_child, down to the slow part: image

So... if this all happens in add_child... why setting the transform after add_child makes it go bad? Why is it not slow all the time?

It is important to know that when transform is assigned on a CollisionObject with this binding of Bullet, the transform will not be applied immediately. It will be applied a frame (or two?) later, during physics process. That means in the first code snippet, all of our trees will end up in the same spot in the Bullet physics world. This situation is very bad, because all bodies will overlap and create pairs. So when Godot's implementation removes the bodies, it will be very slow for Bullet to unpair them.

But if we set the transform before, the collision object will be created in PhysicsServerBullet with a non-zero position straight away. They will spawn in different location and won't overlap each other. PhysicsServerBullet will STILL do this strange thing of removing and re-adding the body in add_child, but it won't hit the same performance bottleneck as the first snippet, so it will go by unnoticed.

Conclusion

I think the Bullet implementation in Godot is perfectible. It does not happen with GodotPhysics, and I think it's more due to the Bullet integration rather than Bullet itself. The fact it does the things mentioned above is questionnable, but I'm not expert in this area of the codebase.

More importantly, the original use case is quite common. It often occurs that you want to spawn a bunch of scenes procedurally in your game, and they often have collisions. It's very easy for beginners to hit this problem, without noticing that a simple order of assignment can change things dramatically.

Assigning properties before add_child is not intuitive to everyone, but it's actually important in other areas of the engine, like controls. It's also better known to experienced programmers that it's best to configure first, and then add. So we thought it should be emphasized in the doc... Oops? As you can see on this link, this is one place we could have added that explanation, but it teaches things the "wrong" way around, after add_child. There might be other places.

Anyways, all this to raise this "apparently simple" issue. It's got several layers so I thought I would raise it.

Minimal reproduction project: InstanceManyNodes.zip

Tracy capture: 3000_nodes_added.zip

lawnjelly commented 3 years ago

@pouleyKetchoupp noticed this same thing in the godot physics use of the BVH a couple of weeks ago, and fixed with this PR: #45319

The problem with adding all these nodes on top of each other at the origin before setting the transform is that they all use the same space in the tree, they all collide and potentially send callbacks, then uncollide when you finally set them to the start position. This is far more efficient to do in one step.

pouleyKetchoupp commented 3 years ago

I agree the documentation and tutorials should generally teach users to set parameters before adding objects to physics. I've just opened #45638 to address that part separately.

Concerning the issue in the Bullet physics server: It might be improved (if not fixed) by #42643, so it would be interesting to test this specific use case. CC @AndreaCatania

lawnjelly commented 3 years ago

Also note that the inefficiency of adding objects at the origin then translating occurs in the rendering too, it is probably just less pronounced than with physics.

We could consider having a generic WARN_PRINT_ONCE in the editor if it detects adding a boatload of visual instances at the origin, that might catch both cases? This could be debug / tools only.