appsinacup / godot-rapier-physics

Godot Rapier Physics – A 2D and 3D drop-in replacement for the Godot engine that adds stability and fluids.
https://godot.rapier.rs
MIT License
312 stars 16 forks source link

Performance slower than it used to be in c++ version #139

Open Ughuuu opened 2 months ago

Ughuuu commented 2 months ago

I started to investigate out why the performance is so slow. On Rust version I am getting 9k circles, on new one I am getting 6k circles. So about 33% slower.

I am using Instruments app on mac which gives very good insight into what takes a lot of time from app and what functions take a lot of time. However as the result of this app is not exportable, I will either write by hand here by copy pasting or make screenshots. The following is a 5s overview of what calls take the most time from a scene where a lot of circles are created and hit the ground:

Initial Overview

image

We have here the main thread with 84% usage, 4.69s out of 5.55s. The other threads:

seem to be Rayon calling into Rapier, and there isn't much that can be done there. In these threads there is also a callback from our lib where we filter objects, however as the percentage isn't that big, our investigation won't focus on this.

image

That being said, we are left with the Main Thread, so lets go into that.

Note that going forward, percentages are based on the thing we are focused on. Here it's going to be based on Main Thread duration, 4.66s.

Main Thread (4.66s)

The main thread is then split into 4 parts:

image
  1. 43.8% (2.06s) call to RapierPhysicsServer2D. (flush queries)
  2. 37.5% (1.76s) call to RenderingServerDefault::_draw. This is godot code. In here we have most notably vulkan functions, and nothing much to do with us.
  3. 13.6% (640ms) call to RapierPhysicsServer2D. (step)
  4. 3.6% (173ms) call to SceneTree::physics_process. This is godot code. It seems to handle updating of objects to be drawn.

Flow 2 and 4 don't seem to call back into our lib. So we focus on 1 and 4.

Main Thread -> Flow 3(640ms, 13.6%) - Step

The purpose of this function is to simulate the whole physics world with a step interval. Handles collisions and everything.

image

Same as before, gdext hides away the function, but our specific class calls into step function, so it's most likely that.

Here going down into the step function, it splits into:

  1. 73.1% call into rapier lib. Here there isn't much we can do about it, aside for a callback down the line image

with a small percentage is in our code as well, handle_contact_force :

image

Not much to be said about this, and even if we do take this out of the equation and re-run everything, we don't see a big change in the time.

Also, seeing as the main physics process part, the Step function only takes this little, and the rest alone is just the Flush Queries part, means probably the bottle neck is there.

Main Thread -> Flow 1 (2.05s, 43.8%) - Flush Queries

The purpose of this is to call into the Callables godot offer to some physics objects. These objects call these in case of collision events or active change events, etc.

image

Notice that we have a call into RapierPhysicsServer2D, but it doesn't say specifically where. This is because of how gdext works, it seems to hide away the call. But we do see later down the call it calls into our implementation of flush_queries, so it's coming from the PhysicsServer2D flush_queries method. This method is mainly going to be calling back into godot by calling Callables (that it previously handed down to physics objects) that had collision events or just are active.

image

If we open this flow, we can see that we have:

  1. 69.4% (1.43s) Callable::callv function. We will go into this a bit later, as the other ones are small.
  2. 6.4% (133ms) godot_core::meta::class_name::ClassName::to_string_name
image
  1. 5.5% (108ms) godot_core::builtin::callable::Callable::bindv - quite expensive for just binding the arguments it seems.
image
  1. 3.6% (76ms) godot_core::builtin::callable::Callable::is_valid. I like how here it uses os_unfair_lock. Haha.

    image
  2. 2% (54ms) godot_core::gen::builtin_classes::array::InnerArray::push_back

image

Then the rest seem to be like this, keep getting smaller and smaller:

image

Main -> Flow1 -> Callable::callv (1.43s)

image

Going inside the Callable, it seems godot calls back into the library 5 times (from RigidBody2D::_sync_body_state):

  1. 352ms (27.8%) Here we see a split into drop_in_place and RawGd::bind :
image
  1. 323ms (25.5%) Here most of it is from drop_in_place and RawGd::bind:
image
  1. 199ms (15.7%) Same as previous call, drop_in_place and RawGd::bind
image
  1. 168ms (13.3%) Same
image
  1. 164ms (12.9%) Same image
Ughuuu commented 2 months ago

The following commit was mode to reduce the calls to core::fmt::write https://github.com/Ughuuu/gdext/commit/934ea4ce8f7cfe9bcecb8de8c278a1efb62ac294

This improved a lot performance. There is still bottleneck in the godot_cell::borrow_state::BorrowState::increment_shared function. Other small things that can be improved:

Ughuuu commented 1 month ago

Another interesting issue to look at is this: https://github.com/godotengine/godot-proposals/issues/10389