Open Ughuuu opened 2 months ago
As a means of providing data for this, I am benchmarking using Instruments app on mac to get the 2 largest times, step
and flush_queries
. I will focus on 3D for this.
step function
21%flush_queries
6%step function
10%flush_queries
10%am running without dev as there is some issue with latest jolt and latest godot.
step function
17%flush_queries
37%Note this is written with godot-rust so the bottleneck of the callable and everything is even higher.
I know it doesn't show exactly 1 to 1 comparison of what could be implemented, but I did in Rapier a change where I disabled the active callback: RapierPhysicsServer3D.body_set_state_sync_callback(body.get_rid(), Callable())
and am easily able to get 10k bodies simulated.
I also made function that returns transforms of bodies and active bodies, and it's still fast. I know it's not the same as it does not update the bodies, but I'm sure some improvement would be gained for other physics engines too.
Another note, simulating Rapier locally (without godot) yields about 10-12k shapes. So the simulation itself should be a lot higher than 5k. The slowness comes from godot calling every callable 1 by 1.
I know some might think, oh but isn't this enough performance already? No.
When running godot on the browser, divide all these numbers by 5 or so. So if on desktop you get 3000 objects in godot physics, on the browser you get maybe 500 objects. So on the browser if you want to get large numbers, you need to improve performance by a lot. Also, I am running locally on a macbook m2 pro. So I will get large numbers. But old android devices might struggle.
The problem is not the callback, the problem is that the whole scene system is not optimal to work like this. If you have so many objects use the servers directly and bypass the scene.
I still think it could be a good idea to profile what is going on at the scene side, though, there may be some silly bottlenecks there, but I think whathever it is, a single C function pointer callback should never have this impact.
It's also worth keeping in mind that if eliminating the Callable
overhead is the goal here, then you also need to consider the overhead of interacting with Godot collections like Packed*Array
from GDExtension, since pretty much every interaction with them (including append
and operator[]
) will have to cross the extension boundary (i.e. invoke a function pointer). I'm not sure how godot-rust implements Callable
though, so it might still make a sizeable difference there, but for godot-cpp I wouldn't expect to see as big of a difference between the two approaches, assuming you're still fetching the same state.
iterate through all RID's and match then with actual object [...] For this to work, there needs to be a place where we would store a reference to these nodes
As talked about on Rocket Chat, the way you'd do this right now is to ask the physics server for the body's instance ID (ObjectID
) and then fetch the corresponding node with that using ObjectDB::get_instance
, which will involve some non-trivial overhead as well.
On top of that you need to somehow filter out the nodes that still need to have their _body_state_changed
method invoked, since you still have things like _integrate_forces
to invoke and contact signals to emit, for every active RigidBody*D
. On the 3D side you also have PhysicalBone3D
which similarly needs to be filtered out and have its specific _body_state_changed
method invoked. The naive approach to this would be to use Object::cast_to
I guess, but that involves dynamic_cast
, which (last I checked) is quite slow.
I still think it could be a good idea to profile what is going on at the scene side, though, there may be some silly bottlenecks there, but I think whathever it is, a single C function pointer callback should never have this impact.
I see, interesting approach. Will check this also then and see where the bottleneck comes from.
It's also worth keeping in mind that if eliminating the
Callable
overhead is the goal here, then you also need to consider the overhead of interacting with Godot collections likePacked*Array
from GDExtension, since pretty much every interaction with them (includingappend
andoperator[]
) will have to cross the extension boundary (i.e. invoke a function pointer). I'm not sure how godot-rust implementsCallable
though, so it might still make a sizeable difference there, but for godot-cpp I wouldn't expect to see as big of a difference between the two approaches, assuming you're still fetching the same state.
That's a good point, maybe the overhead in rust comes from the way the gdextension works: flush_queries -> godot_rust(callables) -> godot -> godot_rust (get_data)
For these things, godot_rust needs to bind mutable self, and there are some things that add up. While if there was just: flush_queries -> godot_rust (get_data) -> do stuff
Would be better in that regard.
As talked about on Rocket Chat, the way you'd do this right now is to ask the physics server for the body's instance ID (
ObjectID
) and then fetch the corresponding node with that usingObjectDB::get_instance
, which will involve some non-trivial overhead as well.On top of that you need to somehow filter out the nodes that still need to have their
_body_state_changed
method invoked, since you still have things like_integrate_forces
to invoke and contact signals to emit, for every activeRigidBody*D
. On the 3D side you also havePhysicalBone3D
which similarly needs to be filtered out and have its specific_body_state_changed
method invoked. The naive approach to this would be to useObject::cast_to
I guess, but that involvesdynamic_cast
, which (last I checked) is quite slow.
I think I would still leave that integrate_forces function, I would just change active_body callback.
Not 100% sure about it, but at least on the callstack (for godot-rapier) I do see some dynamic_casts
happening in the investigation anyway.
For these things, godot_rust needs to bind mutable self, and there are some things that add up. While if there was just: flush_queries -> godot_rust (get_data) -> do stuff
Less work is better, of course, but given the Object
-agnostic design of the server architecture in Godot I feel like there's always going to be one or two relatively expensive steps added before "do stuff" there.
I think I would still leave that integrate_forces function, I would just change active_body callback.
Right, but _integrate_forces
(and the contact signals) are invoked as part of that callback, as seen here. So I would imagine your options are then to either keep invoking the callback from the physics server (somewhat defeating the point of all of this) or to have Godot invoke _body_state_changed
(and thus _integrate_forces
and the signals) directly on the bodies it gets from this new space_get_active_bodies
.
That last option would presumably involve some kind of filtering/downcasting, since you'll have bodies of type BODY_MODE_RIGID
, BODY_MODE_RIGID_LINEAR
and BODY_MODE_KINEMATIC
in that list. Not only that, some of the bodies could have been added directly through the physics server by the end-user, and their respective instance IDs might reference script-defined classes, or might not even be ObjectID
s at all, and instead just be some index into an array somewhere.
I do see some
dynamic_casts
happening in the investigation anyway
Object::cast_to
is very prevalent in Godot.
Describe the project you are working on
Godot Rapier Physics Addon.
Describe the problem or limitation you are having in your project
The 2 functions that take the most time are:
step
flush_queries
Notably the
flush_queries
one takes about as much as the step (for large enough number of objects). This is not right. The flow of the calls forflush_queries
are as follows:body_set_state_sync_callback
. This sets a callback to be called when this body is active. Then Godot calls back into the body to get some information, such as velocities, positions, sleep state. This happens at Godot side in the_sync_body_state
function. This function is internalDescribe the feature / enhancement and how it helps to overcome the problem or limitation
I would propose the flush_queries would not flush the queries, but instead a different approach is taken.
Right now flush_queries calls the callbacks, and then it's the nodes job to get what info they need, decoupling things a lot. This is great for architecture design. However for performance it's not as efficient as it could be.
I propose we do this update in a larger batch, rather than one by one.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
I propose we have a function
PhysicsServer::space_get_active_bodies(space_rid) -> Array[RID]
. This function returns the RID of all active objects.Then, we have another function,
PhysicsServer::space_get_bodies_transforms(space_rid, array_bodies_rid) -> Array[Transform]
. This function returns an array of transforms for the bodies.Then, we have another function,
PhysicsServer::space_get_bodies_linear_velocities(space_rid, array_bodies_rid) -> PackedVectorArray
. This function returns an array of linear velocities for the bodies.Same for angular velocity and is_sleeping.
The idea here would be we:
_sync_body_state
function to instead of getting pointer to body_state to get the actual data it needs).For this to work, there needs to be a place where we would store a reference to these nodes as to be able to iterate them quickly (eg. right now the flush_queries is called from main.cpp file. We would need to then make another call to some other service to send these info there(eg. get the nodes and update them)). Thile this might be a bit comlicated to implement, the benefits would be quite big for performance gains.
If this enhancement will not be used often, can it be worked around with a few lines of script?
A workaround for a person extending the physics server would be to offer such a function to get active bodies and a function to get the body state(eg. transform, velocity, etc). and then have them update the nodes manually. This is tricky as right now there is no way to disable node updates from outside afaik.
Eg. in 2d to do:
Basically update the rigidbody or other such nodes properties without also propagating a physics update event. Another workaround woudl be to disable the physics server and put it in some mode for these updates, but all of this would be complicated for a workaround.
Is there a reason why this should be core and not an add-on in the asset library?
Very hard to do in an addon, needs to be core.