godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Terrible performance with more than a few CharacterBody3D's moving around in a production level. #93184

Open jitspoe opened 3 months ago

jitspoe commented 3 months ago

Tested versions

4.0 - 4.2.2 stable, with and without Jolt addon.

System information

Windows 10, Vulkan (Forward+), Nvidia 3070

Issue description

When using a level with production-level detail and more than around 50 moving characters, the framerate absolutely tanks, to sub-15 FPS. I did some custom profiling with Tracy, and most of the performance hit was from the movement code called from both move_and_slide and move_and_collide. When the performance goes below a certain threshold, the engine starts doing multiple physics updates per frame to make up for it, which makes the performance even worse, up to 8 physics updates per frame, which is the max. Then things run at like 10fps in slow motion.

It's possible the issue is that I'm importing everything under one StaticBody3D. Perhaps each shape should be its own static body for the broadphase tree optimization (or whatever this physics engine uses)? Edit: I modified the importer to make a unique StaticBody3D for each collision shape, and that didn't improve performance. Could be that everything is centered at 0, 0, 0, though, and centering each StaticBody3D at the middle of each shape would improve things? Not sure how the tree is implemented (if there even is one, which might be why the perf is so bad if there isn't one).

Steps to reproduce

Load a reasonably complex level with a lot of convex collision shapes. I'm importing one using the BSP importer I made. Throw in around 60+ CharacterBody3Ds and have them move around. Note the framerate is awful.

Minimal reproduction project (MRP)

EDIT: Here's a new project, since the old one caused a lot of confusion: https://github.com/user-attachments/files/16141717/test_character_body_shape_cast.zip

Old project (performance collapses due to multiple physics updates per frame): test_character_body_perf.zip

Attached is the example project. I have a few different things you can try. One is my complex level (still Quake level of detail, as it was originally a Quake map). The other is a simpler one made with boxes (can be swapped out in main_scene.tscn). Also, you can set USE_MOVE_AND_SLIDE to true in the character_body_3d script to test simpler movement, which performs a bit better. I also included what I was using to do step moving (the default). This uses more moves and thus has worse performance.

jmarceno commented 3 months ago

I too have been experiencing similar problems in my project. FPS tanks to sub 15 or even single digits when I have around 10 or 15 enemies navigating. After a lot of trial and error I found that my issue is mainly on the physics side too.

Testing your MRP, I got something similar to what I've been seeing on mine (v-sync off). My config: 3700x / 3060 12GB / 32GB RAM

Main Scene
Default Physics with move and slide FALSE: ~12 FPS - GPU at ~6%
Default Physics with move and slide TRUE: ~19 to 109 FPS (VERY inconsistent and all over the place) - GPU at ~25%
Jolt with move and slide FALSE: ~420 FPS - GPU at ~37%
Jolt with move and slide TRUE: ~800 FPS - GPU at ~89%

Level Boxes
All tests above 1000 FPS and 97%+ GPU utilization, so it is hard to really say what is happening, but I'm glad to do more tests.

As it stands, a full game, even with Jolt, will have a hard time keeping a good framerate, as we are already consuming more than half of the CPU budget. I don't know if those results were in line with what you are seeing, but considering this level has simple geometry and nothing more is happening, I would expect to be GPU limited in every single scenario.

Another thing that I noted is that my CPU apparently never got hit that hard? Even when I'm clearly CPU bound (Uninformed guess here as I have little to no exp testing physics engines)

_Proc snapshot while running with Default Physics and move_and_slide FALSE_ image (the test ran for more than 60s, so it spans the whole graph)

huwpascoe commented 3 months ago

Switch to jolt because the built-in physics is just... don't use it.

With jolt enabled, went from 15fps to 100~fps, a good start.

Next, rather than a static body with hundreds of convex shapes, this is static level geometry so make it polygon soup. Didn't feel like messing with the importer so used the debugging method Mesh.create_trimesh_collision() on the non-transparent visual data, which created:

1x StaticBody3D 1x ConcaveCollisionShape3D

Now it runs at 700fps and hundreds of nodes eliminated.

Calinou commented 3 months ago

Static level geometry should generally always use concave collision shapes (trimesh), not convex. See https://github.com/godotengine/godot/issues/59738.

jitspoe commented 3 months ago

Jolt is definitely faster, but still unusable in my actual project (getting 10-15 FPS with all enemies enabled). I've also tried using triangle collision (Very simple to test -- just modify the bsp_reader.gd and set USE_TRIANGLE_COLLISION to true), but still got very poor performance. Curious how it improved things for you so much.

Also, convex collision shapes are typically much faster in physics engines, so it's kind of crazy if trimesh collision is faster. It almost seems like some early broadphase exclusion algorithm is missing or not functioning properly.

AThousandShips commented 3 months ago

Having a single trimesh instead of many convex shapes is naturally going to be faster. Culling improves things, but the culling step itself slows down when you have a ton of shapes packed together, so it might be the bottleneck here if you have very many shapes in a small space.

For static bodies, given reasonably large size, I'd say, in increasing order of performance:

Naturally breaking things up into reasonably large sections is the best

Edit: remember especially that the more shapes you have, the worse the performance with many bodies. Each body has to process all of those shapes, so the cost grows steeply with many bodies and many shapes in the statics.

hakro commented 3 months ago

While testing OP's MRP, I found out that deleting CharacterBody3Ds from LevelSewer1 is also very, very slow. Each one takes about 10 seconds to get deleted, and the editor freezes during that time.

I thought about creating a separate issue, but wanted to mention that here first, in case it would be a symptom related to OP's FPS drop.

My specs: Godot v4.3.beta1 - TUXEDO OS 3 22.04.4 - Wayland - Vulkan (Forward+) - integrated Intel(R) Graphics (RPL-P) () - 13th Gen Intel(R) Core(TM) i7-13700H (20 Threads)

huwpascoe commented 3 months ago

just modify the bsp_reader.gd and set USE_TRIANGLE_COLLISION to true), but still got very poor performance.

It should output one concave shape attached to one static object. Best create a visual mesh to debug it's working correctly.

Also, convex collision shapes are typically much faster in physics engines, so it's kind of crazy if trimesh collision is faster.

As long as it doesn't move, tri-mesh can be completely optimized. A good physics engine will generate a structure for fast lookup when given a large static mesh. Essentially an improved version of that BSP format for modern hardware.

Janders1800 commented 3 months ago

I don't know how Godot optimizes collisions internally, so take this with a grain of salt, but I've taken a look at the level structure and I believe each character is testing against 7k+ collision shapes, since all the level's collisions are inside a single StaticBody3D.

Maybe changing the plugin so it builds a StaticBody3D per collision shape would let the physics discard collisions. My understanding is that the physics engine is doing something like this: ok, this CharacterBody3D is colliding with this StaticBody3D, let me get the collision shape.... (finds 7k+ shapes) sweet baby Jesus on a bicycle!

jitspoe commented 3 months ago

I tried modifying the BSP importer to make a separate StaticBody3D for every collision shape, but that did not improve performance. It could be because everything was centered at 0, 0, 0.

Definitely seems like there should be some sort of tree or something to early-out most of the shapes, but it either doesn't exist, or it's not setting the bounds correctly for convex hulls.

jitspoe commented 3 months ago

Just double checked with triangle collisions, and I'm still getting sub 15 FPS:

image

Here's the project with triangle collision if you don't want to mess with changing the importer consts. I also added another importer const: SINGLE_STATIC_BODY. If set to false, it will create a unique static body for every convex shape. I haven't yet tried to center the static bodies within the shape, so everything is at 0, 0, 0, as mentioned previously, though considering the perf is about the same with triangles, I think something might be bugged with the tree or whatever is used to cull out things that aren't nearby.

test_character_body_perf_tri_collision.zip

jitspoe commented 3 months ago

So I did some more testing and spawned 30,000 cubes in the other test map. Perf was pretty decent. Then I changed the collision to a convex collision shape and perf tanked, so general convex shapes are a LOT slower than boxes (even if they're also just a box).

jitspoe commented 3 months ago

So here is some more detailed profiling I did with Tracy.

image

As I mentioned before, once you drop below a certain framerate, you start doing 8 physics updates per frame, which further tanks the framerate if the physics is the bottleneck.

Zooming in a bit, we see that the move_and_collide is taking around 30-40 microseconds:

image

A huge chunk of that is in CheckIfStuck:

image

About 1/3 of that is for culling the AABB, which gets called multiple times. One potential optimization would be to just cull once for the entire move with a little extra epsilon added to account for unstuck movement, etc.:

image

Not sure if the same optimization would apply for Jolt.
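The "cull once for the entire move" idea can be sketched outside the engine. This is a minimal Python illustration, not Godot Physics code: boxes are plain (min, max) tuples, `cull` is a hypothetical brute-force stand-in for the broadphase query, and the `margin` value is an assumed epsilon for unstuck movement.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def swept_box(box: Box, motion: Vec3, margin: float) -> Box:
    """Box enclosing `box` at its start and end positions, grown by `margin`."""
    lo, hi = box
    new_lo = tuple(min(l, l + m) - margin for l, m in zip(lo, motion))
    new_hi = tuple(max(h, h + m) + margin for h, m in zip(hi, motion))
    return new_lo, new_hi


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes overlap on all three axes."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def cull(query: Box, shapes: List[Box]) -> List[int]:
    """Hypothetical broadphase stand-in: indices of shapes overlapping `query`."""
    return [i for i, s in enumerate(shapes) if overlaps(query, s)]


body = ((0.0, 0.0, 0.0), (1.0, 2.0, 1.0))
motion = (3.0, 0.0, 0.0)
shapes = [
    ((2.5, 0.0, 0.0), (3.5, 1.0, 1.0)),      # in the path of the move
    ((50.0, 0.0, 0.0), (51.0, 1.0, 1.0)),    # far away
    ((-0.05, 0.0, 0.0), (-0.01, 1.0, 1.0)),  # just behind the body, inside the margin
]

# One query covers the stuck check, the cast, and the recovery step.
candidates = cull(swept_box(body, motion, margin=0.1), shapes)
```

The trade-off is that the single enlarged box returns a few more candidates than a tight per-step query would, in exchange for running the cull once instead of several times per move.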

After the stuck check, there's the actual attempt motion, which has a lot of solve_distance calls and then the recovered check. Not sure how much that can be optimized:

image

belzecue commented 3 months ago

@jitspoe Can I ask why the project is running 160 physics tick?

huwpascoe commented 3 months ago

I haven't yet tried to center the static bodies within the shape, so everything is at 0, 0, 0,

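# Recenter the hull points on the center of their bounding box.
var a := AABB(points[0], Vector3.ZERO)
for p in points:
  a = a.expand(p)  # expand() returns a new AABB instead of modifying in place
var center := a.get_center()
for i in points.size():
  points[i] -= center

AThousandShips commented 3 months ago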

Then I changed the collision to a convex collision shape and perf tanked, so general convex shapes are a LOT slower than boxes (even if they're also just a box).

This is to be expected. Aside from areas for optimization that might have been missed, a primitive shape will always be more performant: it allows a lot of simplifications in the equations, and the convex shape doesn't know it's a cube.

jitspoe commented 3 months ago

@jitspoe Can I ask why the project is running 160 physics tick?

I want my games to be fast and responsive on high refresh rate monitors, so I want to guarantee a physics update every frame for 144-150hz monitors. Sadly, I might have to make some sacrifices if it's not possible to improve the physics performance. Perhaps there's a way to do higher update rates for the player only? Some games like sim racing games run the physics at like 1000hz, so I don't think 160 is unreasonable.

I haven't yet tried to center the static bodies within the shape, so everything is at 0, 0, 0,

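# Recenter the hull points on the center of their bounding box.
var a := AABB(points[0], Vector3.ZERO)
for p in points:
  a = a.expand(p)  # expand() returns a new AABB instead of modifying in place
var center := a.get_center()
for i in points.size():
  points[i] -= center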

Thanks! I might give this a try later, but I already experimented having box shapes centered vs. 0,0,0, and that didn't seem to make a difference.

Then I changed the collision to a convex collision shape and perf tanked, so general convex shapes are a LOT slower than boxes (even if they're also just a box).

This is to be expected. Aside from areas for optimization that might have been missed, a primitive shape will always be more performant: it allows a lot of simplifications in the equations, and the convex shape doesn't know it's a cube.

The actual perf difference isn't as bad as I thought. The overall physics update goes from ~5ms to ~7ms, but with the domino effect of multiple updates per frame, that causes things to go from ~500fps to 20fps in practice.

With the boxes, the cull aabb is about the same (as I would hope) but the solve_static is faster: image

Zireael07 commented 3 months ago

@jitspoe Those racing games that run physics at 1000Hz have totally customized physics

(Also those ridiculous Hz do not protect from occasional hilarious physics bugs)

Calinou commented 3 months ago

For imported levels with trimesh collision, https://github.com/godotengine/godot/pull/82649 should help improve performance. Convex collision performance should be improved by https://github.com/godotengine/godot/pull/63702.

I want my games to be fast and responsive on high refresh rate monitors, so I want to guarantee a physics update every frame for 144-150hz monitors. Sadly, I might have to make some sacrifices if it's not possible to improve the physics performance. Perhaps there's a way to do higher update rates for the player only? Some games like sim racing games run the physics at like 1000hz, so I don't think 160 is unreasonable.

If you have physics interpolation (which I strongly recommend to handle framerate variations in general), players can be using any refresh rate and the game will look smooth.

While there's a definitive advantage from bumping the default physics tick rate from 60 Hz to 120 Hz, there isn't much benefit to going above 120 Hz physics in terms of input lag. Going from 120 Hz to 160 Hz only reduces the physics step time by 2.1 ms, while going from 60 Hz to 120 Hz reduces it by 8.3 ms. This is particularly the case if your game's movement is floaty (e.g. slow acceleration/friction), in which case noticing input lag from the physics step is harder.
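The step-time figures quoted above follow directly from the tick rates. A quick check in Python (the helper name `step_ms` is just for illustration):

```python
def step_ms(tick_hz: float) -> float:
    """Duration of one physics step in milliseconds."""
    return 1000.0 / tick_hz


for old_hz, new_hz in [(60, 120), (120, 160)]:
    saved = step_ms(old_hz) - step_ms(new_hz)
    print(f"{old_hz} Hz -> {new_hz} Hz: step shrinks by {saved:.1f} ms")
```

Each additional increase in tick rate buys less, because the step time is the reciprocal of the rate.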

but with the domino effect of multiple updates per frame, that causes things to go from ~500fps to 20fps in practice.

If you want to reduce the "spiral of death" effect of multiple physics steps per frame, reduce Max Physics Steps per Frame in the Project Settings to a lower value. This will cause slowdown when the game can't keep up though.
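The "spiral of death" can be illustrated with a simplified steady-state model of a fixed-timestep loop. This is a sketch, not Godot's main loop: the ~5 ms and ~7 ms step costs, the 160 Hz tick rate, and the cap of 8 steps come from this thread, while `render_ms` (everything in a frame that is not physics) is an assumed value chosen for illustration.

```python
def steady_state(step_cost_ms, tick_hz, max_steps, render_ms=0.4):
    """Return (fps, time_scale) for a simplified fixed-timestep loop.

    render_ms is an assumed per-frame cost for everything except physics.
    time_scale is simulated time / real time (1.0 means no slow motion).
    """
    load = step_cost_ms * tick_hz / 1000.0  # share of real time physics needs
    if load < 1.0:
        # Physics keeps up. Average steps per frame is frame_ms * tick_hz / 1000,
        # so frame_ms = render_ms + frame_ms * load.
        frame_ms = render_ms / (1.0 - load)
        if frame_ms * tick_hz / 1000.0 <= max_steps:
            return 1000.0 / frame_ms, 1.0
    # Physics cannot keep up: every frame runs max_steps and time falls behind.
    frame_ms = render_ms + max_steps * step_cost_ms
    steps_needed = frame_ms * tick_hz / 1000.0
    return 1000.0 / frame_ms, max_steps / steps_needed


fps_boxes, scale_boxes = steady_state(5.0, 160, max_steps=8)
fps_convex, scale_convex = steady_state(7.0, 160, max_steps=8)
fps_capped, scale_capped = steady_state(7.0, 160, max_steps=3)
```

Under these assumptions the 5 ms case settles near 500 FPS, while the 7 ms case exceeds the 6.25 ms step budget and falls to roughly 18 FPS. Lowering the cap to 3 raises the framerate, but `time_scale` stays below 1, which is the slowdown mentioned above.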

jitspoe commented 3 months ago

The downside to interpolation is that you're putting players with a higher framerate at a delay. So if you have 60 hz physics running at 60fps, you'll have stuff respond immediately. If you're running at a higher framerate, you'll be setting the interpolation target that frame and interpolating over the following frames, so it'll actually be LESS responsive. That said, it doesn't seem like the interpolation option even exists in Godot 4.

And even if it did, I tried setting the physics update rate as low as 20 and I'm still getting sub-20 FPS in my actual project. Something is seriously wrong here. I thought it was just the death spiral of 8 physics updates per frame, but I dropped that limit down to 3, and I'm still getting terrible framerates. The physics process alone is taking over 16ms, which means it's impossible to hit 60fps even with 1 physics update per frame.

image

Each test_body_motion is taking 100-200 microseconds, x3 average for each body (to do ground/step checks and whatnot). If you have 100 enemies, that's 30-60 ms.

If we could somehow get this more in check, I'd be happy to run other things at a lower update rate if I could run the player at a higher tick rate. Is there a way to do that?
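The estimate above is simple multiplication, and the same arithmetic gives a rough ceiling on how many bodies fit in one physics step. A small sketch using the numbers from this comment (the function names are illustrative only):

```python
def physics_ms_per_tick(bodies: int, queries_per_body: int, query_us: float) -> float:
    """Total motion-query time per physics tick, in milliseconds."""
    return bodies * queries_per_body * query_us / 1000.0


def max_bodies(tick_hz: float, queries_per_body: int, query_us: float) -> int:
    """Bodies that fit in one step if motion queries were the only cost."""
    budget_us = 1_000_000.0 / tick_hz
    return int(budget_us / (queries_per_body * query_us))


low = physics_ms_per_tick(100, 3, 100)   # 100 enemies, 100 microseconds per query
high = physics_ms_per_tick(100, 3, 200)  # 100 enemies, 200 microseconds per query
```

At 60 Hz the step budget is about 16.7 ms, so with 3 queries per body this leaves room for roughly 27 to 55 bodies before anything else in the frame is counted.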

jitspoe commented 3 months ago

@mihe Any chance you could take a look at this on the Jolt side and see if there's any low hanging fruit to fix performance wise? Jolt performance is better, but still too low to update more than a few things every frame.

clayjohn commented 3 months ago

I think we need to re-center this discussion. There are a number of open questions that this report has raised. I think it would be worth trying to answer the questions separately.

  1. Is 50 CharacterBody3Ds too many? Is there a better approach to having many enemies with physics?
  2. What is the impact of high fixed FPS physics updates? i.e. should this be kept to a low number?
  3. Is there an inherent limitation to convex shapes that makes them so slow, or can they be optimized?
  4. Can the situation be improved by separating ConvexShapes into their own static bodies (not centered at 0,0,0)?

In addition to the open questions there are some clear insights from this:

  1. Physics is creating a bottleneck on the main thread. Perhaps there is room for multithreading
  2. Jolt is significantly faster than Godot physics, we have a lot of room to improve performance in Godot physics

jitspoe commented 3 months ago

I think we need to re-center this discussion. There are a number of open questions that this report has raised. I think it would be worth trying to answer the questions separately.

I can address several of these based on the research I've been doing.

1. Is 50 CharacterBody3Ds too many? Is there a better approach to having many enemies with physics?

I feel like if the engine can't handle 50+ active enemies, that's a huge detriment. That's something games have been doing since the 90's. Games like DOOM would bombard players with massive hordes of enemies. It should be able to handle hundreds if not thousands. Also, I haven't even started with ragdolls, which will add a significant number of additional capsules per enemy.

2. What is the impact of high fixed FPS physics updates? i.e. should this be kept to a low number?

This could be a different discussion, especially to open the possibility of multiple physics updates at different rates. That said, in my actual project, the physics is taking more than 16

3. Is there an inherent limitation to convex shapes that makes them so slow, or can they be optimized?

The convex shapes aren't THAT much slower than primitives (Initially, it looked like an extreme difference because of the spiral of death causing 8x physics updates per frame, but in reality there's maybe a 20-30% improvement to the solving calls which is probably < 10% total difference overall).

4. Can the situation be improved by separating ConvexShapes into their own static bodies (not centered at 0,0,0)?

No. I've tested several different combinations of centering convex shapes, centering static bodies and having no offset of the convex shape, etc. I haven't found anything that provides a notable improvement.

In addition to the open questions there are some clear insights from this:

1. Physics is creating a bottleneck on the main thread. Perhaps there is room for multithreading

Possibly, but Quake did 3D physics single threaded back in the 90's and achieved great performance. Multithreading gets very difficult to debug, so I'd prefer to avoid it where possible. 😅

Also, if the physics update takes more than 16ms, doing it in its own thread is still insufficient, and trying to handle multiple bodies all moving in their own threads that could interact with each other is ... 😬

2. Jolt is significantly faster than Godot physics, we have a lot of room to improve performance in Godot physics

Sadly, even Jolt is not fast enough for my use case. I thought maybe there was some fundamental issue with the way the Godot physics server was set up or something that impacted both Godot and Jolt physics, but I haven't found any evidence of this so far, so I'm unsure what the next steps are.

I thought I had a course of action here based on what I found with the MRP, but my actual project has much worse performance for some reason.

Even if I drop the physics update rate to 60hz and limit to 3 physics updates per frame and even if I replace all the convex shapes with box shapes, I'm still getting sub 20 FPS with 100 enemies.

Zireael07 commented 3 months ago

Doom or Quake had very simple physics, limited to very simple shapes.

But yeah a modern game engine should be able to handle 50 physics bodies with ease

huwpascoe commented 3 months ago

I tested both with and without USE_TRIANGLE_COLLISION, with respective tick rates and for fun, duplicated LevelSewer1 side by side to run twice the amount of level and characters.

USE_TRIANGLE_COLLISION       FALSE          TRUE
Godot - 60hz                 200-300 fps    235-390 fps
Godot - 160hz                10 fps         12 fps
2x Godot - 60hz              4 fps          4 fps
2x Godot - 160hz             6 fps          8 fps
Jolt - 60hz                  800~ fps       900~ fps
Jolt - 160hz                 400-600 fps    750~ fps
2x Jolt - 60hz               500~ fps       580~ fps
2x Jolt - 160hz              15 fps         300 => 20 fps
Editor/Import performance    very slow      normal

Remember that nodes aren't free: each one has to be parsed, reference counted, and mapped to a corresponding physics body, all for a static structure that's never touched at edit-time. It slows the editor down horrendously. Convex data is also intended to be reused, and since every mesh gets its own convex data with this import, it's entirely redundant. There's no advantage to using separate convex meshes over triangles for this case. None.

Interestingly, Jolt finally did fail on the 2x 160hz test with a gradual cascade.

Zireael07 commented 3 months ago

How does the "nodes aren't free" comment compare to e.g. Unity or Unreal that can run the equivalent ootb without performance problems?

mihe commented 3 months ago

Just to shed some light on why body_test_motion (and move_and_collide/move_and_slide as a result) is taking so long, you can essentially think of it as doing the following:

(Mapping these collision checks to collide_shape isn't entirely true for Godot Physics, since it uses solve_distance for the cast as opposed to solve_static, but whatever. You also have the AABB culling on top of this as mentioned above.)

Godot Jolt differs a bit here as well. First of all, I vary the number of collision checks for the cast based on the distance, mainly to improve precision, as opposed to using a fixed number of checks like Godot Physics, so for Godot Jolt that cast is more like 5-17 collide_shape calls. Second, due to a more or less unfixable regression, move_and_slide with Godot Jolt will often run the floor-snapping needlessly, which results in yet another call to body_test_motion, effectively doubling the number of collision checks listed above. So you can very easily end up doing an average of 20 collision checks per move_and_slide call, and a lot more than that if there are multiple slides happening, or if moving at greater velocities.

The thing that sticks out in that list is of course the cast part. Juan put up a PR to address this a while back (#70522) that replaces the (somewhat odd) binary search that Godot does (and forces upon every physics implementation) for its shape-casting with a more traditional sweep test. However, I know for a fact that this won't work with Jolt without removing the safe/unsafe fractions from PhysicsTestMotionResult3D, since you can't guarantee that the returned fraction will be safe nor unsafe, which is a fairly substantial breaking change in my opinion. From what I understand from other people having looked at this, this holds true for that Godot Physics PR as well.

My personal stance on move_and_slide is largely that it might be suitable for something like a main character, but should ideally not be used for anything else, unless you have plenty of performance headroom. I would try to reach for simpler physics queries and stuff like navigation meshes for something like an NPC.

Lastly, since I keep seeing early id Tech-derived stuff getting brought up everywhere, keep in mind that those character controllers relied entirely on AABB checks against the BSP tree, as opposed to more general/arbitrary collision checks. I would love to see an AABB check added to PhysicsDirectSpaceState3D, along with a proper sweep test, but I struggle to see move_and_slide being able to utilize any of it while preserving backwards-compatibility.

(I can't comment on the performance impact of structuring the level in the way that's shown in the MRP, but compound shapes aren't exactly free either, even if I do use the more optimized immutable StaticCompoundShape found in Jolt. I am curious about what exactly is taking time when running with Jolt though, so I might do some profiling later on.)
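To make the cost of the binary-search cast concrete, here is a minimal Python sketch of the general technique. It is not the Godot Physics or Godot Jolt implementation: `collides_at` is a hypothetical overlap test at a fraction `t` of the motion, the default of 8 steps is an assumption for illustration, and the search assumes that once the shape overlaps at some fraction it also overlaps at every larger one.

```python
from typing import Callable, Tuple


def binary_search_cast(
    collides_at: Callable[[float], bool], steps: int = 8
) -> Tuple[float, float, int]:
    """Return (safe, unsafe, checks) for a motion parameterized by t in [0, 1]."""
    checks = 1
    if not collides_at(1.0):
        return 1.0, 1.0, checks  # the full motion is free
    safe, unsafe = 0.0, 1.0
    for _ in range(steps):
        mid = (safe + unsafe) / 2.0
        checks += 1  # every bisection step is one more overlap test
        if collides_at(mid):
            unsafe = mid
        else:
            safe = mid
    return safe, unsafe, checks


# Toy obstacle: the body overlaps once more than 37% of the motion is applied.
safe, unsafe, checks = binary_search_cast(lambda t: t > 0.37)
```

Each bisection step halves the gap between the safe and unsafe fractions at the price of one full overlap test, which is why a single cast accounts for so many collision checks. A sweep test would return one hit fraction directly, which is where the difficulty with the separate safe/unsafe fractions comes from.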

Zireael07 commented 3 months ago

I would try to reach for simpler physics queries and stuff like navigation meshes for something like an NPC.

This is something I would love a tutorial on. (I have staircases in my projects. How do I handle the NPCs walking on them w/o move and slide, which I had to call twice because otherwise they clipped through?)

huwpascoe commented 3 months ago

How does the "nodes aren't free" comment compare to e.g. Unity or Unreal that can run the equivalent ootb without performance problems?

They aren't free either. Blueprints still need to build, Entities still have a presence in the managed heap.

UE does a lot of baking, without a high end PC it's just not practical.

jitspoe commented 3 months ago

I've created a new version of my BSP importer which allows reading the BSPX format which can store the original brush collision shapes (using -wrbrushes with ericw tools). Now much of the collision is using boxes and isn't split up as much:

Here's the project with the improved collision: test_character_body_perf_col_opt.zip

Also dropped the physics updates to 60hz and capped at 3 updates per frame. Even with all that, I'm still dropping to like 15fps with 100 moving characters (and we're not even dealing with animations and other things that also hit perf).

Seems the only solution right now is to stagger the physics and animation updates of enemies across multiple frames. In order to do this, it would be good to be able to have a physics frame update every process frame, so we can update the player more frequently and stagger the enemy movement more evenly across frames (and not have some frames where we're doing twice as much and others where we're doing nothing): https://github.com/godotengine/godot-proposals/issues/10015

I really think we should be able to get better perf on a quake level of detail world, though, even with a lot of extra splits...

Staggering updates and the improved importer have drastically improved my performance, though. Just need a better way to spread them more evenly across frames.
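The staggering approach can be sketched as a round-robin schedule. This is an illustrative Python model, not the code from the project: the group count of 4 is an assumed value, and scaling the delta by the group count is one possible way to keep movement speed unchanged.

```python
from typing import List


def staggered_schedule(num_enemies: int, num_groups: int, frame: int) -> List[int]:
    """Indices of enemies that run their movement on this physics frame."""
    return [i for i in range(num_enemies) if i % num_groups == frame % num_groups]


def staggered_delta(delta: float, num_groups: int) -> float:
    """Each enemy moves once every num_groups frames, so it covers that much time."""
    return delta * num_groups


# 100 enemies in 4 groups: how many run their movement on each of 8 frames.
per_frame = [len(staggered_schedule(100, 4, f)) for f in range(8)]

# Every enemy is updated exactly once per cycle of 4 frames.
updated_in_cycle = sorted(i for f in range(4) for i in staggered_schedule(100, 4, f))
```

Assigning groups by index keeps the per-frame load even as long as the physics frames themselves are evenly spaced, which is the part the linked proposal is about.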

Musicgun47 commented 3 months ago

I've just downloaded and tested the latest version of the MRP to try on my system. Here's the specs for reference: Godot v4.3.beta2 - Windows 10.0.22631 - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 2070 SUPER (NVIDIA; 31.0.15.5222) - AMD Ryzen 7 3800X 8-Core Processor (16 Threads)

Running the project as provided produces a stable framerate of ~15-18fps as reported. However, swapping the flag to use move and slide instead resulted in a consistent fps of 120 and above. Even increasing physics ticks to 120Hz still resulted in a stable 30fps.

This indicates that the issue you're experiencing is less to do with the Physics Server and more to do with your own movement code. Looking through your step_move function, it's easy to see why your performance is so bad. There is a guaranteed 2 calls to move_and_collide() per physics step per CharacterBody3D, and up to a maximum of 15 in the worst case scenario. Add to this that none of the CharacterBody3D's are using any sort of navigation or collision avoidance, resulting in the bodies constantly colliding with each other and, on occasion, trying to walk into corners.

While I do agree that there are optimisations that could be made to Godot Physics, which Jolt has done a good job of, this issue seems to be a classic case of confirmation bias. Godot Physics has a reputation for being slow/bad, whether deservedly or not, and this project seems to have gone out of its way to try and prove that to be the case.

huwpascoe commented 3 months ago

There is a guaranteed 2 calls to move_and_collide() per physics step per CharacterBody3D, and up to a maximum of 15 in the worst case scenario.

Internally, move_and_slide also calls move_and_collide multiple times. It's not so different.

JoanPotatoes2021 commented 3 months ago

Lawnjelly showed a nice navigation method that uses only navigation maps with vector collisions in Godot 3.6, in the last 2 videos on his YouTube channel (also called Lawnjelly). Not trying to derail the subject of this issue, but I believe this could be the answer to the OP's performance problem with many characters navigating and colliding at once. Of course, I don't know whether this will be implemented in the engine at some point. In his videos he mentioned wanting to add it to the engine, as he has already finished it in C++ for 3.6. It looked like a nice solution for a production project that needs lots of characters.

If you don't know what I mean, it is a way of moving characters using only the navigation map, colliding against it without physics. This addresses many performance issues because the collisions are only vector collisions against the navigation map data. In theory it would also allow us to use the navigation system for avoidance and pathing for characters. Maybe there is already a Godot proposal somewhere for navigation map collisions without physics.

Musicgun47 commented 3 months ago

Internally, move_and_slide also calls move_and_collide multiple times. It's not so different.

It's ~100+FPS different. move_and_slide() calls move_and_collide() exactly max_slides or less and handles collisions better to minimise the number of calls. The issue being discussed here is performance and clearly the step_move() function implemented in the MRP is bad. The function is essentially an attempt to replicate the move_and_slide() functionality and obviously falls well short of achieving the same performance. Coupled with what I mentioned in my previous comment, the MRP was designed to tank FPS.

clayjohn commented 3 months ago

Testing the MRP on an M2 Macbook Pro with Godot 4.2.2 I can reproduce the findings of Musicgun47 semi-consistently. Running the MRP as-is starts off very fast 200+ FPS, but the FPS drops after running for 30 seconds or so down to a consistent 18-20 FPS. Looking in the profiler, the entire cost is in the step_move() function.

Trying again with USE_MOVE_AND_SLIDE set to true, the framerate consistently stays in the 200s. The main cost for the frame is the call to move_and_slide(), which consistently takes approximately 3ms per frame. In one run, however, I was able to reproduce an FPS drop under this condition. It happened after running the MRP for about 2 minutes, and it was a sudden drop down to 60-80 FPS. I suspect it had to do with the positions of the CharacterBody3Ds; perhaps they had all grouped up in a corner together or found some other pathological position.

jitspoe commented 2 months ago

I think you guys are missing the point here -- yes, move_and_slide() is simpler and faster, but insufficient for complete character movement. It doesn't handle steps and other things. The physics engine should be able to handle multiple move_and_collide() calls per character. Also, the 100fps difference is due to hitting the threshold where multiple physics updates are hitting every frame which causes values to look more extreme.

Simply using navmesh with no world collision isn't viable, either, as I have enemies that can fly, swim, leap, etc. They need to collide with the world. I also need to do things like ledge checks which I haven't even gotten to, yet. I'm simply trying to recreate enemy behavior that functioned on single-core Pentiums in the 90's. I don't think I should need alternative solutions here. 😅

Actually, I just ran some tests of move_and_collide vs. my custom movement, and they're pretty similar. I'm curious why you're getting wildly different results, as the move_and_slide literally makes 3 calls to move_and_collide most of the time, which is what my code typically does:

image

Total time of 1 physics frame using move_and_slide is about 16ms: image

Custom code has similar perf: image

Are you using this to test with? https://github.com/user-attachments/files/16038712/test_character_body_perf_col_opt.zip

huwpascoe commented 2 months ago

Simply using navmesh with no world collision isn't viable, either, as I have enemies that can fly, swim, leap, etc. They need to collide with the world. I also need to do things like ledge checks which I haven't even gotten to, yet. I'm simply trying to recreate enemy behavior that functioned on single-core Pentiums in the 90's.

It doesn't need to be either extreme. If NPCs have a navmesh, that's space guaranteed to be clear, right? If an NPC knows it's on the navmesh without a doubt, and another agent isn't in the way, it can position += vel * delta. Then reaching the edge of the navmesh, meaning either a wall or a ledge is ahead, then it's time to query move_and_collide() to perform a jump, or fall off etc.

Those 90s games NPCs usually had no idea of physics, other than a single ray to query the floor elevation...
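A minimal sketch of that hybrid idea, assuming the level has a baked navigation mesh on the default navigation map. NAVMESH_TOLERANCE is a hypothetical value, and the cheap path deliberately ignores other agents and moving platforms, so treat this as an illustration rather than a drop-in controller:

```gdscript
extends CharacterBody3D

# Hypothetical tolerance: how far (horizontally) the target may be from the
# navmesh before we stop trusting it.
const NAVMESH_TOLERANCE := 0.05

func _physics_process(delta: float) -> void:
    var motion := velocity * delta
    var target := global_position + motion
    var nav_map := get_world_3d().navigation_map
    var closest := NavigationServer3D.map_get_closest_point(nav_map, target)

    # Compare only the horizontal distance, since the body's origin usually
    # sits above the navmesh surface.
    var offset := closest - target
    offset.y = 0.0

    if offset.length() <= NAVMESH_TOLERANCE:
        # Cheap path: the target is still over the navmesh, so skip the
        # physics query entirely.
        global_position = target
    else:
        # Near a navmesh edge (wall or ledge ahead): fall back to a real
        # collision test.
        move_and_collide(motion)
```

On multi-level maps the closest navmesh point may be on a different floor, so a real implementation would likely need a vertical check as well.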

JoanPotatoes2021 commented 2 months ago

Interpolation would solve this when it gets implemented in 4.x (#92391). The performance cost of a higher physics framerate isn't worth it in my opinion, unless you're making a racing game that needs a very high physics framerate due to the faster movement. I don't see the point of having it for an FPS or similar projects.

Simply using navmesh with no world collision isn't viable, either, as I have enemies that can fly, swim, leap, etc. They need to collide with the world.

Maybe you could use a hybrid system if we had navmesh-only collisions? What if you enabled the advanced collision system only when your enemies need to fly, swim or leap, so they can interact with the world? That would work as an optimization, since plain movement is more common than those special behaviors. I don't know, I could find more reasons to use a navmesh only. Enemies could get stuck by moving to places the navmesh didn't previously allow, but I still think the performance gain from using navmesh-only collisions is worth working around that.

I'm simply trying to recreate enemy behavior that functioned on single-core Pentiums in the 90's.

Doing that in a modern game engine is completely different from how games were built back then. I wish we could have the performance of those older games with modern game engines. However, I agree that at some point Godot will need better solutions to support bigger projects, so in that regard I support the OP in asking for more performance with CharacterBody3D.

Musicgun47 commented 2 months ago

I'm simply trying to recreate enemy behavior that functioned on single-core Pentiums in the 90's.

I feel like this shouldn't need to be said, but it seems necessary so I will. Doom and Quake did NOT simulate 100+ enemies moving around in a physically accurate 3D world at 120 or even 60 TPS on Pentium Processors. Doom is not even a 3D game and to say that either of these games "simulated physics" is probably giving them too much credit. Modern 3D engines wouldn't even run on that hardware let alone achieve even 1 or 2 FPS. I think you need to reassess what your benchmarks are.

It should also be stated that 16ms is well within the acceptable ranges for real-time applications. For reference, the minimum time for human visual processing is 13ms (i.e. the minimum time an image must be displayed to be registered), typical reflex speed is 150-300ms, and keystroke registration is generally in the 50ms range, but can be as low as 20ms depending on the hardware constraints. Given this, to try and push the physics engine beyond 16ms physics step is stepping beyond the bounds of a general purpose engine like Godot and into the realms of needing a custom physics system. While there may be room to improve on the ~0.2ms execution time of move_and_slide(), there's also plenty of ways to optimise and improve on your side, such as what's been mentioned above (i.e. proper navigation and collision avoidance). What you've created here is essentially a stress test of how many Character Bodies can be moving and colliding constantly before the physics engine starts to struggle.

I should also note that most modern AAA titles don't operate physics as fast as you're wanting (probably with the exception of esports titles).

jitspoe commented 2 months ago

That 16ms is for physics alone. If you have multiple physics steps in one frame, that drops the fps below 30. Throw in rendering and other stuff, and then you're down to like 15fps, and that's on a higher end gaming machine.

Regarding the navmesh, that would probably only help about 10% of the cases. Only a few enemies just run around on the ground. That also doesn't account for colliding with other characters, moving platforms, and other dynamic things.

Regarding "Quake can't do that", this was literally a map I made originally for Quake with the intention of bringing over to my Godot project, and it gets well over 2000 FPS on a modern Quake port with 100 enemies all spawned in. Granted, the physics tick rate isn't as high (I believe it's around 20-30hz), but still, it should be possible to get much higher performance with that same level of geometry than we currently get in Godot.

clayjohn commented 2 months ago

Regarding "Quake can't do that", this was literally a map I made originally for Quake with the intention of bringing over to my Godot project, and it gets well over 2000 FPS on a modern Quake port with 100 enemies all spawned in. Granted, the physics tick rate isn't as high (I believe it's around 20-30hz), but still, it should be possible to get much higher performance with that same level of geometry than we currently get in Godot.

I think the point people are trying to make is that Quake is getting the behaviour you want without doing the equivalent of dozens of move_and_collide() calls. They are not saying that Quake can't handle 100s of NPCs, they are saying that Quake handles 100s of NPCs by reducing the cost per NPC. Quake can't maintain its good performance and also have a physics loop as complex as the one in your step_move() function.

From the sounds of it, whatever physics updates Quake does, they are likely simpler than even using Godot's move_and_slide() which is itself a very high-level and expensive function.

On top of that, Quake's physics is way simpler than Godot can ever be. Quake's physics only has to handle Quake-style levels while Godot's has to be capable of handling anything. So even in an apples-to-apples comparison, Godot would be slower.

Ultimately, this isn't an apples-to-apples comparison because your own code for the physics update is hundreds of times more complex than the one used by Quake, and then on top of that, you want to achieve 120hz while Quake is doing 20-30hz.

In summary, this issue is complicated by a few factors:

  1. step_move() vs move_and_slide(). step_move() takes approximately 20x longer on average than move_and_slide() which indicates the root of the problem may be in the user script rather than the physics engine. We need to actually make a clear determination here as it is unclear where the problem lies.
  2. The performance with step_move() starts off good, then drops off. This indicates that a pathological case is being hit. This could be the fault of the user script (perhaps enemies are grouping up or getting caught in corners and it causes the script to always take the worst-case path), or the fault of Godot.

Right now, there isn't much actionable on our side until we know for sure whether the problem lies with us or with your project/script. The project is complex enough that it isn't obvious and the fact that move_and_slide() is more than fast enough strongly indicates the problem is in step_move(), so it is tough to motivate a contributor to do a deeper investigation than what has already been done.

clayjohn commented 2 months ago

Just some quick data to help. I moved all move_and_collide calls into a custom function "custom_collide" so that the script profiler can track it:

func custom_collide(body, motion, test_only = false) -> KinematicCollision3D:
    return body.move_and_collide(motion, test_only)

I then ran the profiler and kept track of total frame time and time spent in custom_collide:

[Screenshot: script profiler graph of total frame time and time spent in custom_collide]

As you can see, they are tracking very close. So it is clear that custom_collide is responsible for all the cost.

Next I looked at the "average time" taken by custom_collide and it is always 0.03ms. But the number of calls correlates with the spikes in frame time.

For example, at the lowest point custom_collide is called 160 times. Those small, regular spikes are places where the number of calls jumps to 300; this happens very regularly. At the end of the graph, the number of calls to custom_collide exceeded 1700.

So we have a couple takeaways so far:

  1. step_move() has a pathological case that calls move_and_collide() way more than you expect (consistently reaching over 10x normal calls)
  2. Frame time increases proportionally with the increase in number of move_and_collide() calls. So if you call move_and_collide() 10x as much, the physics update will cost 10x
  3. Edit: Also, testing with the physics tick rate lowered to 30, you never hit the pathological case. Average performance drops from 500 FPS to 100 FPS, but then stays stable. So the number of calls to move_and_collide() isn't unbounded and we can attribute part of the performance spike to overrunning the amount of time allotted for a physics tick.

So, @jitspoe I think the action item for you is to investigate what is causing the calls to move_and_collide() to suddenly jump up. This is clearly an issue in your script that needs to be dealt with.

The action item for the physics team would be to just reduce the overall cost of move_and_collide(), because any decrease in cost will be a direct benefit for situations like this.

Calinou commented 2 months ago

I would probably try capping the number of move_and_collide() calls per physics tick per entity in the script, especially for AI where occasional failures in stair stepping are hard to notice.
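A minimal sketch of such a cap, assuming each entity runs its own movement script. MAX_COLLIDE_CALLS_PER_TICK, budgeted_collide() and the simplified step_move() body are hypothetical names and stand-ins, not the MRP's actual code:

```gdscript
extends CharacterBody3D

# Hypothetical per-entity budget of collision queries per physics tick.
const MAX_COLLIDE_CALLS_PER_TICK := 4

var _collide_calls_left := 0

func _physics_process(delta: float) -> void:
    # Refill the budget at the start of every tick.
    _collide_calls_left = MAX_COLLIDE_CALLS_PER_TICK
    step_move(velocity * delta)

# Wrapper that refuses to query the physics engine once the budget is spent.
# Note that an exhausted budget returns null, the same value as "no collision",
# and in that case the body is not moved at all.
func budgeted_collide(motion: Vector3, test_only := false) -> KinematicCollision3D:
    if _collide_calls_left <= 0:
        return null
    _collide_calls_left -= 1
    return move_and_collide(motion, test_only)

# Simplified stand-in for the MRP's movement routine: one move plus one slide.
func step_move(motion: Vector3) -> void:
    var collision := budgeted_collide(motion)
    if collision:
        var remainder := collision.get_remainder().slide(collision.get_normal())
        budgeted_collide(remainder)
```

With a cap like this, a stuck entity simply gives up for the tick instead of retrying, which is the "occasional failure" trade-off described above.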

Regarding "Quake can't do that", this was literally a map I made originally for Quake with the intention of bringing over to my Godot project, and it gets well over 2000 FPS on a modern Quake port with 100 enemies all spawned in. Granted, the physics tick rate isn't as high (I believe it's around 20-30hz), but still, it should be possible to get much higher performance with that same level of geometry than we currently get in Godot.

Quake's physics tick rate is 72 Hz, but enemy AI (including its movement) is only stepped at 10 Hz. This is most noticeable if you disable entity interpolation in your source port or play vanilla.

Doom's physics tick rate is 35 Hz, but enemy AI (including its movement) is only stepped at 8.75 Hz (once every 4 ticks). This can also be noticed by disabling entity interpolation or playing vanilla.
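For illustration, here is a rough sketch of stepping enemy movement at a fraction of the physics tick rate in Godot. AI_TICK_INTERVAL and the random offset are assumptions made for the example; without some form of visual interpolation the movement will look choppy, as it does in the vanilla games:

```gdscript
extends CharacterBody3D

# Hypothetical: run the expensive movement once every 4 physics ticks,
# mirroring Doom's "once every 4 ticks" AI stepping.
const AI_TICK_INTERVAL := 4

var _tick_offset := 0

func _ready() -> void:
    # Spread entities across ticks so they don't all update on the same one.
    _tick_offset = randi() % AI_TICK_INTERVAL

func _physics_process(delta: float) -> void:
    if (Engine.get_physics_frames() + _tick_offset) % AI_TICK_INTERVAL != 0:
        return
    # Cover the skipped ticks by moving with a proportionally larger step.
    move_and_collide(velocity * delta * AI_TICK_INTERVAL)
```

This divides the number of collision queries per entity by roughly AI_TICK_INTERVAL, at the cost of coarser movement steps.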

huwpascoe commented 2 months ago

In step_move(), there's only one obvious problem:

# Test for ground - short cast downward without actually moving
collision_info = body.move_and_collide(up * -0.1, true)
if (collision_info):
    var num_collisions := collision_info.get_collision_count()
    for collision_index in num_collisions:
        if (collision_info.get_normal(collision_index).dot(up) >= WALKABLE_NORMAL):
            body.on_ground = true

It doesn't actually do anything, the body.on_ground flag will be set further ahead as part of the loop.

Deleting it gives noticeable performance gain.

mrjustaguy commented 2 months ago

I'll weigh in on this, as I've had moments when I was like what the hell is wrong with Godot's Physics Performance in a 2D game of all things (hint, it's user error/carelessness), and seems quite relevant here..

With physics, it is very easy for the user to end up messing up big time, either through not really making good collision geometry (for me, I had hundreds of asteroids with Godot's sprite collision polygon conversion at default, resulting in hundreds of points and a crap ton of convex shapes created per asteroid, and then having them all collide a bunch, when in reality the asteroids only needed a dozen or so points and had like 3 or so convex shapes, changing 1 fps to hundreds) or by doing logic that asks for a physics object to do multiple full-on collision checks (things like calling move_and_slide, move_and_collide) per frame when really you could have gotten the same behavior with a better-planned approach and a single call.

Now obviously, code on the first pass is bound to be highly inefficient and filled with needless redundancies as you're just getting things to do what you want, but when you do so, go through the code, figure out exactly what it's doing, analyze it a bit, and try to find and remove redundancies and things that don't really affect the resulting output all that much, and even try figuring out if you could frame the whole thing in a different way that avoids the hotspots you've got, and see if it works better.

Going through code and iterating on it a few times like that is generally good optimization practice IMO

Musicgun47 commented 2 months ago

Regarding the navmesh, that would probably only help about 10% of the cases. Only a few enemies just run around on the ground. That also doesn't account for colliding with other characters, moving platforms, and other dynamic things.

You seem to have a wrong understanding of what navigation entails as this statement is 100% wrong. There's an entire section of the Godot Docs that covers this including collision avoidance, moving platforms and different actor types (e.g. flying, swimming etc.). Simply put, navigation is not an option for this type of game; it is a requirement. It basically tells your characters where they can and can't move to and how to get there. It will help you avoid situations like this:

image

There is not a single game with dynamic enemies that does not have some form of navigation implemented, from Pac-Man and The Legend of Zelda to Horizon: Forbidden West and Elden Ring. Granted the earlier games had vastly more simplified navigation systems but they still had some navigation. If you want your characters to be able to reliably walk through doors, not get stuck trying to move through a wall, or not drown themselves in a puddle of water (looking at you Minecraft dolphins) you need some form of navigation. It can be as simple or complex as you need it to be but without it you won't even be able to get your enemies to track the player properly let alone anything more substantial than that.

Saul2022 commented 2 months ago

Testing on a Qualcomm Adreno 740 (Gen 2, S23+) gives me the following results.

Default MRP without modifying anything:

Physics ticks = 160 gives 12 to 17 FPS, sometimes even dropping to 10 FPS.

Physics ticks = 60 gives a stable 60 FPS (vsync is already on, I think).

With the boxes scene, both 160 and 60 ticks gave me a stable 60 FPS.

So the issue, at least on mobile, is the collision shapes, and I still don't get why it capped to 60 FPS when the editor showed over 120 in the Quake scene.

jitspoe commented 2 months ago

For example, at the lowest point custom_collide is called 160 times. Those small, regular spikes are places where the number of calls jumps to 300; this happens very regularly. At the end of the graph, the number of calls to custom_collide exceeded 1700.

So, @jitspoe I think the action item for you is to investigate what is causing the calls to move_and_collide() to suddenly jump up. This is clearly an issue in your script that needs to be dealt with.

@clayjohn Are you sure it's not just doubling because 2 physics updates are happening in the same frame? I'm not able to reproduce this as the framerate immediately tanks for me, but compare the number of "step_move" calls per frame to the "move_and_collide" calls, or set the max physics updates per frame to 1 and see if the spikes still happen.
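For anyone repeating that test, one way to cap physics updates per rendered frame is from a script, as in this minimal sketch (the same setting is also available in the project settings). Attaching it to a node in the main scene is an assumption:

```gdscript
extends Node

func _ready() -> void:
    # Never run more than one physics step per rendered frame, so a slow
    # frame cannot trigger extra catch-up steps. If the frame rate falls
    # below the physics tick rate, the simulation will run in slow motion.
    Engine.max_physics_steps_per_frame = 1
```

With the cap at 1, any remaining spikes in move_and_collide() calls per frame cannot be explained by doubled-up physics steps.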

There are sometimes cases where the physics get stuck, but there's a MOVEMENT_RETRIES cap of 3, so it shouldn't get completely out of hand.

In step_move(), there's only one obvious problem:

# Test for ground - short cast downward without actually moving
collision_info = body.move_and_collide(up * -0.1, true)
if (collision_info):
    var num_collisions := collision_info.get_collision_count()
    for collision_index in num_collisions:
        if (collision_info.get_normal(collision_index).dot(up) >= WALKABLE_NORMAL):
            body.on_ground = true

It doesn't actually do anything, the body.on_ground flag will be set further ahead as part of the loop.

Deleting it gives noticeable performance gain.

The code further on isn't guaranteed to set on_ground, as sometimes the character controller moves along the ground without touching it or there are small gaps/bumps/etc. that cause it to fail, which can lead to weird movement because of constantly alternating between ground and air control. That check could, of course, be moved until later and only run if on_ground is false.
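Roughly, the deferred check could look like the sketch below. ensure_ground_state() is a hypothetical helper name, `body` is assumed to expose the on_ground flag from the MRP, and the WALKABLE_NORMAL value is assumed for the example:

```gdscript
extends Node

# Assumed threshold; the MRP defines its own WALKABLE_NORMAL.
const WALKABLE_NORMAL := 0.7

# Hypothetical helper, meant to run after the main movement loop.
# `body` is left untyped because on_ground is a custom script property.
func ensure_ground_state(body, up: Vector3) -> void:
    if body.on_ground:
        return  # The movement loop already found walkable ground.
    # Short downward test cast that does not actually move the body.
    var collision_info: KinematicCollision3D = body.move_and_collide(up * -0.1, true)
    if collision_info:
        for collision_index in collision_info.get_collision_count():
            if collision_info.get_normal(collision_index).dot(up) >= WALKABLE_NORMAL:
                body.on_ground = true
                return
```

This way the extra move_and_collide() call is only paid on ticks where the loop failed to detect the ground.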

@Musicgun47 Regarding the navigation stuff, I was referring to it being used as a complete replacement for physics, and that really only being viable for around 10% of the enemies. Even then, those enemies can be kicked into the air, pinned to walls, and knocked about or moved in other ways, so the physics is pretty important if I don't want them just phasing through various things.

Musicgun47 commented 2 months ago

@Musicgun47 Regarding the navigation stuff, I was referring to it being used as a complete replacement for physics, and that really only being viable for around 10% of the enemies. Even then, those enemies can be kicked into the air, pinned to walls, and knocked about or moved in other ways, so the physics is pretty important if I don't want them just phasing through various things.

I never suggested replacing physics with a navmesh as I agree that that would be completely inadequate for your game. However, I will reiterate that you need a navigation system because as you can see from the screenshot I posted, you already have enemies phasing through the walls and many others trying to join them. I'm not suggesting navigation to fix your step_move() function, as I think taking on board the advice others have already posted should give you enough to work on in that regard; I'm saying having a navigation system in place would eliminate possibly 90% of the collisions I'm observing when running your project (especially the ones from characters getting where they're not supposed to). And that's not to mention the fact that you'll need it anyway if you ever want to have AI that actually functions.

Anyway, I think I'm going to retire from this discussion as I don't think it's heading in any meaningful direction now and I've given the advice I can. It's up to you if you decide to take any of it.

clayjohn commented 2 months ago

@clayjohn Are you sure it's not just doubling because 2 physics updates are happening in the same frame? I'm not able to reproduce this as the framerate immediately tanks for me, but compare the number of "step_move" calls per frame to the "move_and_collide" calls, or set the max physics updates per frame to 1 and see if the spikes still happen.

@jitspoe I am fairly certain it is a combination of both. To confirm, I printed the number of times that move_and_collide() was called per physics_process and it regularly reached 10+ (I think the highest I saw was step_move() calling move_and_collide() 17 times). That's just for one update tick. You would multiply the number of calls by the number of update ticks that happen in a frame.

I think both issues are happening as those numbers are showing only a 3-5 times increase in the number of calls to move_and_collide() (the base number of calls seems to be 3). So there is likely an explosion of calls in the script which makes the FPS decrease enough that we start calling multiple ticks per frame which tanks the FPS even more.

When the FPS tanks is probably dependent on the speed of the computer. I was testing on an M2 Macbook which has a pretty good CPU. If your CPU is weaker, performance might tank right away because multiple updates are called per frame from the start.
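The per-tick counting described above can be reproduced with something like the following sketch. The counter, the threshold of 3 (the baseline number of calls mentioned above) and the one-line step_move() stand-in are assumptions for illustration:

```gdscript
extends CharacterBody3D

var _collide_calls := 0

# Wrapper so every move_and_collide() call is counted (and shows up in the
# script profiler under its own name).
func custom_collide(motion: Vector3, test_only := false) -> KinematicCollision3D:
    _collide_calls += 1
    return move_and_collide(motion, test_only)

func _physics_process(delta: float) -> void:
    _collide_calls = 0
    step_move(velocity * delta)
    # Report only ticks that exceed the assumed baseline of 3 calls.
    if _collide_calls > 3:
        print("%s: %d move_and_collide() calls in physics frame %d"
                % [name, _collide_calls, Engine.get_physics_frames()])

# Stand-in for the MRP's movement routine; it would call custom_collide()
# wherever it previously called move_and_collide().
func step_move(motion: Vector3) -> void:
    custom_collide(motion)
```

Because the counter is reset every tick, the printed numbers are independent of how many physics ticks run in a rendered frame.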

jitspoe commented 2 months ago

I've put together an updated test to more easily profile things.

  1. Set the max physics updates per frame.
  2. Moved the character movement into the process function instead of _physics_process, so the frame time isn't jumping all over the place when the physics and process frames don't line up. (Might be good to have a way to do this in the physics, see https://github.com/godotengine/godot-proposals/issues/10015)
  3. Disabled the step logic so the move_and_collide and move_and_slide comparisons are more 1:1.
  4. Added a shape cast movement mode to avoid calling either move_and_* function.
  5. Can swap between modes on the fly with space bar.
  6. Added a wrapper function to the actual physics calls so they show up on the profiler.
  7. Includes unoptimized and optimized world collision. (You have to manually swap the levels out in the main scene to test).
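As a rough idea of what a shape cast movement mode can look like, here is a minimal sketch built on PhysicsDirectSpaceState3D.cast_motion(). It is not the code from the test project: shape_cast_move() is a hypothetical name, the body is assumed to have a single child named CollisionShape3D, and there is no sliding or unstuck handling:

```gdscript
extends CharacterBody3D

# Assumes the body's shape lives in a child node named CollisionShape3D.
@onready var _collision_shape: CollisionShape3D = $CollisionShape3D

func _physics_process(delta: float) -> void:
    shape_cast_move(velocity * delta)

func shape_cast_move(motion: Vector3) -> void:
    var params := PhysicsShapeQueryParameters3D.new()
    params.shape = _collision_shape.shape
    params.transform = _collision_shape.global_transform
    params.motion = motion
    params.collision_mask = collision_mask
    params.exclude = [get_rid()]  # Don't collide with ourselves.

    # cast_motion() returns [safe_fraction, unsafe_fraction] of the motion.
    var fractions := get_world_3d().direct_space_state.cast_motion(params)
    if fractions.size() == 2:
        # Move only as far as the cast reports to be safe.
        global_position += motion * fractions[0]
```

If the shape already overlaps something, the safe fraction is 0 and the body does not move, which is one way the "getting stuck" problem can show up.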

New project: test_character_body_shape_cast.zip

Interesting points of note:

  1. Shape cast is faster, but not as significant of an improvement as I was expecting considering how much time is spent in the unstuck logic. Also, as one would expect, getting stuck is a problem. Once stuck, the performance can actually be worse as it's doing more casts.
  2. With the stepping logic disabled, my step_move using move_and_collide() is actually faster than move_and_slide(), so it seems there's room to improve the performance in move_and_slide() for sure.
  3. Difference between optimized and unoptimized level isn't as staggering when doing the movement every _process().