godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Absolutely terrible performance when using skeletons and skeletal animation #99194

Open Miurg opened 1 day ago

Miurg commented 1 day ago

Tested versions

Reproducible in: v4.3.stable.official [77dcf97d8]

System information

Godot v4.3.stable - Windows 10.0.19045 - Vulkan (Forward+) - dedicated AMD Radeon RX 7700 XT (Advanced Micro Devices, Inc.; 32.0.12019.1028) - AMD Ryzen 7 7700 8-Core Processor (16 Threads)

Issue description

I wanted to create an RTS game similar to BFME, but I ran into the problem that the engine performs terribly with many units that each have a skeleton and skeleton-based animation. I created the simplest possible model with the simplest possible animation, and with those alone I already get 100 fps with 1k units and below 10 fps with 10k units. Meanwhile, the GPU and CPU are barely utilized. I can't use RenderingServer because I don't see any way skeletal animation is implemented there; the documentation seems completely unfinished on this point.

Steps to reproduce

  1. Run scene.

Minimal reproduction project (MRP)

reproduction.zip

Zireael07 commented 1 day ago

I do not think ANY game engine out there can handle 10k animated skeletons. Godot has had a big improvement when it comes to skeletons and performance.

Use LODs. Switch off animations on mobs outside the view. Switch off animations on mobs outside a certain range (too far away for you to see even if they're technically in view).

mrjustaguy commented 1 day ago

Actually, only rendered skeletons absolutely tank performance. The cost is directly affected by the number of mesh surfaces that have a skeleton, and further by how many times they're rendered (so up to 5 times with a 4-split directional shadow plus the main view).

This isn't the first time this has popped up, and it did seem to be CPU-bottlenecked the last time I looked into it, as a single thread gets hammered like crazy while the others are asleep.

Switching off animations will not resolve this. You'd have to yank the skeleton out of mobs that are being rendered by shadows and/or the camera for performance to improve, and this is not good.

See https://github.com/godotengine/godot/issues/93568

fire commented 1 day ago

I might look at this. As far as I know, Unreal Engine struggles in the 100-500 "skeletons" range, and we're trying to go 100x that.

I have not played with their ECS system in a long time, so things are probably completely different now.

Capewearer commented 18 hours ago

> I do not think ANY game engine out there can handle 10k animated skeletons. Godot has had a big improvement when it comes to skeletons and performance.
>
> Use LODs. Switch off animations on mobs outside the view. Switch off animations on mobs outside a certain range (too far away for you to see even if they're technically in view).

Take a look at the Serious Sam series, Total War, and musou games. You'll be surprised.

Zireael07 commented 18 hours ago

Those are bespoke engines, not the general-purpose game engines I was referring to.

Capewearer commented 18 hours ago

Anyway, the techniques they use are fully implementable in Godot. E.g. modern Total War uses GPU skinning, and Serious Sam 4 uses impostor rendering for large crowds. The latter is definitely applicable to Godot.

fire commented 18 hours ago

We discussed this in animation meetings, but reviewing existing game technology reports like https://www.remedygames.com/article/how-northlight-makes-alan-wake-2-shine, we found that it would require a GPU-driven animation system on top of our existing GPU mesh skinning system.

As far as I know, we have yet to develop technologies for GPU-driven animations.

Impostors are approved for implementation in Godot Engine; feel free to submit work. https://github.com/zhangjt93/godot-imposter

VAT3 is also possible: https://github.com/G4ND44/Godot_VAT3

Zireael07 commented 18 hours ago

> Serious Sam 4 uses impostor rendering for large crowds

If you have impostors, they're no longer skeletons afaik, so this doesn't fit your claim that said engine can render 10k animated SKELETONS.

Capewearer commented 17 hours ago

> > Serious Sam 4 uses impostor rendering for large crowds
>
> If you have impostors, they're no longer skeletons afaik so this doesn't fit your claim that said engine can render 10k animated SKELETONS

Because it handles not 10k but 100k entities. Of course it's all smoke and mirrors, but the count is still an order of magnitude higher than 10k. That's why animated impostors are used for more distant enemies. Anyway, 10k skeletons is still a workable amount in such an engine. Otherwise the official 10x enemy multipliers would crash the game in the most intensive fights.

smix8 commented 14 hours ago

> Take a look for Serious Sam series, Total War and musou games. You'll be surprised.

The only thing that would surprise me here is if people really thought that any of those games use 100+ fully skinned skeleton characters that are updated every frame.

It is very obvious, even to untrained eyes, that all those games use a mix of LOD for skeleton, animation, and skinned mesh, as well as swapping everything out for animated sprites and vertex animation at a distance. Anything outside the camera focus or at a distance is updated at a very low rate, not every frame.

The Total War series in particular doesn't even try to hide the LOD switch or the low animation fps; it happens in plain sight when you move the camera back and forth over the focus bubble / distance threshold. Set your hardware setting to low quality and you will see the very aggressive LOD and sprite switches up close, and how all those things animate like a 5 fps flipbook animation.

Undeniably there are things that can be optimized in Godot's skinned mesh animation system, but especially for an RTS you also need to bring a lot of your own accumulated knowledge to the table on how to optimize for this kind of game if you want thousands of actors active.
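The distance-based throttling described above can be sketched roughly as follows. This is an engine-agnostic illustration, not Godot API and not how any of the named games actually implement it: the tier thresholds, the `AnimationTier` type, and both function names are hypothetical. At an assumed 60 rendered frames per second, updating once every 12 frames gives the "5 fps flipbook" rate mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnimationTier:
    max_distance: float  # tier applies up to this camera distance (world units)
    update_every: int    # advance the animation once every N rendered frames


# Hypothetical thresholds; real values would be tuned per game.
TIERS = (
    AnimationTier(max_distance=30.0, update_every=1),    # full rate up close
    AnimationTier(max_distance=80.0, update_every=4),    # 15 updates/s at 60 fps
    AnimationTier(max_distance=200.0, update_every=12),  # 5 updates/s at 60 fps
)


def update_interval(distance: float, visible: bool) -> Optional[int]:
    """Return N (update every N frames), or None to skip skeletal animation."""
    if not visible:
        return None
    for tier in TIERS:
        if distance <= tier.max_distance:
            return tier.update_every
    # Beyond the last tier a unit would be drawn as a sprite or impostor instead.
    return None


def should_update(unit_id: int, frame: int, distance: float, visible: bool) -> bool:
    """Decide whether this unit's skeleton is advanced on this frame."""
    interval = update_interval(distance, visible)
    if interval is None:
        return False
    # Offset by unit id so units in one tier do not all update on the same frame.
    return (frame + unit_id) % interval == 0
```

Staggering by unit id spreads the work evenly across frames instead of producing a spike every Nth frame.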

mrjustaguy commented 8 hours ago

I believe there is actually an example of a game that handles 100+ fully skinned skeleton characters updated every frame just fine: ARK: Survival Evolved, an open-world dino survival game. I've seen some really dense dino farms there easily packing a hundred raptors alone.

Do note, however, that the game is infamous for poor graphics optimization, and its remaster, ARK: Survival Ascended, which moved to UE5, especially so. Yet if you dial the graphics settings down, they'll run fine from a CPU perspective even in such animation-intense scenarios.

There are even options in the original to simplify distant animations and use lower quality animations, but they basically don't change things much.

Multithreading animations in Godot would help massively. One of the previous threads about Godot's poor skeleton performance compared it with UE and Flax: the UE version tested suffered much like Godot, while Flax didn't, which was attributed to Flax's multithreading support, though Godot fared by far the worst. See https://github.com/godotengine/godot/issues/90943

Zireael07 commented 8 hours ago

> I believe There is actually an example of a game that handles 100+

You're off by an order of magnitude or two. OP wants 10k or 100k, not a hundred.

mrjustaguy commented 7 hours ago

I could see 100k running with animation being multithreaded and only rendering for the main camera, and having shadows just be blob shadows (as that'd easily add another 1-4x render runs) and with every skeleton having only one surface.

In my MRP I've got about 2k skeletons running at 30 fps on an i3 10105f using just one thread, and IIRC there are multiple surfaces per skeleton in the MRP, with the sun shadow rendering them all an added 4 times.

Edit: Retested my old MRP on 4.4 dev 4, and it's 2160 Skeletons, each with 6 surfaces, about 50 bones each, with Directional Shadows enabled, with each surface casting a shadow... running at 30 fps on a single thread of an i3 10105f, with the other threads practically idling.

2160 × 6 is 12,960 mesh surfaces; times 5 (the main view plus four shadow splits) that is up to 64,800 skeletal mesh surfaces rendered each frame, which is right in the middle of OP's range.

Do note, however, that there is no other logic running on the main thread to drag performance down further.
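The arithmetic above can be checked with a small helper. The function name and parameters are illustrative only, and the result is an upper bound that assumes every surface is drawn in the main view and in every shadow split.

```python
def skinned_surface_draws(skeletons: int, surfaces_per_skeleton: int,
                          shadow_splits: int) -> int:
    """Upper bound on skinned mesh surfaces rendered per frame.

    Assumes one main camera pass plus one pass per directional shadow split,
    with every surface visible in every pass.
    """
    render_passes = 1 + shadow_splits
    return skeletons * surfaces_per_skeleton * render_passes


# Figures from the retested MRP: 2160 skeletons, 6 surfaces each, 4 shadow splits.
mesh_surfaces = 2160 * 6
mrp_draws = skinned_surface_draws(2160, 6, shadow_splits=4)
print(mesh_surfaces)  # 12960
print(mrp_draws)      # 64800

# The 100k scenario from earlier in the thread: one surface, main camera only.
print(skinned_surface_draws(100_000, 1, shadow_splits=0))  # 100000
```

Seen this way, the MRP's per-frame surface count is of the same order as 100k single-surface skeletons rendered only for the main camera.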