V-Sekai / godot-vrm

Importer/Exporter for VRM avatars and MToon shader. Available for Godot 4.1+ and 3.2+ in the Asset Library.
https://godotengine.org/asset-library/asset/2031
Other
287 stars 24 forks source link

The spring bone performance is fine #75

Closed hiltonm closed 1 month ago

hiltonm commented 1 year ago

Hi, this is not an issue but more of a heads-up about the performance of the spring bone logic here. I saw that you were considering implementing FastSpringBone but I don't really think that is necessary. I checked your code performance and noticed that the only really slow thing in it is the skel.set_bone_global_pose_override on vrm_spring_bone_logic.gd file. All the other math you have here is lightweight in comparison. For me the problem with set_bone_global_pose_override is that it calls _make_dirty for every single changed bone on the skeleton and that makes godot force update the entire skeleton without necessity.

I tested this, commenting out the _make_dirty and the final performance with the spring bones on and off is the same. The spring bone logic here has almost no impact on FPS. And I could not even see any visual issue or glitch with the lack of _make_dirty at all, it just works probably because the skeleton gets updated every frame anyway, at least if you have an animation playing on the character which is the normal use case for a character with spring bones.

What I mean with this is, maybe instead of looking at your code thinking that you need to do more on it, we should look into improving Godot to work better with these kinds of algos. Other optimization approaches could be ways to disable or remove bone chains that have no weights on the mesh, for instance. I might be missing something so I hope others will test this out themselves to see if they find any issue. You really just have to comment that one line on skeleton_3d.cpp

Thanks a lot for the addon, cheers!

lyuma commented 1 year ago

Good catch about the extra calls to _make_dirty in set_bone_global_pose_override. I'll have to look into the skeleton code and understand why that is being done and if there's a way to improve the performance or somehow batch update bones so that the dirty flag has less effect per update on average.

I think one of the issues for us is we want to design in the ability to have hundreds of avatars: perhaps a player character, tons of NPCs, dozens or hundreds of remote players in a social VR setting. So what might appear to be fine in the single model case could become a bottleneck at scale.

But It's too early to worry about this now, I agree. we should fix the problems we can observe, not hypothetical future issues.

hiltonm commented 1 year ago

Cool! I made a few more tests with a scene with around 20 instances of the same model looping a mixamo idle animation (with some moving around crazily so I can see the bones are working) on a mid-range PC without dedicated GPU and I got 30fps, whether the spring bones are active or not didn't matter. The performance fell from 60fps mostly because of the amount of meshes, materials and dynamic lights/shadows affecting them (with just 1 light affecting only the models without shadows, 50fps, so the biggest hit was the shadows from different lights). That also probably means that on a high-end pc it already runs 100 models, with enough vram for all potential different textures/materials.

I'm testing this with Godot built for production without the _make_dirty. With the _make_dirty it runs with 1fps.

That said, for my use case, 20 inside the frustum is way more than enough, plus the way you import all meshes and materials separated gives me power to change stuff around in runtime. For your use-case of many hundreds I believe you need to bake the materials so each mesh has only 1 material or so, or even merge all meshes together in one.

Have you ever checked the Cats plugin for Blender? They do this kind of model optimization, but the last time I tried it I wasn't able to use it very well and didn't try too hard. The optimization ideas there are good for you though. I just hope that if you implement these on this addon, you will keep the current "unperformant" setup for those of us that need the power to change things around.

By the way, you can of course close this issue whenever you want. I just wanted to give you some feedback to try and help you make better decisions. I saw that FastSpringBone implementation and, with all due respect to the dev that put a lot of effort on it, it feels like a lot of work and complexity for very little gain. But it is your call in the end of course.

I hope I didn't bother you with this 'issue' approach. Thanks again for the addon <3

fire commented 1 year ago

We have older tech that broke during the godot 3 update for "one mesh surface merging", but it wasn't designed for VRM materials.

https://github.com/V-Sekai/godot_scene_merge

fire commented 1 year ago

I did some thinking.

void Skeleton3D::set_bone_global_pose_override(int p_bone, const Transform3D &p_pose, real_t p_amount, bool p_persistent) {
    const int bone_size = bones.size();
    ERR_FAIL_INDEX(p_bone, bone_size);
    bones.write[p_bone].global_pose_override_amount = p_amount;
    bones.write[p_bone].global_pose_override = p_pose;
    bones.write[p_bone].global_pose_override_reset = !p_persistent;
        // _make_dirty(); Deleted
    force_update_bone_children_transforms(p_bone);
}

Should also work too.

fire commented 1 year ago

would be nice if we could open this up as a discussion on godot itself I'm worried it's too late in the 4.2 dev cycle to make a risky change like that though if it's free performance maybe it's worth it it will probably help all skeleton related code

lyuma commented 1 month ago

With the addition of SkeletonModifier3D, there is effort to build some spring bone functionality into the Godot engine code.

I am concerned about performance of large numbers of modifiers: it would be good to support 100's of characters with 1000's of spring bones, so we need to make sure whatever modifiers are written handle this scale.