d4rkc0d3r / d4rkAvatarOptimizer

d4rkpl4y3r's VRChat Avatar 3.0 optimizer
MIT License
390 stars, 17 forks

Avatar performance is worse when using the optimizer #92

Open MivaNyan opened 6 months ago

MivaNyan commented 6 months ago

I noticed this issue a while ago and thought it might be a random bug in an older version, but it has persisted across versions. I only use the Poiyomi shader and face tracking on my avatar. Everything, including the optimizer, is on the latest version. Write Defaults: On. The avatar in this example was the worst offender. On other avatars the frame time is sometimes only slightly worse and sometimes better, but in most cases, with either the "Full" preset or the "Shader Toggles" preset enabled, frame time is worse than with no optimizer at all.

I tested performance in an empty world with only one full mirror enabled. The closer I stand to the mirror, the worse the frame time gets. The screenshots below show the worst-case frame time.

Full preset: [image: d4rk full crop]

With "Merge Different Property Materials" disabled: [image: d4rk no material merge 1]

With "Merge Skinned Meshes" disabled: [image] [image]

With "Merge Different Property Materials", "Merge Skinned Meshes" and "Write Properties as Static Values" disabled: [image] [image]

And finally, without using the optimizer at all: [image]

d4rkc0d3r commented 6 months ago

That's expected. Merging meshes with shader toggles adds extra code to the shader, and skinning plus some vertex shader code always runs for all vertices, even those belonging to disabled parts. Animated material properties can also generate a lot of extra code when using the shader toggles setting. That setting fundamentally trades GPU performance for CPU performance.

If you send the generated assets as described here, I can check whether it generated some truly outrageous stuff or whether it's just the expected overhead of shader toggles.
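The trade-off described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions (one material per mesh, cost counted as draw calls and vertices processed); the `Mesh` class, both functions, and all numbers are hypothetical and are not taken from the optimizer.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    name: str
    vertices: int
    enabled: bool


def separate_cost(meshes):
    """Separate renderers: disabled meshes are skipped entirely."""
    active = [m for m in meshes if m.enabled]
    return {
        "draw_calls": len(active),  # assumes one material per mesh
        "vertices_processed": sum(m.vertices for m in active),
    }


def merged_cost(meshes):
    """Merged mesh with shader toggles: one draw call, but every vertex
    is still skinned and passed through the vertex shader, including
    vertices of parts that are toggled off."""
    return {
        "draw_calls": 1,
        "vertices_processed": sum(m.vertices for m in meshes),
    }


# Hypothetical avatar: the jacket is toggled off.
avatar = [
    Mesh("body", 20000, True),
    Mesh("jacket", 15000, False),
    Mesh("hat", 5000, True),
]
print(separate_cost(avatar))
print(merged_cost(avatar))
```

In this toy example merging halves the draw calls (less CPU work) while the GPU processes the 15,000 jacket vertices it would otherwise have skipped.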

MivaNyan commented 6 months ago

Thank you for your response. Here are the assets: TrashBin.zip

In this case, what settings would you recommend I use if I want to target better GPU performance but still cut down on materials to fit into let's say a medium rating?

Toys0125 commented 6 months ago

@MivaNyan You need to reduce the material count by atlasing the materials on each skinned mesh. For example, you have four materials on your clothes. You can combine all four into a single material with a texture atlas, using Blender or another optimizer program that builds material atlases. If you use Blender, try out Material Combiner.
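As a rough illustration of what atlasing four materials involves, the sketch below remaps a UV coordinate into one tile of a 2x2 atlas. It assumes each source material uses UVs within [0, 1] without tiling; the function name `atlas_uv` and the tile layout are hypothetical and do not describe how Material Combiner works internally.

```python
def atlas_uv(u, v, tile_index, grid=2):
    """Map a UV from one source texture into its tile of a grid x grid atlas.

    tile_index counts tiles left to right, then row by row (0..grid*grid-1).
    Assumes 0 <= u, v <= 1 and that the source material does not tile.
    """
    col = tile_index % grid
    row = tile_index // grid
    return ((u + col) / grid, (v + row) / grid)


# The center of the fourth texture lands in the center of the last tile.
print(atlas_uv(0.5, 0.5, 3))
```

After every vertex has been remapped this way and the four textures have been packed into one image, the four material slots can share a single material.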

d4rkc0d3r commented 6 months ago

> In this case, what settings would you recommend I use if I want to target better GPU performance but still cut down on materials to fit into let's say a medium rating?

The basic preset. It doesn't change the shaders and still allows for some mesh merging as well as merging of material slots that use the same material.

If I were to guess, the better GPU performance you observed in the second-to-last test is likely due to the merge same ratio blendshapes setting.

There isn't really much for my optimizer to do GPU-wise for shaders that already have their own lock-in, like poi. For shaders that don't have a lock-in, however, "Write Properties as Static Values" tries to be a generic version of lock-in and can help a lot.

d4rkc0d3r commented 6 months ago

OK, I had a look at some of the shaders it generated. They seem fine except for poi fur. Poi locking does one crucial optimization that mine doesn't. Please try again with the optimizer set to shader toggles while keeping the fur materials locked in instead of unlocked. I'm curious how much of the performance difference is due to that.

MivaNyan commented 6 months ago

Here's the result with the "Shader Toggles" preset and the fur material locked: [image]

@Toys0125 thanks for your recommendation, I'm already atlasing some materials!

d4rkc0d3r commented 4 months ago

I've investigated this a bit more and found that some of my code generation does harm poi fur performance considerably. Specifically, at the start of the fragment shader I generate a branch that samples every texture that has a sampler, to make sure the shader compiler doesn't get rid of the samplers. Putting that code at the end of the fragment shader instead pretty much negates the performance penalty compared to poi's native lock-in in my test case.
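The placement change can be pictured with a small code-generation sketch. It is not the optimizer's actual generator: `build_fragment`, its parameters, and the placeholder comment lines standing in for the sampler keep-alive branch are all hypothetical, and the real generated shader code is not shown in this thread.

```python
def build_fragment(body, textures, keep_alive_at_end, result="col"):
    """Assemble the lines of a fragment function body.

    body:              shader lines that compute `result` (hypothetical)
    textures:          textures whose samplers must survive compilation
    keep_alive_at_end: place the keep-alive code after the body
                       instead of before it
    """
    # Placeholder lines standing in for the generated sampling branch.
    keep_alive = [f"/* placeholder: reference sampler of {t} */" for t in textures]
    if keep_alive_at_end:
        lines = body + keep_alive
    else:
        lines = keep_alive + body
    return lines + [f"return {result};"]


body = ["float4 col = shade(i);"]
for line in build_fragment(body, ["_MainTex", "_FurTex"], keep_alive_at_end=True):
    print(line)
```

Only the position of the keep-alive lines differs between the two variants; the shading code itself is untouched.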