[Studio] Optimize FindLoopNoAcc

takahiro0327 commented 6 months ago

Slightly optimized FindLoopNoAcc. Loading large scenes, which used to take 95 seconds, is now 90 seconds. Please merge if it looks OK.

The optimization I have implemented is simple, it records the path where the object was found in a previous query and searches that path the next time there is a query with the same name.

About 30% of the queries fall into this category and are accelerated. Seventy percent of the queries return null.

In addition, I added the ability to output the scene loading time to the log.

[Info   :Studio Optimizations] Scene Loaded: 87.10[s]

takahiro0327 commented 6 months ago

In fact, this change may not be as fast. Disturbance factors are more significant. On average, I'd expect a 1-3 second reduction versus a 90s load time.

ManlyMarco commented 6 months ago

In fact, this change may not be as fast. Disturbance factors are more significant. On average, I'd expect a 1-3 second reduction versus a 90s load time.

Would caching the results maybe be worthwhile? It could help if there are many repeated calls on the same Transform, maybe cache all child transforms in that case and re-check only if there is no match found in cache? I think I did something similar to this in a different plugin, maybe ABMX (edit: here).

takahiro0327 commented 6 months ago

Thanks for the review. No, I also tried caching the child transforms. But stopped because it was slightly slower.

Dictionary<string,GameObject> _nameToGObj;

There are more queries that return null because no object is found than queries that do not. Therefore, it would have been faster if child transforms could be cached and non-existent transforms could be determined faster. But it did not. There are probably a lot of unrelated child objects mixed in with the query. I think the cost of recording all those children is too high.

My method is caching the paths. I think the nice thing about this method is that the cached path may work for another transform search.

takahiro0327 commented 6 months ago

I've come up with an idea to speed up Find a bit, so I'm going to experiment with it.

takahiro0327 commented 6 months ago

Please review with this.

I've tried to keep the scanning progress on multiple queries. Maybe it is faster.

Searches for the same name were about twice per character. The cache in trasform is of relatively low value. The first search would register in the cache, the second search would match the cache. That's it.

The number of offspring varied by trasform: there were trasforms with 700 children and others with 70-90 children. I think the reason it was slower when I tried it before is that caching the children in the 700 one would cost more to add to the table.

Measured with a 200MB file with 20 characters. Frankly, it is ambiguous whether it is getting faster or slower.

before:

after:

after2(current):

It was faster as far as I could see in the mono profiler, but I think the overhead is too large to be inaccurate. before1: 41.64[s]

0   2408390 IllusionFixes.StudioOptimizations:FindLoopNoAcc (UnityEngine.Transform,string)  41642654700 3002847232

after1: 4.19[s]

0   865620  IllusionFixes.StudioOptimizations:FindLoopNoAccWithPaths (UnityEngine.Transform,string,System.Collections.Generic.List`1<string>)   4192795500  274575360
0   14170   IllusionFixes.StudioOptimizations:FindLoopNoAcc (UnityEngine.Transform,string)  803156100   46448640

after2: 0.07[s]

0   14170   IllusionFixes.StudioOptimizations:FindLoopNoAcc (UnityEngine.Transform,string)  35402800    598016
0   14170   IllusionFixes.FindLoopAssistant:FindChild (UnityEngine.Transform,string)    32667100    598016
0   8321    IllusionFixes.FindLoopAssistant/Frame:NextChild ()  3458500 278528
0   132 IllusionFixes.FindLoopAssistant:RegisterPath (UnityEngine.Transform,UnityEngine.Transform,string)   2176800 94208

before_MonoProfilerOutput_2024-02-17_18-43-32.csv after1_MonoProfilerOutput_2024-02-17_18-47-39.csv after2_MonoProfilerOutput_2024-02-17_18-57-52.csv

takahiro0327 commented 6 months ago

My PC was the cause of the blurry numbers. It was a few percent, but it was decently fast.

before: ave:55:64[s]

Scene Loaded: 58.17[s]
Scene Loaded: 54.54[s]
Scene Loaded: 54.77[s]
Scene Loaded: 56.30[s]
Scene Loaded: 54.42[s]

after1: ave:53.68[s]

Scene Loaded: 53.63[s]
Scene Loaded: 53.58[s]
Scene Loaded: 54.02[s]
Scene Loaded: 53.67[s]
Scene Loaded: 53.51[s]

after2: ave:53.34[s]

Scene loading completed: 53.2[s]
Scene loading completed: 53.8[s]
Scene loading completed: 53.5[s]
Scene loading completed: 53.2[s]
Scene loading completed: 53.0[s]

takahiro0327 commented 6 months ago

I'll write it here because I don't want it to get caught in a search later.

Sorry about the title, but it's not FindLoop optimization, it's FindLoopNoAcc optimization. It is only used in the bone initialization process. The goal is to speed up scene loading.

Items accumulate in _nameToPath. However, most items accumulate only two. This is because there are few kinds of root transforms.

The exception to this is the hair bone, which has variations, so it accumulates.

Still, it is much faster than doing Find on dozens of children each time. If you only accumulate two items, you can almost always find one of them. Much faster than scanning dozens or hundreds.

takahiro0327 commented 6 months ago

In the data I checked, 30% of the queries are initially returned and the remaining 70% return null. The other percentage of returns is less than 1-2%. The _childMap speeds up the 70% of queries that return null. It was necessary to keep track of all children to determine non-existent transforms without having to scan through dozens of children.

takahiro0327 commented 6 months ago

However, recording all of the root's children conversely slowed it down. This was due to the existence of a root with hundreds of children, and the high cost of recording all of those children. Also, the root with hundreds of children did not return nulls. In such a root, not all children were seen.

A _findStack was added to store the scanning status of children. This minimizes the amount of Find children. Together with _childMap, it can return null immediately.

takahiro0327 commented 6 months ago

I don't think there is room for optimization in a regular FindLoop. Removing the caller may be effective, but there does not seem to be room for optimization in the FindLoop itself.

In this case, optimization was performed based on the following assumptions.

if a query comes in with the same transformation as the previous query, assume that the transformation has not been changed since the previous query.
the structure of the transform is roughly the same
but do not make assumptions about what kind of children you have or what kind of structure you have

Since it is a series of calls from one parent function, 1 is a reasonable assumption.

takahiro0327 commented 6 months ago

You're right that the return is small compared to the complexity of the code. But it is also true that it reduces about 100ms per character, and loading time has to be accumulated like that to reduce it. There is no doubt that loading time is reduced by seconds when there are many characters.

It is dangerous to complicate the code if the changes are frequent or have a large impact area. This kind of change should not be included. However, I believe this is not the case here.

takahiro0327 commented 6 months ago

Thanks for the merge!

IllusionMods / IllusionFixes

[Studio] Optimize FindLoopNoAcc #55