Closed WikkidEdd closed 1 year ago
Hi @WikkidEdd, beside the low-hanging fruits you mentioned (and which we will look into!) this is unfortunately an inherent problem of the C# API. So I'm wondering if we could also cover use cases with traversal that have full native implementations under the hood. For that, we'd need to know a bit more about your use case, because as soon as we pass an Entity to the visitor impl, there have been allocations beforehand. What entity properties do you need to look into in the visitor? The name?
Would you mind sharing your improved code here? Then we can compare it against our own upcoming improvements to make sure we don't miss anything. Cheers!
I'll contact you via a different channel to discuss our exact use cases.
Here are the modified files. It only fixes the immediate cases where we have direct problems, but I can see that the FindFirstEntity also has the same problems as VisitEntityImpl MemoryImprovements.zip
Sorry, the RemoteRenderingAPI.cs was missing the using System.Linq;
I'd removed it so I could easily see what other things were making use of Linq. There are a few other potentially tight loop areas where it is in use that you might want to clean up.
Quick question, did you see any improvement when using 'for' as opposed to 'foreach'? It has been a long tradition that foreach had allocation penalty, but that should be resolved for a while?
Furthermore, what would be your use case for a (much) faster visitor function? Find node by name? Find node with component?
I had heard the same with regards to foreach allocations being resolved, but I'm not 100% what Unity version/scripting backend that is related to.
I was doing most of the profiling in Editor because it's faster to iterate, but it was making a difference there. It's possible that when built to IL2CPP it doesn't have the same effect, but high level the memory allocation we pretty much the same in Editor and on device.
In the deep profiler it was allocating when calling GetEnumerator() from the children array.
Hi @WikkidEdd, we have added your short-term improvements to the SDK that improves the scene graph traversal significantly.
More optimizations are possible, but they'd require some invasive (and also API breaking) changes. It's in our backlog now, but I figure I'm closing this ticket for now.
One of our users shared a model with us which was causing them some problems. While debugging it noticed there was large performance spike when the model first loads. It caused the HoloLens display to go blank for 1-2 seconds.
Upon profiling, we noticed that it was caused by us searching specially named nodes in the hierarchy. We noticed that it was allocating 34mb just to look at the names of each entity. We based our logic on the showcases app's FindAllByPattern Entity extension method which uses VisitEntity.
I profiled a simple usage VisitEntity which doesn't do anything other than count the number of entities it's visited.
We've not been using quite so complex models in our testing before, so weren't aware of which how much was being allocated, but for a complex model it is a lot.
Model A - 96756 entities - 26.4mb Model B - 26865 entities - 3.7mb
After some deep profiling and some fixups I've managed to get it down to:
Model A - 9.6mb Model B - 2.7mb
There were two main issues: Entity.cs - VisitEntityImpl() - This is calling
Children
twice instead of just once, doubling your allocations. This is why the initial relationship between allocations for Modal A and Model B is non-linear with respect to number of entities. That function also uses aforeach
to iterate over the children which is allocating small amounts on each iteration. Which with 96k entities adds up.RemoteRenderingAPI.cs - Children This is using Linq Select, again this allocates small amounts every iteration and really adds up.
Even with these improvements, the memory allocation here seems excessive given what the VisitEntity on the surface looks like it should be doing.
During this investigation, we found a few other places in our code which are making use of
Children
to traverse the hierarhcy, not really considering that every timeChildren
is called it's allocating.Presumably, the Entity hierarchy is in memory somewhere in unmanaged native code? Would it be feasible to implement non-allocating access to Entity hierarchy?