Azure / azure-remote-rendering

SDK and samples for Azure Remote Rendering
MIT License
106 stars 38 forks source link

Excessive memory usage when traversing Entities #100

Closed WikkidEdd closed 1 year ago

WikkidEdd commented 1 year ago

One of our users shared a model with us which was causing them some problems. While debugging it noticed there was large performance spike when the model first loads. It caused the HoloLens display to go blank for 1-2 seconds.

Upon profiling, we noticed that it was caused by us searching specially named nodes in the hierarchy. We noticed that it was allocating 34mb just to look at the names of each entity. We based our logic on the showcases app's FindAllByPattern Entity extension method which uses VisitEntity.

I profiled a simple usage VisitEntity which doesn't do anything other than count the number of entities it's visited.

entity.VisitEntity((Remote.Entity child) =>
{
    searchCount++;
    return Entity.VisitorResult.ContinueVisit;
});

We've not been using quite so complex models in our testing before, so weren't aware of which how much was being allocated, but for a complex model it is a lot.

Model A - 96756 entities - 26.4mb Model B - 26865 entities - 3.7mb

After some deep profiling and some fixups I've managed to get it down to:

Model A - 9.6mb Model B - 2.7mb

There were two main issues: Entity.cs - VisitEntityImpl() - This is calling Children twice instead of just once, doubling your allocations. This is why the initial relationship between allocations for Modal A and Model B is non-linear with respect to number of entities. That function also uses a foreach to iterate over the children which is allocating small amounts on each iteration. Which with 96k entities adds up.

RemoteRenderingAPI.cs - Children This is using Linq Select, again this allocates small amounts every iteration and really adds up.

Even with these improvements, the memory allocation here seems excessive given what the VisitEntity on the surface looks like it should be doing.

During this investigation, we found a few other places in our code which are making use of Children to traverse the hierarhcy, not really considering that every time Children is called it's allocating.

Presumably, the Entity hierarchy is in memory somewhere in unmanaged native code? Would it be feasible to implement non-allocating access to Entity hierarchy?

FlorianBorn71 commented 1 year ago

Hi @WikkidEdd, beside the low-hanging fruits you mentioned (and which we will look into!) this is unfortunately an inherent problem of the C# API. So I'm wondering if we could also cover use cases with traversal that have full native implementations under the hood. For that, we'd need to know a bit more about your use case, because as soon as we pass an Entity to the visitor impl, there have been allocations beforehand. What entity properties do you need to look into in the visitor? The name?

FlorianBorn71 commented 1 year ago

Would you mind sharing your improved code here? Then we can compare it against our own upcoming improvements to make sure we don't miss anything. Cheers!

WikkidEdd commented 1 year ago

I'll contact you via a different channel to discuss our exact use cases.

Here are the modified files. It only fixes the immediate cases where we have direct problems, but I can see that the FindFirstEntity also has the same problems as VisitEntityImpl MemoryImprovements.zip

WikkidEdd commented 1 year ago

Sorry, the RemoteRenderingAPI.cs was missing the using System.Linq; I'd removed it so I could easily see what other things were making use of Linq. There are a few other potentially tight loop areas where it is in use that you might want to clean up.

FlorianBorn71 commented 1 year ago

Quick question, did you see any improvement when using 'for' as opposed to 'foreach'? It has been a long tradition that foreach had allocation penalty, but that should be resolved for a while?

Furthermore, what would be your use case for a (much) faster visitor function? Find node by name? Find node with component?

WikkidEdd commented 1 year ago

I had heard the same with regards to foreach allocations being resolved, but I'm not 100% what Unity version/scripting backend that is related to.

I was doing most of the profiling in Editor because it's faster to iterate, but it was making a difference there. It's possible that when built to IL2CPP it doesn't have the same effect, but high level the memory allocation we pretty much the same in Editor and on device.

In the deep profiler it was allocating when calling GetEnumerator() from the children array.

FlorianBorn71 commented 1 year ago

Hi @WikkidEdd, we have added your short-term improvements to the SDK that improves the scene graph traversal significantly.

More optimizations are possible, but they'd require some invasive (and also API breaking) changes. It's in our backlog now, but I figure I'm closing this ticket for now.