bryanedds / Nu

Repository hosting the open-source Nu Game Engine and related projects.
MIT License

Potential Performance Issue Tracking #177

Open bryanedds opened 7 years ago

bryanedds commented 7 years ago

Current potential performance issues in Nu, in no particular order -

Potential Issue - Event handlers in a dictionary are slower than handlers on the subscribed object a la C#. This means a dictionary look-up for every publish. However, this is an artifact of a publisher-neutral event system rather than anything related to FP.

Possible Solution - A lot of optimization is already done to avoid publish calls that won't have a useful effect. Beyond these, I have yet to think of further solutions.
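For illustration, here is a minimal sketch of a publisher-neutral event system of the kind described above, written in Python rather than Nu's F#. The class, method, and address names are all hypothetical and are not Nu's API. It shows the one dictionary look-up paid per publish and a simple early-out when nobody is subscribed, which is one kind of "publish that won't have a useful effect".

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class EventSystem:
    """Publisher-neutral events: handlers live in one table keyed by address."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, address: str, handler: Handler) -> None:
        self._handlers.setdefault(address, []).append(handler)

    def publish(self, address: str, data: Any) -> int:
        # The per-publish cost discussed above: one dictionary look-up.
        handlers = self._handlers.get(address)
        if not handlers:
            # Early-out: nobody is listening, so skip all further work.
            return 0
        # Iterate over a copy so a handler may subscribe during the publish.
        snapshot = list(handlers)
        for handler in snapshot:
            handler(data)
        return len(snapshot)


# Hypothetical event addresses, for illustration only.
events = EventSystem()
received = []
events.subscribe("Player/Transform/Change", received.append)
events.publish("Player/Transform/Change", (1.0, 2.0))
unheard = events.publish("Enemy/Transform/Change", (3.0, 4.0))  # early-out
```

By contrast, storing handlers on the subscribed object would remove the look-up but tie each event to a specific publisher type.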

Potential Issue - Farseer Physics Engine doesn't scale to 1000s of interacting bodies - https://github.com/VelcroPhysics/VelcroPhysics/issues/29

Possible Solution - Presumably Farseer could be replaced with a much faster 2D physics lib, perhaps one written in C or C++. Of course, the question then becomes how much overhead the required marshalling would add.

Potential Issue - The string hashing required for each Xtension property look-up is suboptimal.

Possible Solution - Not many practical ones. This issue wouldn't exist if .NET lazy-cached hashes in strings, but there's no reason to believe it ever will. At one point I used an alternative type to string called 'Lun' (later called 'Name') which contained a string and its lazily-computed hash, but it wasn't very friendly to use. I decided to get rid of it in favor of .NET strings to simplify Nu's API. I'm pretty sure this was the right decision, but I can't prove it one way or another without making large speculative changes to the engine.

Update - Now that F# finally has implicit ctors, reintroducing the Name type shouldn't cause as many changes as it previously would have. This might now be a practical experiment to run.
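As a rough sketch of the idea behind the 'Name' type, the Python snippet below pairs a string with a hash that is computed on first use and then cached. This is not the original Lun / Name implementation. Python's built-in strings already cache their hashes, so an explicit FNV-1a function stands in for the string hashing cost, and the counter exists only to make the caching visible.

```python
def fnv1a(text: str) -> int:
    """32-bit FNV-1a, standing in for a non-trivial string hash."""
    h = 0x811C9DC5
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


class Name:
    """A string paired with its lazily computed, cached hash."""

    __slots__ = ("text", "_hash")

    hash_computations = 0  # instrumentation for this sketch only

    def __init__(self, text: str) -> None:
        self.text = text
        self._hash = None  # not computed until first needed

    def __hash__(self) -> int:
        if self._hash is None:
            Name.hash_computations += 1
            self._hash = fnv1a(self.text)
        return self._hash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Name) and self.text == other.text


position = Name("Position")
properties = {position: (0.0, 0.0)}  # first hash computed here
for _ in range(1000):
    properties[position]             # every later look-up reuses the cache
```

The usability cost mentioned above shows up even here: callers must construct and hold on to `Name` values instead of passing plain strings.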

Potential Issue - LOH threshold is perhaps too small.

Possible Solution - Upgrading to >= .NET 4.8 will allow us to configure it via GCLOHThreshold - https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gclohthreshold-element

Update - I've tried increasing the LOH threshold, but I cannot observe it having any effect. It's as if my attempt is being ignored by the runtime.

Potential Issue - Potentially a lot of events when a subscribed entity transforms - https://github.com/bryanedds/Nu/blob/0b63f406ba9dc755ab0e8046b455dbd0d5dfb998/Nu/Nu/World/WorldModuleEntity.fs#L177-L227

Possible Solution - Probably nothing great. Could selectively disable a chunk of transform events depending on the application. Not really sure what to do here other than accept that this is part of the cost of doing business declaratively.

Potential Issue - Synchronizing entity properties via World.setEntityPropertyFast requires a dictionary look-up via WorldModuleEntity.EntitySetters. The dictionary is small and likely cache-local, so the look-up is surprisingly fast, but it is still a cost paid on every property set.

Possible Solution - A faster alternative might be hard-coding a duplicate of the EntitySetters table in a match expression or using a loftier technique such as code generation in the MVU implementation.
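The two approaches can be contrasted in a small sketch. It is written in Python, with an if/elif chain standing in for an F# match expression, and every name in it (`Entity`, `ENTITY_SETTERS`, the property names) is hypothetical rather than taken from Nu. It only shows the shape of the trade-off. Whether the hard-coded version is actually faster in .NET would need to be measured.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entity:
    position: float = 0.0
    rotation: float = 0.0
    visible: bool = True


def _set_position(entity: Entity, value: Any) -> None:
    entity.position = value


def _set_rotation(entity: Entity, value: Any) -> None:
    entity.rotation = value


def _set_visible(entity: Entity, value: Any) -> None:
    entity.visible = value


# Table-driven: one hash plus look-up per set, easy to extend.
ENTITY_SETTERS: Dict[str, Callable[[Entity, Any], None]] = {
    "Position": _set_position,
    "Rotation": _set_rotation,
    "Visible": _set_visible,
}


def set_property_via_table(entity: Entity, name: str, value: Any) -> bool:
    setter = ENTITY_SETTERS.get(name)
    if setter is None:
        return False
    setter(entity, value)
    return True


def set_property_hard_coded(entity: Entity, name: str, value: Any) -> bool:
    # Hard-coded duplicate of the table; must be kept in sync by hand
    # (or emitted by a code generator).
    if name == "Position":
        entity.position = value
    elif name == "Rotation":
        entity.rotation = value
    elif name == "Visible":
        entity.visible = value
    else:
        return False
    return True
```

The duplication is the main drawback of the hard-coded form, which is why code generation is the more attractive variant if this ever proves worth doing.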

Potential Issue - Nu Text rendering might be quite inefficient due to not caching target render buffers. IIRC, render buffers used for text are allocated and deallocated on a one-off basis. I do not see how that could possibly scale well.

Possible Solution - Code it properly. :)
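One possible shape for "coding it properly" is a small cache that reuses a rendered buffer whenever the same text is drawn again with the same font and size. The Python sketch below is only an assumption about what such a cache could look like. The class name, the cache key, the `render` callback, and the least-recently-used eviction policy are all hypothetical and not taken from Nu's renderer.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Key = Tuple[str, str, int]  # (text, font name, point size)


class TextBufferCache:
    """Reuses rendered text buffers instead of allocating one per draw."""

    def __init__(self, render: Callable[[str, str, int], bytes],
                 capacity: int = 256) -> None:
        self._render = render      # the expensive allocate-and-rasterize step
        self._capacity = capacity
        self._buffers: "OrderedDict[Key, bytes]" = OrderedDict()

    def get(self, text: str, font: str, size: int) -> bytes:
        key = (text, font, size)
        buffer = self._buffers.get(key)
        if buffer is not None:
            self._buffers.move_to_end(key)  # mark as most recently used
            return buffer
        buffer = self._render(text, font, size)
        self._buffers[key] = buffer
        if len(self._buffers) > self._capacity:
            self._buffers.popitem(last=False)  # evict least recently used
        return buffer


render_calls = []


def fake_render(text: str, font: str, size: int) -> bytes:
    # Stand-in for real rasterization; records each call for inspection.
    render_calls.append(text)
    return f"{font}:{size}:{text}".encode()


cache = TextBufferCache(fake_render, capacity=2)
cache.get("Score: 10", "Arial", 12)
cache.get("Score: 10", "Arial", 12)  # served from the cache, no new render
```

Text that changes every frame (a timer, for example) would still miss the cache each time, so the bounded capacity matters to keep such entries from accumulating.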

Potential Issue - .NET GC compaction is not yet parallelized and can therefore cause stalls while it runs. So far this only seems to cause a couple of small hiccups at the start of programs; it doesn't seem to happen once Nu programs hit their steady state after a couple of seconds. Fortunately, according to the .NET team, it appears that parallel compacting is being implemented.

Possible Solution - Wait for parallel compacting GC to ship (.NET 9?). Otherwise, call GC.Collect between scenes if needed.

Update - On .NET 9 now and it seems to have helped with the issue. However, we need to take concrete measurements to make sure.

Potential Issue - Setting physics properties after creating an entity, such as is done by the MMCC initializers, can cause a lot of body recreation inside the physics engines due to the way that RigidBodyFacet's property change handlers work.

Possible Solution - Instead of recreating the physics bodies, add additional body property synchronization messages so that body recreation is needed less often.
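The difference between the two strategies can be sketched as follows. This is a toy model in Python, not Nu's physics interface: `PhysicsEngine`, `Body`, the message format, and the property names are all hypothetical. The `bodies_created` counter is only there to show how many body constructions each strategy causes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Body:
    properties: Dict[str, Any]


class PhysicsEngine:
    def __init__(self) -> None:
        self.bodies: Dict[int, Body] = {}
        self.bodies_created = 0
        self._messages: List[Tuple[int, str, Any]] = []

    def create_body(self, body_id: int, properties: Dict[str, Any]) -> None:
        self.bodies[body_id] = Body(dict(properties))
        self.bodies_created += 1

    def recreate_body(self, body_id: int, properties: Dict[str, Any]) -> None:
        # Destroy and rebuild the whole body for a single property change.
        del self.bodies[body_id]
        self.create_body(body_id, properties)

    def enqueue_set_property(self, body_id: int, name: str, value: Any) -> None:
        # Alternative: queue a small synchronization message instead.
        self._messages.append((body_id, name, value))

    def process_messages(self) -> None:
        # Apply queued changes in place, e.g. once per physics step.
        for body_id, name, value in self._messages:
            self.bodies[body_id].properties[name] = value
        self._messages.clear()


engine = PhysicsEngine()
engine.create_body(1, {"friction": 0.2, "restitution": 0.0, "density": 1.0})

# Three property changes after creation, as an initializer might issue.
engine.enqueue_set_property(1, "friction", 0.5)
engine.enqueue_set_property(1, "restitution", 0.3)
engine.enqueue_set_property(1, "density", 2.0)
engine.process_messages()
```

With `recreate_body`, the same three changes would build the body three more times. Some properties may still require recreation in a real engine, which is why the goal is to make it needed less often rather than never.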

bryanedds commented 7 years ago

On writing a compiler for the AMSL -

This is a large task, even for partial compilation.

I estimate 400 hours worth of work if I were to do it myself.

It would probably take a fair bit longer for someone who doesn't have as much knowledge about the interpreter's implementation.

bryanedds commented 5 years ago

I read somewhere that, due to security checks, it's significantly slower to get / set a property with reflection than to get / set its backing field. So if we can get / set the backing fields directly, that could speed up serialization.

bryanedds commented 5 years ago

I'm currently working on putting the main subsystem processing on separate threads. If this works well, it should at least double performance.

bryanedds commented 5 years ago

I've managed to get rendering and audio onto separate threads, but not physics. Putting physics on a separate thread may play hell with certain semantic guarantees that are highly desirable. Additionally, I'm not sure Farseer was even built for this, since I don't know if we can do raycasts and such while it is integrating.

A better option might be a physics engine that internally threads itself across cores. Unfortunately, I can't find a .NET wrapper for Box2D, which I think would do this.

bryanedds commented 5 years ago

I just found out that threading does not work with the out-of-box SDL renderer, so I have to temporarily put the rendering and audio code back on the main thread. The only way to get rendering on another thread is to write an OpenGL renderer from scratch, which I don't immediately have time for.

bryanedds commented 4 years ago

Today I attempted to utilize WeakReference in ComponentRef to lighten the load on the GC's scan process. This was not a good idea since it crushed performance. Apparently there is more than enough compute in WeakReference.TryGetTarget to obliterate any potential gains from the hypothetical reduction in GC scanning. Bummer.

bryanedds commented 3 years ago

Another solution to the rendering performance problem is the use of SDL_gpu to do rendering. I will be playing with this possibility over the next week.

bryanedds commented 3 years ago

I was unable to utilize SDL_gpu due to this issue - https://github.com/grimfang4/sdl-gpu/issues/15#issuecomment-851590757

I don't know whether the maintainer, @grimfang4, is aware of the issue, though.