H-uru / Plasma

Cyan Worlds's Plasma game engine
http://h-uru.github.io/Plasma/
GNU General Public License v3.0

Mac client can stall entire game loop #1547

Open colincornaby opened 6 months ago

colincornaby commented 6 months ago

This is one more follow-up to the MSAA performance issues that were found in testing the Mac client. The Mac client was enabling 8x MSAA on older hardware for which Plasma only supports 4x on the Windows side.

It was noticed during that investigation that resource loading was extremely slow. Resource loading shares the main render thread, so if rendering gets stalled, so will resource loading.

It seems like there are a few issues at play. Metal requires explicit management of display refresh sync and front/back buffers - so there are some extra challenges here. Some of these issues also apply to OpenGL on the Mac or possibly other platforms.

Metal stalls when it runs out of buffers

Metal will perform rendering on a secondary thread once all the render commands are encoded. However - if Metal runs out of buffers, it will wait up to a second for a buffer to become available.

The timeout does not seem to be very configurable. It can only be turned off. But if the timeout is turned off - Metal will wait for another framebuffer forever.

https://developer.apple.com/documentation/quartzcore/cametallayer/2887086-allowsnextdrawabletimeout?language=objc

If the GPU is becoming overwhelmed (e.g. because of 8x MSAA) this will cause the CPU to stall.

We might need a new way to do framebuffer swaps that avoids the stall. We can access the next framebuffer on a secondary thread - so we could force only the secondary thread to stall. However, framebuffer requests cannot be cancelled - so this would need to be done carefully. Done badly, this could cause a second form of starvation by repeatedly requesting frames that never get used.

Game loop is vsynced

On macOS, in all renderers, the game loop itself is vsynced. This means if rendering targets are missed, macOS will begin servicing the game loop less often.

There may need to be a secondary gate to allow the game loop to run continuously - but only enter the renderer's draw code when a vsync callback is returned by the system. The vsync timer is universal to all renderers on macOS and is not Metal specific.
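A minimal sketch of such a gate, assuming the vsync callback arrives on a different thread from the game loop. `VsyncGate` and `RunLoopIteration` are hypothetical names for illustration and do not exist in Plasma.

```cpp
#include <atomic>

// Hypothetical gate between the system vsync callback and the game loop.
class VsyncGate {
public:
    // Called from the vsync (display link) callback thread.
    void SignalVsync() { fPending.store(true, std::memory_order_release); }

    // Called from the game loop. Returns true at most once per pending vsync;
    // several signals that arrive before a consume collapse into one draw.
    bool ConsumeVsync() { return fPending.exchange(false, std::memory_order_acq_rel); }

private:
    std::atomic<bool> fPending{false};
};

// One iteration of the game loop: always update (including resource loading),
// but only enter the draw code when a vsync is pending.
// Returns true if a frame was drawn.
template <typename UpdateFn, typename DrawFn>
bool RunLoopIteration(VsyncGate& gate, UpdateFn&& update, DrawFn&& draw) {
    update();
    if (gate.ConsumeVsync()) {
        draw();
        return true;
    }
    return false;
}
```

A loop built this way would spin when there is nothing to draw, so in practice it would probably want a short sleep or a wait on the gate when the update step has no work.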

I've done some initial research into how this is handled in the D3D9 pipeline. It looks like Windows might have similar issues. It looks like on Windows - vsync will stall the render present command until the next frame is ready. By my understanding - this would block the game loop until the next frame is ready. Plasma could be mitigating this - I have not studied the D3D pipeline deeply enough to find out if it is.

colincornaby commented 6 months ago

Digging into this more - it's possible that one is the solution for the other.

Apple suggests we feed our performance measurements back into the vsync timer. This will cause the vsync timer to self-limit - and not try to call us at an FPS beyond what we are currently capable of. In turn - this should mitigate the problem of us causing a framebuffer underflow when the GPU is drowning in work. We should only be receiving draw events at a rate that matches how quickly framebuffers can be supplied.

This still runs into the issue of the client not being able to pass vsync events into the renderer though. D3D regulates itself - so Plasma was never designed for platform/renderer agnostic vsync support.