KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0
4.63k stars 402 forks source link

On macOS Apple Silicon, avoid managed-memory textures, and resource syncs. #2217

Closed billhollings closed 2 months ago

billhollings commented 2 months ago

Like their iOS/tvOS counterparts, macOS Apple Silicon GPUs support using Shared memory for textures, and do not require resource synchronization, even with Managed memory. This change treats macOS Apple Silicon the same as iOS & tvOS.

billhollings commented 2 months ago

The reason that we can avoid Managed memory is that the Apple Silicon Mac GPUs use a shared memory architecture. However, I believe that some MacBooks with Intel CPUs also use this architecture and have no dedicated GPU. Therefore, I wonder if this change should not only be for Apple Silicon, but for any GPU with a SMA, at least for buffers?

MTLStorageModeShared This is the default storage mode for MTLBuffer instances on integrated GPUs and both MTLBuffer and MTLTexture instances on Apple silicon GPUs. On non-Apple family GPUs, the shared storage mode isn’t available for MTLTexture instances.

I did investigate this a bit. As mentioned in the MTLStorageModeShared spec quote, Intel GPUs do indeed reject attempts to create a MTLTexture from Shared memory, but also behave as if the sync functions are not needed. That's why I used getHasUnifiedMemory() in MVKImage::copyImageToMemory() as part of VK_EXT_host_image_copy.

So, to cover both Apple & older Intel GPUs, we'd have to have a mix of isAppleGPU() and getHasUnifiedMemory() queries, and I didn't want to go to that much back and forth effort testing on older integrated GPUs, particularly since a discrete GPU would be the preference for gaming. It felt like a small use-case. I also only have one Intel machine, and I wasn't confident it would apply to all Intel GPUs.

I've got nothing against eventually getting there, but my goal here is primarily to clean things up for Apple Silicon behaviour, since that's the path of the future.

Also, does this bring any noticeable performance benefits?

I haven't tried running any benchmarks on this. Probably not a huge performance benefit, because I should think the Apple Silicon sync methods must effectively be no-ops. But it can't hurt, and now we treat all Apple Silicon platforms the same.

billhollings commented 2 months ago

I've implemented all the recommendations, including: