On macOS Apple Silicon, avoid managed-memory textures, and resource syncs.

billhollings commented 2 months ago

Like their iOS/tvOS counterparts, macOS Apple Silicon GPUs support using Shared memory for textures, and do not require resource synchronization, even with Managed memory. This change treats macOS Apple Silicon the same as iOS & tvOS.

MVKPhysicalDevice add _hasUnifiedMemory & _isAppleGPU flags.
MVKDeviceTrackingMixin add isUnifiedMemoryGPU() & isAppleGPU().
Do not advertise host-visible-but-not-host-coherent Vulkan memory type on macOS Apple Silicon.
Replace mvkMTLStorageModeFromVkMemoryPropertyFlags() with MVKPhysicalDevice::getMTLStorageModeFromVkMemoryPropertyFlags(), and return Shared instead of Managed for Apple Silicon, even if coherency is not requested.
On unified memory devices, avoid needless calls to didModifyRange:, synchronizeResource:, and synchronizeTexture:slice:level:.

billhollings commented 2 months ago

The reason that we can avoid Managed memory is that the Apple Silicon Mac GPUs use a shared memory architecture. However, I believe that some MacBooks with Intel CPUs also use this architecture and have no dedicated GPU. Therefore, I wonder if this change should not only be for Apple Silicon, but for any GPU with a SMA, at least for buffers?

MTLStorageModeShared This is the default storage mode for MTLBuffer instances on integrated GPUs and both MTLBuffer and MTLTexture instances on Apple silicon GPUs. On non-Apple family GPUs, the shared storage mode isn’t available for MTLTexture instances.

I did investigate this a bit. As mentioned in the MTLStorageModeShared spec quote, Intel GPUs do indeed reject attempts to create a MTLTexture from Shared memory, but also behave as if the sync functions are not needed. That's why I used getHasUnifiedMemory() in MVKImage::copyImageToMemory() as part of VK_EXT_host_image_copy.

So, to cover both Apple & older Intel GPUs, we'd have to have a mix of isAppleGPU() and getHasUnifiedMemory() queries, and I didn't want to go to that much back and forth effort testing on older integrated GPUs, particularly since a discrete GPU would be the preference for gaming. It felt like a small use-case. I also only have one Intel machine, and I wasn't confident it would apply to all Intel GPUs.

I've got nothing against eventually getting there, but my goal here is primarily to clean things up for Apple Silicon behaviour, since that's the path of the future.

Also, does this bring any noticeable performance benefits?

I haven't tried running any benchmarks on this. Probably not a huge performance benefit, because I should think the Apple Silicon sync methods must effectively be no-ops. But it can't hurt, and now we treat all Apple Silicon platforms the same.

billhollings commented 2 months ago

I've implemented all the recommendations, including:

@cdavis5e recommendation to remove the dead code on iOS/tvOS.
@spnda recommendation to implement a mix of testing for Apple GPU to determine when to use Managed Memory, and testing for Unified Memory to determine when to sync GPU to CPU.

KhronosGroup / MoltenVK

On macOS Apple Silicon, avoid managed-memory textures, and resource syncs. #2217