Low performance on M1 Pro

KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.

Apache License 2.0


Low performance on M1 Pro #1471

Open ph1lm opened 2 years ago

ph1lm commented 2 years ago

I'm experiencing low performance when using the Vulkan renderer (via MoltenVK) in Quake3e on Mac. The original issue: https://github.com/ec-/Quake3e/issues/127

I've tested it on two laptops: a MacBook Pro 14" (2021, base-model M1 Pro) and a MacBook Pro 16" (2018, Intel i7, Radeon Pro 555X 4 GB). Everything works well on Intel: the OpenGL and Vulkan (MoltenVK) renderers show comparable performance at 700-800 fps. On the M1 Pro, however, OpenGL is almost 3x faster than Vulkan (MoltenVK): 1100-1200 fps vs. 400-420 fps.
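Frame rates this high are easier to compare as per-frame times, since the gap in milliseconds is what any fix would have to recover. A minimal sketch of that conversion, using the midpoints of the ranges quoted above purely as an illustrative simplification:

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# Midpoints of the M1 Pro ranges reported above (a simplification for illustration).
m1_opengl_fps = (1100 + 1200) / 2
m1_vulkan_fps = (400 + 420) / 2

opengl_ms = frame_time_ms(m1_opengl_fps)
vulkan_ms = frame_time_ms(m1_vulkan_fps)

# Extra time spent per frame on the Vulkan path, and the speed ratio.
extra_ms = vulkan_ms - opengl_ms
ratio = m1_opengl_fps / m1_vulkan_fps

print(f"OpenGL: {opengl_ms:.2f} ms/frame")
print(f"Vulkan: {vulkan_ms:.2f} ms/frame")
print(f"Extra per frame: {extra_ms:.2f} ms (OpenGL is {ratio:.1f}x faster)")
```

With these midpoints the gap works out to roughly 1.6 ms per frame, which is the budget to look for when profiling.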

Increasing the number of swapchain images (from 2 to 3) didn't help: https://github.com/ec-/Quake3e/issues/127#issuecomment-968325543

Env:

molten-vk: stable 1.1.5
sdl2: stable 2.0.16

KTRosenberg commented 2 years ago

Is this M1 Pro-specific, or does it happen on M1 as well?

I don't have expertise here, but in the meantime I thought I would comment. My guess is that the problem has something to do with M1 devices not needing managed buffers. Everything can (and possibly should) live in shared buffers, since it's all in unified memory. In other words, if MoltenVK is unnecessarily synchronizing changed data through managed buffers in the backend using didModifyRange, that could be a problem. Is anything special done for iOS devices? I wonder what would happen if you treated the M1 as an iOS-like (integrated) device, but with the macOS feature sets.
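The guess above can be illustrated with a toy model. This is only a sketch in Python, not MoltenVK or Metal code: `Device`, `pick_storage_mode` and `sync_calls_needed` are hypothetical names, loosely mirroring Metal's `hasUnifiedMemory` device property, its shared and managed storage modes, and `didModifyRange`. It assumes one explicit sync per CPU write under managed storage, which is a simplification.

```python
from dataclasses import dataclass
from enum import Enum


class StorageMode(Enum):
    SHARED = "shared"    # one allocation visible to both CPU and GPU
    MANAGED = "managed"  # CPU and GPU copies, synchronized explicitly


@dataclass
class Device:
    """Hypothetical stand-in for a GPU device description."""
    name: str
    has_unified_memory: bool


def pick_storage_mode(device: Device) -> StorageMode:
    """Prefer shared storage when CPU and GPU share physical memory."""
    if device.has_unified_memory:
        return StorageMode.SHARED
    return StorageMode.MANAGED


def sync_calls_needed(device: Device, cpu_writes: int) -> int:
    """Count explicit didModifyRange-style syncs for a number of CPU writes.

    Assumes one sync per write under managed storage, none under shared.
    """
    if pick_storage_mode(device) is StorageMode.MANAGED:
        return cpu_writes
    return 0


m1_pro = Device("Apple M1 Pro", has_unified_memory=True)
radeon = Device("Radeon Pro 555X", has_unified_memory=False)

for dev in (m1_pro, radeon):
    mode = pick_storage_mode(dev)
    print(dev.name, mode.value, sync_calls_needed(dev, cpu_writes=100))
```

Whether skipping those syncs actually helps on Apple Silicon depends on what the driver does with them, which this model does not capture.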

billhollings commented 2 years ago

"If MoltenVK is unnecessarily synchronizing changed data using managed buffers in the backend using didModifyRange, then that could be a problem"

This is an interesting idea.

Although managed memory is not physically available on Apple Silicon, it is available in the macOS Metal API, and any general Metal app designed to run across both Apple Silicon and non-Apple GPUs will generally use it.

So I would assume that the Apple M1 driver treats didModifyRange as a no-op. Any macOS Metal app would generally call it to cover managed memory on non-Apple GPUs, and when running on Apple Silicon, the call is hopefully not copying memory onto itself unnecessarily.

However, if Apple is not being smart about that, then I suppose MoltenVK should take care of bypassing it. I am a little wary of breaking the Metal API contract, in case there is some internal state tracking that Apple does as part of didModifyRange, even if the actual memory copying is unnecessary.

KTRosenberg commented 2 years ago

Is there a contract?

I am curious too what Metal does for managed buffers on M1.

Shared buffers ARE available on macOS, as shown here: https://developer.apple.com/documentation/metal/mtlstoragemode/shared

(Even on Intel macOS, it was possible to create buffers from mmapped/virtual memory and wrap them in an MTLBuffer with a specific call.)

However, the default storage mode for textures and buffers differs between iOS and macOS.

Another thing to consider is that even if didModifyRange does nothing, I imagine MoltenVK itself needs to keep some state around, and maybe do some synchronization for all I know, just to use didModifyRange. Maybe profile to see whether this is a GPU-side issue or a CPU-side issue in the MoltenVK layer.
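Once per-frame CPU and GPU timings are available from a profiler, the comparison suggested above is simple. A minimal sketch, assuming the two timings are measured separately for the same frame; `classify_bottleneck` and its `tolerance` parameter are hypothetical, and the numbers are made up, not measurements from this issue:

```python
def classify_bottleneck(cpu_ms: float, gpu_ms: float, tolerance: float = 0.10) -> str:
    """Label a frame as CPU-bound, GPU-bound, or balanced.

    tolerance is the relative difference below which the two
    timings are treated as equal.
    """
    if cpu_ms <= 0 or gpu_ms <= 0:
        raise ValueError("timings must be positive")
    slower = max(cpu_ms, gpu_ms)
    if abs(cpu_ms - gpu_ms) / slower <= tolerance:
        return "balanced"
    return "cpu-bound" if cpu_ms > gpu_ms else "gpu-bound"


# Made-up timings for illustration only.
print(classify_bottleneck(cpu_ms=2.4, gpu_ms=0.9))
```

A CPU-bound result would point toward overhead in the translation layer, while a GPU-bound one would point toward the generated Metal workload itself.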

My guess is that Apple might support eGPUs again eventually, which would mean that they wouldn’t just flat-out remove managed buffers from macOS. So it’s definitely possible that didModifyRange has some overhead even on devices that don’t require it.