KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Performance regression introduced in either 1.1.3 or 1.1.4 #1646

Closed: pizuz closed this issue 2 years ago

pizuz commented 2 years ago

Hi,

I noticed quite a substantial performance regression starting in either 1.1.3 or 1.1.4 when using MVK in the Dolphin emulator. v1.1.3 refuses to load, so there's a much larger range to bisect. Unfortunately, the CI artifacts from back then have expired already. Do you keep them around somewhere? Building every single PR myself is possible, but quite a pain.

My setup: macOS 10.15.7 on a Late 2013 iMac (i7 with an NVIDIA GeForce GT 750M)

Regards, Pizuz

billhollings commented 2 years ago

> Unfortunately, the CI artifacts from back then have expired already. Do you keep them around somewhere?

Unfortunately, we don't separately archive the CI artifacts from each PR. 🙁

pizuz commented 2 years ago

That's a shame. I will try to bisect it myself, but it will probably take me a while because of some time constraints on my end. The regression was definitely introduced somewhere in between those versions.
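Because 1.1.3 won't load, a plain binary search over the commits between 1.1.2 and 1.1.4 has to step around untestable builds, much like `git bisect skip` does. A minimal Python sketch of that narrowing, assuming an oldest-to-newest list of commits and a hypothetical `is_bad` callback that builds and benchmarks one commit (returning `None` when the build can't be tested); `bisect_first_bad` is an illustrative name, not part of MoltenVK or Dolphin:

```python
from typing import Callable, List, Optional, Sequence, Set


def bisect_first_bad(
    commits: Sequence[str],
    is_bad: Callable[[str], Optional[bool]],
) -> List[str]:
    """Narrow down which commit introduced a regression.

    commits is ordered oldest to newest; commits[0] is known good (fast) and
    commits[-1] is known bad (slow). is_bad(commit) returns True (regressed),
    False (fast) or None (untestable, e.g. the build refuses to load).
    Returns the remaining candidates for the first bad commit: one entry when
    the culprit is pinned down, more when skipped builds block further narrowing.
    """
    good, bad = 0, len(commits) - 1
    skipped: Set[int] = set()
    while bad - good > 1:
        candidates = [i for i in range(good + 1, bad) if i not in skipped]
        if not candidates:
            break  # only untestable commits remain between good and bad
        mid = (good + bad) // 2
        # Probe the testable commit closest to the midpoint.
        probe = min(candidates, key=lambda i: abs(i - mid))
        result = is_bad(commits[probe])
        if result is None:
            skipped.add(probe)
        elif result:
            bad = probe
        else:
            good = probe
    return list(commits[good + 1 : bad + 1])


commits = [f"c{i}" for i in range(10)]
# Regression starts at c6, but c6 itself cannot be tested:
print(bisect_first_bad(commits, lambda c: None if c == "c6" else int(c[1:]) >= 6))
# → ['c6', 'c7']
```

When an untestable build sits right at the boundary, the best the search can do is report a small range, which is exactly why a build that refuses to load widens what has to be checked by hand.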

TellowKrinkle commented 2 years ago

There's a good chance this is the same issue reported in #1628, which looks like it was introduced in 4371ef4d2b706d761acac49c1b7f9d413d0d15db, conveniently between 1.1.2 and 1.1.3.

billhollings commented 2 years ago

> There's a good chance this is the same issue reported in #1628, which looks like it was introduced in 4371ef4, conveniently between 1.1.2 and 1.1.3

PR #1676, which fixes #1628, may improve this. Please retest with latest MoltenVK and close this issue if performance is improved.

pizuz commented 2 years ago

I'm seeing some performance benefit, but it is still quite a bit slower than MVK 1.1.2. Guess I have to live with that. Thanks for the work.

billhollings commented 2 years ago

> still quite a bit slower than MVK 1.1.2

If you can get more information on where it is slowing down, we can try to address it.

pizuz commented 2 years ago

I'll try to profile it over the weekend.

pizuz commented 2 years ago

I made a few performance traces over in the Dolphin thread, in case that helps narrow it down. It looks like the remaining performance hit is smaller than I initially thought.

https://github.com/dolphin-emu/dolphin/pull/9981#issuecomment-1214378319

TellowKrinkle commented 2 years ago

The posted traces look very similar to an issue I was having with MVK on Nvidia with PCSX2.

It looks like the issue was related to the semaphore emulation that is now used by default on Nvidia (which, AIUI, is due to an Nvidia driver bug and is required for correctness). Setting MVK_ALLOW_METAL_FENCES brings the speed back to normal, though a number of earlier MVK commits have also changed performance since 1.1.2.
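A minimal Python sketch for launching an app with that setting, assuming (as the comment implies) that MoltenVK reads `MVK_ALLOW_METAL_FENCES` from the process environment at startup and treats `"1"` as enabled. The helper names are hypothetical, and the variable only affects processes started with the modified environment:

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def env_with_metal_fences(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with MVK_ALLOW_METAL_FENCES enabled.

    Assumption: MoltenVK reads this variable at startup and treats "1" as on.
    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["MVK_ALLOW_METAL_FENCES"] = "1"
    return env


def launch_with_metal_fences(cmd: Sequence[str]) -> int:
    """Run cmd (e.g. the emulator binary) with Metal fences allowed; return its exit code."""
    return subprocess.run(list(cmd), env=env_with_metal_fences()).returncode
```

For example, `launch_with_metal_fences(["/path/to/PCSX2.app/Contents/MacOS/PCSX2"])`, where the path is a placeholder for wherever the app binary lives. Copying the environment rather than mutating `os.environ` keeps the setting scoped to the one child process, which makes A/B comparisons against the default semaphore path easier.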

Test: Run NFSCarbon.gs.xz.zip with PCSX2 v1.7.3212. Set blending emulation to minimum in the graphics settings to ensure a CPU bottleneck. (You'll have to unzip the file, but PCSX2 can read .gs.xz directly. For the Qt UI, drag the file onto the main window. For the wx UI, place it in ~/Library/Application Support/PCSX2/snaps and open the GS debugger from the Debug menu.)
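For the wx route, the unzip-and-copy step can be scripted. A minimal Python sketch, assuming the archive holds the `.gs.xz` dump itself (possibly inside a folder); `install_gs_dump` is a hypothetical helper, and the default target is the snaps path given above:

```python
import zipfile
from pathlib import Path
from typing import List, Optional, Union

DEFAULT_SNAPS = Path.home() / "Library" / "Application Support" / "PCSX2" / "snaps"


def install_gs_dump(zip_path: Union[str, Path], snaps_dir: Optional[Path] = None) -> List[Path]:
    """Copy every .gs.xz dump from zip_path into the PCSX2 snaps folder.

    The dump stays xz-compressed, since PCSX2 reads .gs.xz directly.
    Returns the paths of the installed files.
    """
    target_dir = Path(snaps_dir) if snaps_dir is not None else DEFAULT_SNAPS
    target_dir.mkdir(parents=True, exist_ok=True)
    installed: List[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".gs.xz"):
                # Drop any folder structure inside the archive; keep the file name.
                dest = target_dir / Path(name).name
                dest.write_bytes(zf.read(name))
                installed.append(dest)
    return installed
```

Calling `install_gs_dump("NFSCarbon.gs.xz.zip")` would then leave the dump where the wx GS debugger can open it; the Qt UI doesn't need this, since the file can be dragged onto the main window.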

- 1.1.2: 65 fps
- 4371ef4d2b706d761acac49c1b7f9d413d0d15db: 75 fps
- 2ef21c65bf940d82577453ab24d08ddaef49cfe9: 85 fps
- Somewhere between f8280bca8933ecc839a4e35ba904d35a70430962 and aa89f845a994f71491f2713e1a0137317d465fbc: 75 fps
- After the commit that switched on semaphore emulation on Nvidia: 60 fps, but 75 fps with MVK_ALLOW_METAL_FENCES enabled
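For a CPU-bound test like this one, frame time can be easier to compare than fps, since it expresses each step as time added to or removed from every frame. A small Python sketch that converts the figures above (copied from this list, with commit hashes shortened to 7 characters; nothing is re-measured) to milliseconds per frame relative to 1.1.2:

```python
MEASUREMENTS = [
    ("1.1.2", 65),
    ("4371ef4", 75),
    ("2ef21c6", 85),
    ("between f8280bc and aa89f84", 75),
    ("semaphore emulation on Nvidia", 60),
    ("semaphore emulation + MVK_ALLOW_METAL_FENCES", 75),
]


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


def report(baseline_fps: float = 65) -> None:
    base_ms = frame_time_ms(baseline_fps)
    for label, fps in MEASUREMENTS:
        ms = frame_time_ms(fps)
        # Negative delta = less time per frame than the 1.1.2 baseline (faster).
        print(f"{label:<46} {fps:>3} fps  {ms:5.2f} ms/frame  {ms - base_ms:+5.2f} ms vs 1.1.2")


report()
```

For example, the drop from 85 fps to 60 fps corresponds to frame times rising from about 11.8 ms to about 16.7 ms.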

We're still faster than 1.1.2 (and than any single release, for that matter, since both the improvement and the drop happened between 1.1.2 and 1.1.3), so it's not a huge deal, but there was definitely a point in the history when it was faster than it is now.