OatmealDome / dolphin-ios

Dolphin for iOS, reborn

Bounding box emulation broken on Metal backend #163

Open BalkoBalkho opened 1 week ago

BalkoBalkho commented 1 week ago

If you play Paper Mario games (I was playing Super Paper Mario) with the Metal backend, it will crash when it starts using bounding box.

With the Vulkan backend, it shows the unsupported bounding box error message at the same moment the Metal backend would crash. The game glitches, but the app doesn't crash.

So it seems the bounding box implementation for macOS doesn't work on iOS.

Crash logs DolphiniOS-2024-09-21-175840.txt

TellowKrinkle commented 1 week ago

Crash is in spirv-cross, surely trying to complain about how it thinks some feature is not supported:

spirv_cross::report_and_abort(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 28
spirv_cross::CompilerMSL::emit_subgroup_op(spirv_cross::Instruction const&) + 2452

As it turns out, there are a lot of reasons it might complain, so it would be good to know which one of those we're hitting. Can anyone decompile the libdolphin.dylib in that build and see what error string is loaded right before the call at 2452 bytes past the start of CompilerMSL::emit_subgroup_op? (Or if you can run dolphin in a way where its logs actually go somewhere, it should print its complaint before crashing.)

Phone is iPhone12,1 → iPhone 11, A13, GPU family Apple6. It should support subgroup operations. Anyone trying to reproduce this should also use an A13 or newer, or you'll hit a different codepath that won't use subgroup ops.

Side note: we should probably set ios_use_simdgroup_functions on the SPIRV-Cross compiler options. Apparently it defaults to some weird attempt at emulating them with quadgroups on iOS.

TellowKrinkle commented 1 week ago

Wait, never mind, it's hitting this one: https://github.com/KhronosGroup/SPIRV-Cross/blob/65d7393430f6c7bb0c20b6d53250fe04847cc2ae/spirv_msl.cpp#L16094-L16099

We need to set ios_use_simdgroup_functions

BalkoBalkho commented 4 days ago

@TellowKrinkle I compiled with ios_use_simdgroup_functions set to true. Now it doesn't crash; it gives a shader compilation error and behaves similarly to MoltenVK.

BalkoBalkho commented 4 days ago

Oh, it freezes, so it's worse than MoltenVK.

Hmmm, I'm confused. Does this GPU, or do future iPhone GPUs, support it?

TellowKrinkle commented 2 days ago

Internal compiler errors are the best. If you're compiling for yourself, can you go into MTLUtil.mm and remove the line that sets g_features.subgroup_ops = [device supportsFamily:MTLGPUFamilyMac2] || ...? Make sure not to remove the framebuffer fetch one below it. Then run again and tell me both whether it freezes and whether it gives the internal compiler error. (Note that this change is for the Metal renderer, so it won't affect Vulkan.)

TellowKrinkle commented 2 days ago

> Hmmm, I'm confused. Does this GPU, or do future iPhone GPUs, support it?

It's supposed to support it but clearly something is buggy. Also the feature is completely optional, we just use it for a minor performance improvement, though IIRC Apple's compiler does the same optimization internally anyways. (Internal compiler error means Apple's compiler crashed while trying to compile those shaders. If you're up for it, I'd like to get a dedicated reproduction program to report this to Apple later so maybe they'll fix it in iOS 19.)

TellowKrinkle commented 1 day ago

Actually you know what? I bet the simd_min and simd_max we need are considered "reduction operations", rather than "permute operations", and therefore not supported by A13 (which is an Apple6 GPU in the table below).

Well, we'll know for sure if removing the subgroup_ops line fixes the issue. In that case, though, there's probably no point reporting it to Apple. I've already reported that their documentation sucks at making it clear which operations are which (and that their compiler sucks at giving good error messages when it fails because you attempted to use an unsupported feature).

[Screenshot: Apple's feature set tables, showing that SIMD-scoped permute operations are supported by Apple6 GPUs but SIMD-scoped reduction operations are not.]
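
To make the hypothesis above concrete, here is a minimal, self-contained C++ sketch of the distinction between the two operation classes. It is not Dolphin's code: the names `SimdOpClass`, `MinAppleFamily`, `Supports`, and `CanUseSimdMinMax` are hypothetical, the GPU family is modeled as a plain integer rather than queried through Metal, and the Apple7 threshold for reduction operations is an assumption (the screenshot only establishes that Apple6 lacks them).

```cpp
// Hypothetical model of the feature table discussed above.
// A real renderer would query Metal (e.g. -[MTLDevice supportsFamily:])
// instead of passing the Apple GPU family around as an integer.
enum class SimdOpClass
{
  Permute,    // e.g. shuffle/broadcast-style operations
  Reduction,  // simd_min / simd_max, if the hypothesis above is right
};

// Minimum Apple GPU family assumed for each class:
// permute from Apple6 (per the screenshot), reduction from Apple7 (assumption).
constexpr int MinAppleFamily(SimdOpClass op)
{
  return op == SimdOpClass::Permute ? 6 : 7;
}

constexpr bool Supports(int apple_family, SimdOpClass op)
{
  return apple_family >= MinAppleFamily(op);
}

// Gate for the optional subgroup path: only take it when the reductions
// it relies on are available, not merely when permutes are.
constexpr bool CanUseSimdMinMax(int apple_family)
{
  return Supports(apple_family, SimdOpClass::Reduction);
}

// An A13 (Apple6) would pass a permute-only check but fail this gate.
static_assert(Supports(6, SimdOpClass::Permute), "Apple6 has permute ops");
static_assert(!CanUseSimdMinMax(6), "Apple6 lacks reduction ops");
```

Under these assumptions, a check keyed on permute support would enable the subgroup path on an A13 even though the reductions it needs are missing, which would be consistent with the compiler failure reported earlier in the thread.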