KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

FFXIV fails to run on Mac with > 100GB RAM #1650

Open cbackas42 opened 2 years ago

cbackas42 commented 2 years ago

This is a fairly weird issue, so please bear with me. Final Fantasy XIV exhibits the following behavior:

This doesn't seem like it's MVK's fault per se, but we're looking for solutions to work around it. Maybe a build setting or ENV setting to limit the max size of surfaces: the thinking is that if we can induce the failure we see on lower-RAM machines, the game would work on the higher-RAM machines, even if the defect is in the game itself.

Can you suggest places in the MVK code base I could target to try out hacks? I have actual hardware to test on.

billhollings commented 2 years ago

I have to say...this is pretty bizarre. 😲

Aside from the fundamental question of why the heck this is happening in a game...from my calculations, the math doesn't seem to make sense. A surface of 16K x 16K x 2K is 512 Gigapixels. You don't indicate which surface format you're using, but any format has got to be way beyond even your 128GB memory availability too.
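For reference, the arithmetic behind that figure can be checked in a few lines. This is only a sketch: the 4-bytes-per-pixel value is an assumption, since the thread never states the surface format, and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Number of pixels in a width x height x depth (or layer count) surface.
constexpr std::uint64_t pixelCount(std::uint64_t width, std::uint64_t height, std::uint64_t depth) {
    return width * height * depth;
}

// Storage needed for a given pixel count at a given pixel size.
constexpr std::uint64_t surfaceBytes(std::uint64_t pixels, std::uint64_t bytesPerPixel) {
    return pixels * bytesPerPixel;
}

constexpr std::uint64_t kGiB = 1ull << 30;

// 16384 * 16384 * 2048 = 2^39 pixels, i.e. 512 Gi pixels.
static_assert(pixelCount(16384, 16384, 2048) == 512 * kGiB, "512 Gi pixels");

// ASSUMPTION: 4 bytes per pixel (an RGBA8-style format). That gives 2048 GiB, or 2 TiB.
static_assert(surfaceBytes(pixelCount(16384, 16384, 2048), 4) == 2048 * kGiB, "2 TiB");
```

Even at one byte per pixel the total would be 512 GiB, which is still far more than 128GB.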

Typically, the surface itself is created from an existing CAMetalLayer (or more accurately one per swapchain image). So this allocation will be happening outside MoltenVK anyway. Plus it's hard to see how that could be a 3D texture.

Can you get a handle on where this initial giant surface is coming from? Surely it's a bug in the game itself, or something?

cbackas42 commented 2 years ago

Yup. Pretty bizarre is about right!

I agree the math doesn't make a ton of sense. All I know from debugging the shader timeout is that those were the dimensions of the surface it was operating on at the time, and also that I can see ~80GB of RAM become "Wired" by the Metal driver around this time.

I'm not super familiar with how MVK works, in this case it's DXVK->MVK so I assumed all the "Metal" allocations would happen at the MVK layer. You're saying it may in fact be DXVK creating the surface?

I agree it's probably a game bug. Or, perhaps they're doing something that semantically makes SOME sense in DirectX and doesn't cause giant allocations, but one of the translation layers has a different interpretation. I'm not sure at this point. As a first step I was looking for a spot I could try to add a hack to cause the allocation to fail like it does on the lower-spec machines so that the game could launch.

billhollings commented 2 years ago

You could try inserting a check for an unreasonable value of pCreateInfo->imageExtent prior to this line:

https://github.com/KhronosGroup/MoltenVK/blob/60b2ae51ddb87617beb6d8cb7fac11e1daed763e/MoltenVK/MoltenVK/GPUObjects/MVKSwapchain.mm#L345

and set something like:

setConfigurationResult(reportError(VK_ERROR_OUT_OF_HOST_MEMORY, "vkCreateSwapchainKHR(): Swapchain surface size (%d, %d) is unsupportably large.", pCreateInfo->imageExtent.width, pCreateInfo->imageExtent.height));

js6i commented 2 years ago

The issue is that when FFXIV draws without render targets, wined3d, and I believe DXVK, create a render pass with maximum possible render area and framebuffer sizes, with the understanding that the actual pixel grid will be determined by the viewport. Apple Silicon GPUs, due to their tiled architecture, nevertheless need to preallocate large amounts of tile memory in this case. For CrossOver, we worked around this by trimming the render target size in the render pass to what is required by the current viewport. It seemed to make more sense to do it at the level of D3D, but I suppose MoltenVK could implement a similar workaround.
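As a rough illustration of the trimming described above, the requested render area can be intersected with the current viewport. This is a minimal sketch, not the CrossOver code: `Rect`, `overlap`, and `trimToViewport` are hypothetical names, and real Vulkan viewports use floating-point bounds that would need rounding first.

```cpp
#include <cstdint>

// Hypothetical integer rectangle; not a MoltenVK, DXVK, or wined3d type.
struct Rect {
    std::uint64_t x, y, width, height;
};

constexpr std::uint64_t minU(std::uint64_t a, std::uint64_t b) { return a < b ? a : b; }
constexpr std::uint64_t maxU(std::uint64_t a, std::uint64_t b) { return a > b ? a : b; }

// Length of the overlap between [aStart, aStart + aLen) and
// [bStart, bStart + bLen), or 0 if the two ranges are disjoint.
constexpr std::uint64_t overlap(std::uint64_t aStart, std::uint64_t aLen,
                                std::uint64_t bStart, std::uint64_t bLen) {
    return minU(aStart + aLen, bStart + bLen) > maxU(aStart, bStart)
               ? minU(aStart + aLen, bStart + bLen) - maxU(aStart, bStart)
               : 0;
}

// Shrink the requested render area to the part the viewport actually covers.
constexpr Rect trimToViewport(Rect requested, Rect viewport) {
    return Rect{maxU(requested.x, viewport.x), maxU(requested.y, viewport.y),
                overlap(requested.x, requested.width, viewport.x, viewport.width),
                overlap(requested.y, requested.height, viewport.y, viewport.height)};
}
```

With a maximal 16384 x 16384 request and a 1920 x 1080 viewport at the origin, the trimmed area is 1920 x 1080.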

cbackas42 commented 2 years ago

The issue is that when FFXIV draws without render targets, wined3d, and I believe DXVK, create a render pass with maximum possible render area and framebuffer sizes, with the understanding that the actual pixel grid will be determined by the viewport. Apple Silicon GPUs, due to their tiled architecture, nevertheless need to preallocate large amounts of tile memory in this case. For CrossOver, we worked around this by trimming the render target size in the render pass to what is required by the current viewport. It seemed to make more sense to do it at the level of D3D, but I suppose MoltenVK could implement a similar workaround.

This is interesting information! If the fix is better done in DXVK that's fine, but what version of CrossOver works around this? I ask because the Square Enix "official" launcher has the same bug present, so presumably not that one. Is it in recent CrossOver releases? Is the patch available or submitted upstream?

js6i commented 2 years ago

Sorry, it's very recent and not officially shipped anywhere yet. It's to be included in the next CrossOver release. Probably too hacky to go to upstream, in the current form anyway.

cbackas42 commented 2 years ago

Sounds good, I look forward to that change appearing at some point!

I tried @billhollings' suggestion above, selectively calling setConfigurationResult() with an error, but it just resulted in "A DirectX Error has occurred". It's unclear how this error gets handled gracefully on lower-spec machines in the first place, but I suspect it's a very narrowly targeted hack someplace.
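For anyone wanting to repeat the experiment, the kind of selective check being discussed might look like the sketch below. `Extent2D` is a hypothetical mirror of `VkExtent2D` so the snippet builds without the Vulkan headers, and the 8192 cutoff is an arbitrary assumption, not a MoltenVK constant.

```cpp
#include <cstdint>

// Hypothetical stand-in for VkExtent2D (two 32-bit unsigned dimensions).
struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

// ASSUMPTION: an arbitrary cutoff chosen for illustration only. A real check
// might instead compare against limits reported by the device.
constexpr std::uint32_t kMaxReasonableDimension = 8192;

// True when either dimension exceeds the cutoff, i.e. the case in which the
// caller would report an error instead of creating the swapchain.
constexpr bool isUnreasonablyLarge(Extent2D extent,
                                   std::uint32_t maxDimension = kMaxReasonableDimension) {
    return extent.width > maxDimension || extent.height > maxDimension;
}
```

A caller would run this predicate on `pCreateInfo->imageExtent` and only invoke the error path when it returns true.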

I wonder though if someone could suggest a hack for MVK that's along the lines of what @js6i is suggesting. Because, if you could do the same thing in MVK it should solve this problem for both WineD3D and DXVK at once.

cdavis5e commented 2 years ago

[redacted] That sounds more like a Wine/CrossOver issue. Have you contacted CodeWeavers support? (UPDATE: Edited to redact deactivated username at user's request.)

cbackas42 commented 1 year ago

Update here; for our purposes in FFXIV we were able to fix this with a small hack to DXVK, the diff of which can be found here: https://github.com/Gcenx/DXVK-macOS/pull/3

The hack itself was "inspired" by a similar change CX had done in WineD3D for the same reason. It's unclear to me whether this indicates something MVK could do on behalf of its clients so they didn't all need a change - since as an outsider at least it seems like "This is needed only for Metal devices" should land more in MVK's court. But it's very likely I don't know what I'm talking about either so I'll leave whether to close this or not to you folks who know the stacks better.

billhollings commented 1 year ago

Thinking about how this could be handled in MoltenVK...

@js6i

The issue is that when FFXIV draws without render targets, wined3d, and I believe DXVK, create a render pass with maximum possible render area and framebuffer sizes, with the understanding that the actual pixel grid will be determined by the viewport. Apple Silicon GPUs, due to their tiled architecture, nevertheless need to preallocate large amounts of tile memory in this case. For CrossOver, we worked around this by trimming the render target size in the render pass to what is required by the current viewport. It seemed to make more sense to do it at the level of D3D, but I suppose MoltenVK could implement a similar workaround.

  1. To do this, are you trimming the render target size in vkCmdBeginRenderPass()? If not, how then?
  2. Under what conditions are you doing this? When the frame buffer has no attachments?
  3. How is the viewport size known at trimming time? Pipelines and viewports can be bound inside a renderpass.

@cbackas42

for our purposes in FFXIV we were able to fix this with a small hack to DXVK

The fix you identify sets VkFramebufferCreateInfo::layers to 1 when there are no attachments. MoltenVK could certainly do this logic internally, but presumably this will still allocate a significant chunk of unused memory to cover one very large texture of only one layer (as opposed to I guess the 2048 layers above).
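The layer clamp described above is small enough to sketch directly. This is an illustration of the idea rather than the DXVK-macOS diff itself: `effectiveLayers` and `framebufferPixels` are hypothetical names, and the 16384 x 16384 x 2048 figures are the ones quoted earlier in this thread.

```cpp
#include <cstdint>

// Layer count to request for a framebuffer: force a single layer when there
// are no attachments, otherwise keep what the application asked for.
constexpr std::uint32_t effectiveLayers(std::uint32_t attachmentCount,
                                        std::uint32_t requestedLayers) {
    return attachmentCount == 0 ? 1u : requestedLayers;
}

// Pixels spanned by a framebuffer of the given dimensions.
constexpr std::uint64_t framebufferPixels(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t layers) {
    return width * height * layers;
}

// Without attachments, the 2048 requested layers collapse to 1.
static_assert(effectiveLayers(0, 2048) == 1, "clamped to one layer");
```

With one layer, the attachment-free framebuffer still spans 2^28 (about 268 million) pixels, which is consistent with the concern that a large single-layer allocation may remain.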

Gcenx commented 1 year ago

@billhollings you can view their hack via my mirror; see wined3d/context_vk.c

billhollings commented 1 year ago

you can view their hack via my mirror

Thanks. A similar approach to @cbackas42, but also limiting render area.

The concern I have about putting this into MoltenVK is the question I raised above about generically trimming to whatever viewport is active at the time the render pass begins. It's possible that no pipeline or viewport is established at the time this decision is taken, or that a different viewport will be established later in the renderpass. Ordering renderpasses, pipelines, and viewports is something the app (or DXVK) might have control over, but MoltenVK doesn't.

However...I wonder if MoltenVK could use attachment status to control setting the values of MTLRenderPassDescriptor properties renderTargetWidth, renderTargetHeight, and renderTargetArrayLength, which I assume may be what Metal uses to allocate the tile memory cache.

cbackas42 commented 1 year ago

but presumably this will still allocate a significant chunk of unused memory to cover one very large texture of only one layer (as opposed to I guess the 2048 layers above).

Yes, this is very likely true. I'm not presenting our fix as "the best" or even "correct", just what worked for us for our target game. None of us are very familiar with the graphics APIs here - we changed the layer count because it was easily accessed and we couldn't figure out how to determine the viewport size from that spot to more closely mimic the CX hack.

But I wouldn't present this as a good "fix" to upstream or anything without knowing for certain that there wouldn't be a valid application of > 1 layer without attachments. I rather suspect our fix would break some programs.

But the root idea of course is that "default to the largest possible size" isn't especially safe on Apple Silicon because it might ACTUALLY try to allocate that!

billhollings commented 1 year ago

However...I wonder if MoltenVK could use attachment status to control setting the values of MTLRenderPassDescriptor properties renderTargetWidth, renderTargetHeight, and renderTargetArrayLength, which I assume may be what Metal uses to allocate the tile memory cache.

@cbackas42 @js6i @Gcenx

With that in mind, can someone try applying the following small patch to MoltenVK to see if it fixes the tiling memory over-allocation, please?

no_preset_render_size.patch.zip

The patch is just commenting out the following 4 lines in MVKCommandBuffer.mm, so that Metal is not pre-warned how big the rendering area is. I'm hoping this changes how it allocates tiling memory. If it works, I'll modify it to only do so when there are no attachments in the framebuffer.

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L526-L528

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L552

cbackas42 commented 1 year ago

Thanks Bill; I went back to an older release of XIV On Mac with a DXVK prior to when we'd applied our patch, and dropped in an MVK built with your patches. It appears to be just as effective, at least for FFXIV. The game launches and works without issue, no giant memory allocation or kernel panic!

billhollings commented 1 year ago

This is now fixed in PR #1797. Please retest using latest MoltenVK, and close this issue if it fixes the problem.

cbackas42 commented 1 year ago

Tested on a Studio Ultra 128GB on the unpatched DXVK; the problem appears to be solved! Thank you so much!

billhollings commented 1 year ago

@cbackas42

Thanks for testing. However, it turns out that the fix in #1797 causes the frag shader not to run.

I'm curious. What is going on in FFXIV On Mac during these attachment-free renders, and why does the applied fix not cause behavioral issues if the frag shader is not being run?

cbackas42 commented 1 year ago

Oh no! It was too good to be true...

The game is doing early startup. With no fixes in place, it makes the giant allocation prior to drawing anything at all, even before the first studio logo appears. If you have, say, DXVK overlays on, you do see the very first frame of that, so I believe it dies either during or just after the very first frame drawn. My assumption has always been that it actually configures this thing later, just lazily, but I'm off into wild guesses at that point.

Gcenx commented 1 year ago

However...I wonder if MoltenVK could use attachment status to control setting the values of MTLRenderPassDescriptor properties renderTargetWidth, renderTargetHeight, and renderTargetArrayLength, which I assume may be what Metal uses to allocate the tile memory cache.

@cbackas42 @js6i @Gcenx

With that in mind, can someone try applying the following small patch to MoltenVK to see if it fixes the tiling memory over-allocation, please?

no_preset_render_size.patch.zip

The patch is just commenting out the following 4 lines in MVKCommandBuffer.mm, so that Metal is not pre-warned how big the rendering area is. I'm hoping this changes how it allocates tiling memory. If it works, I'll modify it to only do so when there are no attachments in the framebuffer.

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L526-L528

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L552

@billhollings @cdavis5e & @js6i does the issue also happen if this attached change is applied instead of #1797?

billhollings commented 1 year ago

does the issue also happen if this attached change is applied instead of #1797

Unfortunately, yes, the issue will happen. Both are actually the same fix. #1797 is just an industrial version of that patch (applying it only if there are no attachments).

Gcenx commented 1 year ago

does the issue also happen if this attached change is applied instead of #1797

Unfortunately, yes, the issue will happen. Both are actually the same fix. #1797 is just an industrial version of that patch (applying it only if there are no attachments).

Ah I misunderstood nvm, I’ll remove the revert on the DXVK-macOS.

billhollings commented 1 year ago

@Gcenx

Ah I misunderstood nvm, I’ll remove the revert on the DXVK-macOS.

I've pushed PR #1802, which does some more sophisticated management of Metal renderpasses, and fixes the issue here in a way that always sets the Metal render area to something: the frame buffer render area, or the viewport, if there are no attachments. Since the viewport is not guaranteed until the draw call, we now defer creating the Metal renderpass until then (or until needed by other renderpass operations that involve drawing, like clearing attachments).
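The selection and deferral logic described here could be sketched as follows. This is a simplified model under stated assumptions, not the code in PR #1802: `Area`, `chooseRenderArea`, and `DeferredRenderPass` are hypothetical, and the real change also has to handle other operations that force the Metal renderpass to start, such as clearing attachments.

```cpp
#include <cstdint>

// Hypothetical integer render area.
struct Area {
    std::uint32_t x, y, width, height;
};

// Use the framebuffer render area when there are attachments, otherwise fall
// back to the viewport.
constexpr Area chooseRenderArea(std::uint32_t attachmentCount, Area framebufferArea,
                                Area viewport) {
    return attachmentCount > 0 ? framebufferArea : viewport;
}

// Records renderpass state at begin time but only fixes the render area at the
// first draw, because the viewport is not guaranteed to be known before then.
class DeferredRenderPass {
public:
    void begin(std::uint32_t attachmentCount, Area framebufferArea) {
        attachmentCount_ = attachmentCount;
        framebufferArea_ = framebufferArea;
        started_ = false;
    }

    void setViewport(Area viewport) { viewport_ = viewport; }

    // Returns the render area in effect for this pass, deciding it on first use.
    Area draw() {
        if (!started_) {
            activeArea_ = chooseRenderArea(attachmentCount_, framebufferArea_, viewport_);
            started_ = true;
        }
        return activeArea_;
    }

private:
    std::uint32_t attachmentCount_ = 0;
    Area framebufferArea_{};
    Area viewport_{};
    Area activeArea_{};
    bool started_ = false;
};
```

In this model, calling `begin(0, {0, 0, 16384, 16384})`, then `setViewport({0, 0, 1920, 1080})`, then `draw()` yields the viewport-sized area rather than the maximal one.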

Can you test this again with the new code, and without the DXVK-macOS fix, and let me know the results, please?

cbackas42 commented 1 year ago

Sorry for the delay; I just tried the latest build of the current version of the PR, and it seems to work just as well on my 128GB Studio Ultra with FFXIV as the initial change did.

billhollings commented 1 year ago

Sorry for the delay; I just tried the latest build of the current version of the PR, and it seems to work just as well on my 128GB Studio Ultra with FFXIV as the initial change did.

Thanks for following up on that. And no worries about the pause.

To anyone reading this discussion, please note that after two attempts, MoltenVK does not have a fix for this. After some discussion, PR #1797 has been reverted, as it breaks required behavior, and PR #1802 is on hold as WIP at this point, as it is a complicated solution to a problem that it seems would be better handled at the app or emulator level, as was done above.

billhollings commented 7 months ago

For the big boss here, can you delete my old account's posts here? [ghost] the images I posted have sensitive info, and when people type my name on google, they end up seeing these images...woops 😥

I've deleted the two ghost posts above.