KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

SPIR-V to MSL conversion error: Argument buffer resource base type could not be determined #2271

Open patrick-han opened 1 month ago

patrick-han commented 1 month ago
[mvk-error] SPIR-V to MSL conversion error: Argument buffer resource base type could not be determined. When padding argument buffer elements, all descriptor set resources must be supplied with a base type by the app.
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Fragment shader function could not be compiled into pipeline. See previous logged error.

validation layer: VK_ERROR_INVALID_SHADER_NV: Fragment shader function could not be compiled into pipeline. See previous logged error.

This error seems to have come up several times before, but I'm unsure whether those reports are related. I have the latest Vulkan SDK as of yesterday (1.3.283.0), so in theory I already have the fixes for these issues:

https://github.com/KhronosGroup/MoltenVK/issues/2216 https://github.com/KhronosGroup/MoltenVK/issues/2016

After enabling MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS programmatically, I also disabled shader validation on the advice of one of the recent Vulkanised 2024 talks.

The specific descriptor indexing features I enabled are descriptorBindingSampledImageUpdateAfterBind, descriptorBindingPartiallyBound, and runtimeDescriptorArray, with the appropriate pool/binding/layout flags: VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT_EXT, VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT, VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT, and VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT.

Although in my fragment shader I'm still just declaring my texture normally:

layout (set = 0, binding = 0) uniform texture2D texture0;
layout (set = 0, binding = 1) uniform sampler linearSampler;

billhollings commented 1 month ago

PR #2260 and its follow-ups have likely fixed this.

Please build from the latest MoltenVK, and try again. Or you can wait for the next SDK, which should be released in a couple of weeks.

With this updated code, MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS is enabled by default, so you no longer need to set it explicitly.

patrick-han commented 1 month ago

I just built the latest MoltenVK (the macOS library). Unfortunately, I'm now running into a different error, with a crash and rendering artifacts:

Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
[mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

validation layer: VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
Screenshot 2024-07-11 at 8 44 56 AM

patrick-han commented 1 month ago

It seems that reducing my descriptor counts to something very small (and thereby my maxSets) avoids the crash:

    constexpr uint32_t maxBindlessResourceCount = 100; // Stops crashing if I change this to something like 25
    constexpr uint32_t maxSamplerCount = 2;

    std::array<VkDescriptorPoolSize, 2> bindlessDescriptorPoolSizes {{
        { VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, maxBindlessResourceCount},
        { VK_DESCRIPTOR_TYPE_SAMPLER, maxSamplerCount}
    }};
    VkDescriptorPoolCreateInfo poolCreateInfo = {
        .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
        .pNext = nullptr,
        .flags = VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT_EXT,
        .maxSets = maxBindlessResourceCount * static_cast<uint32_t>(bindlessDescriptorPoolSizes.size()),
        .poolSizeCount = static_cast<uint32_t>(bindlessDescriptorPoolSizes.size()),
        .pPoolSizes = bindlessDescriptorPoolSizes.data()
    };

Although this seems too small?
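
For reference, maxSets limits how many descriptor sets can be allocated from the pool, while each descriptorCount in pPoolSizes is the total number of descriptors of that type across all sets allocated from it. Assuming a bindless layout that keeps all textures in a single descriptor set, the pool would need maxSets = 1 rather than maxBindlessResourceCount * 2. The sketch below only illustrates that arithmetic. BindlessPoolBudget and makeBindlessPoolBudget are hypothetical names, plain integers stand in for the Vulkan structs so that it compiles without the SDK, and nothing in this thread confirms that pool sizing is related to the page fault.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the numbers that would feed
// VkDescriptorPoolCreateInfo::maxSets and the VkDescriptorPoolSize entries.
struct BindlessPoolBudget {
    std::uint32_t maxSets;           // number of descriptor sets, not descriptors
    std::uint32_t sampledImageCount; // total VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE descriptors
    std::uint32_t samplerCount;      // total VK_DESCRIPTOR_TYPE_SAMPLER descriptors
};

// Pool-wide totals are the per-set counts multiplied by the number of sets.
inline BindlessPoolBudget makeBindlessPoolBudget(std::uint32_t setCount,
                                                 std::uint32_t imagesPerSet,
                                                 std::uint32_t samplersPerSet) {
    assert(setCount > 0 && "a descriptor pool must allow at least one set");
    return BindlessPoolBudget{setCount,
                              setCount * imagesPerSet,
                              setCount * samplersPerSet};
}
```

With the values from the snippet above, makeBindlessPoolBudget(1, 100, 2) describes one set holding 100 sampled images and 2 samplers, whereas the original code requests room for 200 sets.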

billhollings commented 1 month ago

This might be caused by the same problem as in issue #2246, which was fixed with PR #2273.

Please retest with the latest MoltenVK, and if that fixes it, close this issue?

patrick-han commented 1 month ago

Just pulled and built @ edbdcf054b2be9c84430f719ae99e78f9e845350

Still getting the same crash unfortunately

[mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
[VULKAN] Debug.h(18): validation layer: VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

Lowering maxBindlessResourceCount again to something like 50 avoids the crash, but the textures are applied incorrectly (even with no validation errors reported):

Screenshot 2024-07-22 at 11 35 28 AM

I've verified everything renders correctly on a Windows system (Nvidia).

billhollings commented 1 month ago

Thanks for testing.

Can you test again with Metal validation enabled, and report any validation errors that are logged, or that trip an assertion, please? You can do this using the following environment variable:

METAL_DEVICE_WRAPPER_TYPE=1

To avoid the assertion, and just log all Metal validation errors, you can also add the following environment variables:

METAL_ERROR_MODE=3
METAL_DEBUG_ERROR_MODE=3
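
If setting the variables in the shell or in an Xcode scheme is inconvenient, they can also be set from inside the process. This is a minimal sketch, assuming that Metal reads these variables no earlier than when the first Metal device is created, which this thread does not confirm. enableMetalValidationLogging and isSetTo are hypothetical helper names, not part of MoltenVK or Metal, and setenv is the POSIX call available on macOS.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: sets the three Metal validation variables from the
// comment above for the current process. Assumption: it must run before
// vkCreateInstance, i.e. before MoltenVK creates any Metal device.
inline bool enableMetalValidationLogging() {
    // The final argument (overwrite = 1) replaces any inherited value.
    bool ok = setenv("METAL_DEVICE_WRAPPER_TYPE", "1", 1) == 0;
    ok = (setenv("METAL_ERROR_MODE", "3", 1) == 0) && ok;
    ok = (setenv("METAL_DEBUG_ERROR_MODE", "3", 1) == 0) && ok;
    return ok;
}

// Hypothetical helper: checks that an environment variable has the expected value.
inline bool isSetTo(const char* name, const std::string& expected) {
    const char* value = std::getenv(name);
    return value != nullptr && expected == value;
}
```

Calling enableMetalValidationLogging() as the first statement in main keeps the setup next to the rest of the Vulkan initialization. Setting the variables in the launch environment remains the safer route if the in-process approach has no visible effect.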
patrick-han commented 1 month ago

I am not very familiar with Xcode, but since I am building and running through it, I added the environment variables to my scheme like this (hopefully this is correct):

Screenshot 2024-07-22 at 12 42 25 PM

Now my output looks like this when it breaks on vkQueuePresentKHR. The only additional information seems to be the entry from -[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:]:

FAULT: <NSRemoteView: 0x13b76d780 com.apple.TextInputUI.xpc.CursorUIViewService TUICursorUIViewService> determined it was necessary to configure <TUINSWindow: 0x13d0781e0> to support remote view vibrancy
CLIENT ERROR: TUINSRemoteViewController does not override -viewServiceDidTerminateWithError: and thus cannot react to catastrophic errors beyond logging them

-[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:], line 1047: error 'Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)'

Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

[mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

[VULKAN] Debug.h(18): validation layer: VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

Not sure if it's relevant, but I also wanted to add the other extensions I am using: VK_KHR_dynamic_rendering, VK_KHR_buffer_device_address, VK_EXT_scalar_block_layout

patrick-han commented 1 month ago

Ah, I forgot to mention that I am also using VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_ALLOCATE_INFO_EXT. Another issue appears to describe the same problem, and variable descriptor counts seem to be the common thread: https://github.com/KhronosGroup/MoltenVK/issues/2278