KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Error using gl_SubgroupInvocationID in fragment shaders #629

Closed oscarbg closed 5 years ago

oscarbg commented 5 years ago

Hi, I was testing a simple but reasonably complete set of wave/subgroup ops to see how well they work. I'm basically using this sample (https://github.com/ConfettiFX/The-Forge/tree/master/Examples_3/Unit_Tests/src/14_WaveIntrinsics), but ported to an even simpler sample, because the original has some issues running under Wine, even on Linux. The interesting shader is in any case: https://github.com/ConfettiFX/The-Forge/blob/master/Examples_3/Unit_Tests/src/14_WaveIntrinsics/Shaders/Vulkan/wave.frag

The sample, which runs on Linux under Wine, fails to run on macOS with an interesting log:

[mvk-error] VK_ERROR_INITIALIZATION_FAILED: Shader library compile failed (Error code 3):
Compilation failed: 

program_source:36:117: error: invalid attribute 'thread_index_in_simdgroup' as input of a fragment function
fragment main0_out main0(main0_in in [[stage_in]], float4 gl_FragCoord [[position]], uint gl_SubgroupInvocationID [[thread_index_in_simdgroup]])
                                                                                                                    ^~~~~~~~~~~~~~~~~~~~~~~~~
.
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Fragment shader function could not be compiled into pipeline. See previous logged error.
Fatal : VkResult is "ERROR_INVALID_SHADER_NV" in C:\10200\Vulkan\examples\triangle\triangle.cpp at line 0009:fixme:msvcp:_Locinfo__Locinfo_ctor_cat_cstr (000000000022F5F8 1 C) semi-stub

The issue is clearly the use of gl_SubgroupInvocationID (in cases 2 and 4 of the shader): if I comment out these two cases, the sample runs correctly with the other cases in 1..9:

case 2:
    {
        // Example of query intrinsics: WaveGetLaneIndex
        // Gradiently color the wave block by their lane id. Black for the smallest lane id and White for the largest lane id.
        outputColor = vec4(float(gl_SubgroupInvocationID) / float(laneSize));
        break;
    }
case 4:
    {
        // Example of query intrinsics: WaveIsFirstLane
        // Mark the first active lane as white pixel. Mark the last active lane as red pixel.
        if (subgroupElect())
            outputColor = vec4(1., 1., 1., 1.);
        if (gl_SubgroupInvocationID == subgroupMax(gl_SubgroupInvocationID))
            outputColor = vec4(1., 0., 0., 1.);
        break;
    }

I'm afraid this is an inherent limitation of Metal: fragment shaders can't take thread_index_in_simdgroup as an input, only compute shaders can, right? Could it be handled in SPIRV-Cross as a function of other subgroup IDs?

EDIT: I'm also attaching the frag source and the SPIR-V, which I generate with either:

glslangValidator -V --target-env spirv1.3 triangle.frag -o triangle.frag.spv -DrenderMode=$1
glslangValidator -V --target-env spirv1.3 triangle.frag -o triangle.frag.spv -DrenderMode=$1 -DUSE_INV_ID

renderMode selects which case to render.

trianglefragspv.zip

thanks..

cdavis5e commented 5 years ago

I'm afraid this is an inherent limitation of Metal: fragment shaders can't take thread_index_in_simdgroup as an input, only compute shaders can, right?

Yeah, this is a known limitation of Metal (rdar://30281606). But it just occurred to me: maybe we can work around it like this:

uint gl_SubgroupInvocationID = simd_prefix_exclusive_sum(1);
uint gl_SubgroupSize = simd_max(gl_SubgroupInvocationID) + 1;

simd_prefix_exclusive_sum() returns 0 in the first lane and the sum of all previous lanes' arguments in the other lanes.
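
To make the workaround above concrete, here is a minimal CPU-side sketch in C++ that models one SIMD-group and computes what each lane would get from an exclusive prefix sum of 1, and what the group would get from max + 1. The names `LaneResult`, `emulate_invocation_ids` and `emulate_subgroup_size` are hypothetical and exist only for this illustration; nothing here is Metal or SPIRV-Cross API. It also assumes that subgroup operations only see active lanes, so when some lanes are inactive the emulated ID is a lane's rank among the active lanes rather than its hardware lane index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of one SIMD-group, for illustration only.
struct LaneResult {
    bool active;           // whether this lane is executing
    uint32_t emulated_id;  // value an exclusive prefix sum of 1 would yield
};

// Exclusive prefix sum of the constant 1 over the active lanes:
// each active lane receives the number of active lanes before it.
std::vector<LaneResult> emulate_invocation_ids(const std::vector<bool>& active) {
    std::vector<LaneResult> out;
    out.reserve(active.size());
    uint32_t running = 0;
    for (bool is_active : active) {
        out.push_back({is_active, is_active ? running : 0u});
        if (is_active) {
            ++running;
        }
    }
    return out;
}

// Emulated subgroup size: maximum emulated ID over active lanes, plus one.
// Returns 0 if no lane is active (assumption made for this sketch).
uint32_t emulate_subgroup_size(const std::vector<LaneResult>& lanes) {
    bool any_active = false;
    uint32_t max_id = 0;
    for (const LaneResult& lane : lanes) {
        if (lane.active) {
            any_active = true;
            if (lane.emulated_id > max_id) {
                max_id = lane.emulated_id;
            }
        }
    }
    return any_active ? max_id + 1 : 0;
}
```

With all lanes active, the emulated IDs match the lane indices 0, 1, 2, ... and the emulated size equals the lane count. With the pattern active, inactive, active, active, the active lanes get 0, 1, 2 and the emulated size is 3, which is worth keeping in mind if the real lane index matters.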

oscarbg commented 5 years ago

Hi @cdavis5e, thanks for the suggestion, nice idea. I will try it soon and report my findings.

cdavis5e commented 5 years ago

Note that the example I gave was for MSL. For GLSL, it'd be:

int SubgroupInvocationID = subgroupExclusiveAdd(1);
int SubgroupSize = subgroupMax(SubgroupInvocationID) + 1;

oscarbg commented 5 years ago

Hi @cdavis5e, thanks. I spoke too soon: I was already thinking of writing "equivalent" GLSL code, but now that you've posted it I don't have to work out the port myself. I still haven't tested it, but I just saw that the new Metal SL 2.2 was released today: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf

The good news is that thread_index_in_simdgroup is now supported as a fragment shader input. In Table 5.4 I see:

thread_index_in_quadgroup: All OS: Since Metal 2.2.
thread_index_in_simdgroup: All OS: Since Metal 2.2.
threads_per_simdgroup: All OS: Since Metal 2.2.

So maybe your SPIRV-Cross subgroup support could check for Metal 2.2 and use the native ID when it is available, or fall back to your "equivalent" way of getting the ID on versions below 2.2. Just saying.

That PDF also brings more functionality, such as int64 support, barycentric coords and primitive_id. I will open a MoltenVK request to track the new Vulkan functionality that can be exposed thanks to the new Metal 3.0 (and Metal SL 2.2) features.
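
The version check suggested above could look roughly like the following C++ sketch of a code generator's decision. This is not SPIRV-Cross's actual code: `make_msl_version`, `ShaderStage` and `invocation_id_source` are hypothetical names, and the `major * 10000 + minor * 100 + patch` encoding is an assumption made for illustration. It also simplifies by treating the native attribute as always available in compute shaders.

```cpp
#include <cstdint>
#include <string>

// Hypothetical version encoding, e.g. 2.2 -> 20200 (assumption for this sketch).
constexpr uint32_t make_msl_version(uint32_t major, uint32_t minor = 0, uint32_t patch = 0) {
    return major * 10000 + minor * 100 + patch;
}

// Hypothetical stage enum; only the two stages discussed in this thread.
enum class ShaderStage { Fragment, Compute };

// Returns the MSL text a generator might emit for gl_SubgroupInvocationID:
// the native attribute when the target supports it in this stage,
// otherwise the prefix-sum emulation from earlier in the thread.
std::string invocation_id_source(ShaderStage stage, uint32_t msl_version) {
    const bool native_ok =
        stage == ShaderStage::Compute ||           // simplification: assume always OK
        msl_version >= make_msl_version(2, 2);     // fragment input since Metal 2.2
    if (native_ok) {
        return "uint gl_SubgroupInvocationID [[thread_index_in_simdgroup]]";
    }
    return "uint gl_SubgroupInvocationID = simd_prefix_exclusive_sum(1);";
}
```

For example, `invocation_id_source(ShaderStage::Fragment, make_msl_version(2, 1))` selects the emulation, while the same call with `make_msl_version(2, 2)` selects the native attribute.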

cdavis5e commented 5 years ago

SPIRV-Cross has been updated to include the necessary changes.