Glslang implicitely uses 64 bits for SPV_KHR_shader_ballot

Max-Andersson commented 6 years ago

In http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_shader_ballot.html, it is stated that subgroup masks, as well as the resulting bitmask of OpSubgroupBallotKHR should be a uvec4 to avoid having a dependency on 64 bit support. It is known that the various built-in masks are not rewritten properly: https://github.com/KhronosGroup/glslang/issues/930. OpSubgroupBallotKHR however, is rewritten into a uvec4, but is silently re-written into a 64 bit integer before being used:

%86 = OpSubgroupBallotKHR %v4uint %83 %87 = OpCompositeExtract %uint %86 0 %88 = OpCompositeExtract %uint %86 1 %90 = OpCompositeConstruct %v2uint %87 %88 %91 = OpBitcast %ulong %90

This adds a dependency to 64 bit integer types, even if the GLSL did not contain any explicit types. Furthermore, it seems like we're missing logic for rewriting operations operating on subgroup_bitmask_t type when converting GLSL to SPIR-V. E.g. comparing the result of OpSubgroupBallotKHR with constant 0 promotes the constant and the comparison to 64 bit integer type.

amdrexu commented 6 years ago

I think because SPIR-V is an intermediate representation. So it should be as general as possible and be somewhat independent of high level language. So the spec design makes subgroupsize as a uvec4 instead of a uint64 (uvec4 is more general and the value range is large accordingly). However, the extension ARB_shader_ballot has dependency on ARB_gpu_shader_int64. So if you use the former, you assume the latter is supported on this hardware.

In this case, I think what OpSubgroupBallotKHR does is correctly because the GLSL prototype uint64 ballot(bool) requires this. This incorrect translation is BuiltInSubgroupXXMask should decorate on uvec4 instead of uint64. I have notice this issue recently and is thinking about a solution. Like what OpSubgroupBallotKHR does, it is still a uvec4.xy -> uint64 conversion as well, I believe.

If you want to use subgroup operations more general. Maybe KHR_shader_subgroup_XXX is a good candidate. In this extension, the active invocation mask is always uvec4 without this silent translation you have mentioned.

Max-Andersson commented 6 years ago

The SPIR-V spec. clearly states that :

Unlike GL_ARB_shader_ballot the SPIR-V extension does not depend on GL_ARB_shader_gpu_int64 because the types representing subgroup IDs are held in a 4 component vector of integers. To me, that means that it is not the intention to rewrite the bitmasks back to 64 bit integer type.

My interpretation of the spec. is that glslang is responsible for rewriting the uint64_t from GLSL (or rather subgroup_mask_t) to uvec4 in each of its uses. As for the BuiltInSubgroup*Mask, they are 64 bit inputs, e.g. %gl_SubGroupEqMaskARB = OpVariable %_ptr_Input_ulong Input

amdrexu commented 6 years ago

For glslang, it must be tightly coupled with GLSL. So the GLSL representation uint64 will not be replaced by uvec4. Instead, a uvec4.xy -> uint64 cast is generated in the SPIR-V binary. To replace uint64 with uvec4, many semantics will be changed and it is error-prone. But as you have stated, gl_SubGroupXXMaskARB is generated incorrectly. It is true. So we will fix this issue, maybe the codes generated is like this:

OpDecorate %gl_SubGroupXXMaskUvec4 BuiltInSubgroupXXMaskKHR

%gl_SubGroupXXMaskUvec4 = OpVariable %_ptr_uvec4 Input
%maskValue = OpLoad %_uvec4 %gl_SubGroupXXMask
%maskValue_x = OpCompositeExatrct %_int %maskValue 0
%maskValue_y = OpCompositeExatrct %_int %maskValue 1
%maskValue_xy = OpCompositeConstruct %_uvec2 %maskValue_x %maskValue_y
%maskValue_uint64 = OpBitcast %_uint64 %maskValue_xy

%gl_SubGroupXXMaskUint64 = OpVariable %_ptr_uint64 Input
OpStore %gl_SubGroupXXMaskUint64 %maskValue_uint64

%gl_SubGroupXXMaskUint64 corresponds to uint64_t gl_SubGroupXXMaskARB in GLSL.

Max-Andersson commented 6 years ago

While I agree that it is much less error prone to simply rewrite the uvec4 into a uint64_t for the instructions which generate them, AFAICT that defeats the point of having them as a uint64_t in the first place. We'll lose almost all benefits of having a uvec4 (breaking dependency on 64 bit types, less restrictive subgroup sizes...) and add an unnecessary layer of bit-fiddling. The only benefit left that I can think of, is that this extension will be consistent with the Vulkan 1.1 subgroup features.

By always using 64 bits, glslang will restrict this extension (excluding manually written SPIR-V) to only be supported on targets with 64 bit integer support despite this not being required by the SPIR-V spec. Is this really expected behavior?

amdrexu commented 6 years ago

Yes, this limitation is caused by OpenGL. When you use ARB_shader_ballot, you must also use ARB_gpu_shader_int64. So currently, new subgroup operations are re-implemented based on KHR_shader_subgroup_XXX of Vulkan 1.1.

alegal-arm commented 6 years ago

So the spec design makes subgroupsize as a uvec4 instead of a uint64 (uvec4 is more general and the value range is large accordingly). However, the extension ARB_shader_ballot has dependency on ARB_gpu_shader_int64. So if you use the former, you assume the latter is supported on this hardware.

I agree that this is correct but forcing such dependency (on 64-bit ints) when SPIR-V is being produced renders ARB_shader_ballot useless as a high-level language for implementations that don't support int64 but can support SPV_KHR_shader_ballot.

SPV_KHR_shader_ballot explicitly calls out ARB_shader_ballot and therefore we can assume that application that would want to use this functionality will be written using ARB_shader_ballot syntax. Having a dependency on 64-bit in SPIR-V context is technically not necessary, it also creates implicit requirement on Int64 support, which apps might not be aware of, and it will also prevent (for no good reasons) such apps from running on implementations that don't support this feature

amdrexu commented 6 years ago

The spec ARB_shader_ballot says: This extension requires GL_ARB_gpu_shader_int64. This issue is caused by OpenGL design. I think using VK_KHR_shader_subgroup_ballot will be a good choice in the future assuming underlying hardware supports Vulkan 1.1 (The semantics is clear and need glslang to do nothing to work around this issue). @johnkslang Maybe John can shed some light on our discussion.

alegal-arm commented 6 years ago

subgroupBallot() from VK_KHR_shader_subgroup_ballot maps to OpGroupNonUniformBallot , not to OpSubgroupBallotKHR defined by SPV_KHR_shader_ballot.

amdrexu commented 6 years ago

Yes, the OpCodes are different. But the semantics are the same according to the translation in glslang. I am OK to change glslang codes to make uvec4 as the general used type replacing uint64. But the risk of such change is not very low. It depends on John's decision eventually.

alegal-arm commented 6 years ago

summoning @johnkslang one more time :)

Max-Andersson commented 6 years ago

As I see it, we have three different options.

Add support to glslang to rewrite uses of 64 bit subgroup masks into uvec4s and thus remove the GLSL dependency on 64 bit capability. This is what I originally expected of glslang.
Expand GLSL to have uvec4 signature variants for the affected built-ins when targeting SPIR-V.
Do nothing and make it clear that it is not possible to generate SPIR-V without a dependency on 64 bit types through glslang. If a user want to use the extension in a Vulkan environment, then the user is expected to either verify that all intended targets support 64 bits, or rewrite the SPIR-V to remove the 64 bit dependency manually.

Of course, 1) is the best from the user point-of-view, where everything "Just Works", but the most error prone from glslang's point-of-view. 3) is very user unfriendly, but requires nothing of glslang.

Feel free to add to the list of options if you have a good idea.

amdrexu commented 6 years ago

Actually, I believe option (1) is the correct choice to align GL_ARB_shader_ballot with GL_KHR_shader_subgroup_ballot. Since the enumerant BuiltInSubgroupXXMaskKHR take the same values with BuiltInSubgroupXXMask. We should make both implementation consistent. Currently, what glslang does contradicts with the expectation of SPIR-V spec for SPV_KHR_shader_ballot.

KhronosGroup / glslang

Glslang implicitely uses 64 bits for SPV_KHR_shader_ballot #1292