KhronosGroup / GLSL

GLSL Shading Language Specification and Extensions
Other
336 stars 97 forks source link

Why is `gl_SubgroupSize` not a constant? #19

Open krOoze opened 6 years ago

krOoze commented 6 years ago

Why is gl_SubgroupSize not a constant? Per Vulkan spec it is PhysicalDevice property, so it feels it could be a built-in constant. Currently it seems to be listed under built-in variables.

I am also not certain why gl_NumSubgroups is only "uniform across the invocation group". Why would it ever change within the workgroup?

johnkslang commented 6 years ago

Variables are not considered constant unless their value is known when the front-end compiler translates the GLSL to SPIR-V. (There are also specialization constants for SPIR-V, which can change their value when the SPIR-V is given to the driver.)

If something could vary across pipelines, or platforms, etc., it cannot be a front-end constant.

krOoze commented 6 years ago

That's weird. Almost all built-in constants are "Implementation-dependent". gl_SubgroupSize is practically the same. It is a physical device limit too.

johnkslang commented 6 years ago

It is declared as mediump in uint gl_SubgroupSize; allowing it to vary more than a constant. However, I also see in the issue section that it was resolved to be a property of just the device, making it like other constants.

@sheredom, anything else to add?

sheredom commented 6 years ago

Nope, I think you've covered it all @johnkslang.

In theory it could be a constant as long as the value is only defined when the IHV driver consumes the SPIR-V, in practice I'm not sure it matters much?

I am also not certain why gl_NumSubgroups is only "uniform across the invocation group". Why would it ever change within the workgroup?

You are probably right here, probably a copy/pasta issue. I'll take a look.

johnkslang commented 6 years ago

I'm not sure it matters much?

Front ends (and GLSL) can do more with a const than they can with an in, but I don't know if this specific one benefits from being a constant.

johnkslang commented 6 years ago

On further internal discussion, it is better to keep this as an in, at least for Vukan, because we don't want to make the ISV change the .conf file per target platform.

So, the parts of the extension specification that say it is constant should be corrected to be consistent with being an in.

krOoze commented 6 years ago

It is bit unfortunate. That would disqualify its use in e.g. subgroupBroadcast(). Shouldn't something like subgroupBroadcast(x, gl_SubgroupSize - 1) be allowed?

johnkslang commented 6 years ago

I see the value in that, but suspect it should be achieved by a different change.

krOoze commented 6 years ago

because we don't want to make the ISV change the .conf file per target platform.

This is also identical problem as for the other physical device limits, no? The default config should simply be const gl_SubgroupSize = 1. The user provides the conf to the compiler to check if his code is correct — and the default config is supposed to guarantee compatibility with all drivers.

PS: Hmm, either way it has to stay a variable until incompatible change can be made... drivers wouldn't know how to populate it with value if it was changed to OpSpecConstant.

johnkslang commented 6 years ago

This is also identical problem as for the other physical device limits, no?

Technically, but in practice there is a different emphasis on knowing this compared to limits: This one is the actual value in play, versus a min/max limit.

ianromanick commented 5 years ago

Why is gl_SubgroupSize not a constant? Per Vulkan spec it is PhysicalDevice property, so it feels it could be a built-in constant. Currently it seems to be listed under built-in variables.

I am also not certain why gl_NumSubgroups is only "uniform across the invocation group". Why would it ever change within the workgroup?

It was originally specified this way due to the way Intel GPUs handle SIMD groups. SIMD groups can be 8, 16, or 32. For vertex processing stages and compute, the SIMD group size is fixed during shader compilation. For fragment stage, the SIMD group size will vary across the primitive, up to a limit based on register pressure, depending on how the hardware decided to dispatch the groups.

I think we may have implemented these variables in a way that could allow some of these variables to be const (instead of uniform) or uniform (instead of in). I need to double check.

allanmac commented 5 years ago

As noted above as well as over here ( https://github.com/KhronosGroup/glslang/issues/1591#issuecomment-442121566 ), not treating gl_SubgroupSize as a constant is going to be painful for developers.

If a device doesn't support shader_subgroup_shuffle then there isn't a simple replacement for subgroupBroadcast(gl_SubgroupSize-N).

ianromanick commented 5 years ago

Why is gl_SubgroupSize not a constant? Per Vulkan spec it is PhysicalDevice property, so it feels it could be a built-in constant. Currently it seems to be listed under built-in variables. I am also not certain why gl_NumSubgroups is only "uniform across the invocation group". Why would it ever change within the workgroup?

It was originally specified this way due to the way Intel GPUs handle SIMD groups. SIMD groups can be 8, 16, or 32. For vertex processing stages and compute, the SIMD group size is fixed during shader compilation. For fragment stage, the SIMD group size will vary across the primitive, up to a limit based on register pressure, depending on how the hardware decided to dispatch the groups.

I think we may have implemented these variables in a way that could allow some of these variables to be const (instead of uniform) or uniform (instead of in). I need to double check.

What we ended up implementing was always set gl_SubgroupSize to either the compile-time computed value (for vertex, compute, etc.) or 32 (the maximum possible value for fragment stage), and mark the other channels as inactive when executing in a narrower SIMD mode. I believe it should be safe for us to change gl_SubgroupSize to const.

arianvp commented 5 years ago

Given that subgroupProperties.subgroupSize on the vulkan side gives us only one number, is there any way for the end user to distinguish when it is not constant?

I am implementing a stream compaction phase for my path tracer using subgroups and tried adding the following line:

shared float sumdata[gl_SubgroupSize];

but as a workaround now have to do:

layout (constant_id = 0) const int userProvidedSubgroupSize = 64;
shared float sumdata[userProvidedSubgroupSize];

and on the vulkan side, I then use VkSpecializationInfo to always sret it to subgroupProperties.subgroupSize which seems a bit counter-intuitive given its always constant for a given Device

johnkslang commented 5 years ago

Vulkan has VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT which says whether or not the SPIR-V SubgroupSize can vary.

devshgraphicsprogramming commented 3 years ago

If memory serves me well, Intel can configure your shader to run in SIMD4,8, and 16 modes.

darksylinc commented 3 years ago

I feel there are two issues here:

  1. The reason @krOoze created this ticket, which can be traced to his comments in the Vulkan tutorial. In there he seems to believe that if gl_SubgroupSize were to be constant, then the following would be true:
    subgroupBallotFindMSB( subgroupBallot(true) ) == gl_SubgroupSize - 1;
    subgroupMax(global_index) == subgroupBroadcast(global_index, highestActiveID);

Unfortunately that's wrong. The reason those equalities are wrong is not because gl_SubgroupSize isn't constant, but because the driver is (as per spec) allowed to issue a dispatch of 16x1x1 threads to be processed by 16 different waves with 63 inactive waves and 1 active wave instead of 1 wave with 16 active threads and 48 inactive ones. It would be extremely inefficient, but it is allowed.

Although extremely likely, we don't have a guarantee that threads [0; gl_WorkGroupSize) will correspond 1:1 to [0; gl_SubgroupSize).

In fact the GL Mesa driver when operating in Wave32 on RDNA currently sets gl_SubgroupSize = 64 and always returns the last 32 bits as inactive threads. This is bizarre, but allowed as per specs.

If you want the strong guarantee you're looking for, then use VK_EXT_subgroup_size_control to force gl_SubgroupSize to a specific value and VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT to ensure 1:1 [0; gl_WorkGroupSize) -> [0; gl_SubgroupSize). Someone correct me if I'm mistaken.

This is a desirable property because very clever optimizations can be done if we can assume gl_LocalInvocationID == gl_SubGroupInvocation with no inactive threads.

In fact if I'm wrong and EXT_subgroup_size_control + PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT is not enough for that, then another ticket should be created.

  1. The issue that gl_SubgroupSize isn't constant. ~It is already uniform. This is problematic because uniform is meant to be the same value over all the workgroups. So OP is right that if gl_SubgroupSize is uniform, then it should be a constant~. Update: GL_ARB_shader_ballot declares it as uniform, but GL_KHR_shader_subgroup does not, although question 6 in the FAQ is confusing: 6. Should gl_SubgroupSize be allowed to vary (for example across shader stages)? RESOLUTION: NO (...).

However drivers and implementations don't want this to be the same value. They're treating it like a special case where it can vary per workgroup. If GLSL were to adopt my https://github.com/KhronosGroup/GLSL/issues/81 proposal the correct decoration would be either threadgroup_uniform or simd_uniform.

My recommendation is that this issue will only be resolved until #81 is adopted and then it gets decided whether gl_SubgroupSize should be threadgroup_uniform or simd_uniform.

krOoze commented 3 years ago

VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT is indeed problematic here, which makes it "only" a subgroup-scope uniform. It quickly gets messy to express all the levels of conditional constness...

krOoze commented 3 years ago

PS: Maybe the original issue should be closed. The window in which it could have been fixed trivially is gone. Now it is settled and extensions are made on top of those foundations. It was mostly matter of purism for me, but as we see the constness situation only got more complicated and exceed the humble concern of the original Issue.

Notably also, I think subgroupBroadcast() was relaxed in higher versions of SPIR-V.