KhronosGroup / Vulkan-Portability

Apache License 2.0
40 stars 4 forks source link

Add VkPhysicalDevicePortabilitySubsetFeaturesKHR:: workgroupMatrices capability #33

Open billhollings opened 2 years ago

billhollings commented 2 years ago

Based on CTS testing, and this SPIRV-Cross discussion, it appears that Metal currently has a Catch-22 oversight, where initialization of locally declared threadgroup variables is required (like C++ statics), but Metal matrix variables do not have the necessary constructors in the threadgroup address space.

As discussed in the SPIRV-Cross thread, although it might be possible to create a heroic workaround in SPIRV-Cross, it is not clear that it would have real-world requirements for going that far.

An alternative is to add a VkPhysicalDevicePortabilitySubsetFeaturesKHR:: workgroupMatrices capability, where an implementation can indicate that SPIR-V can or cannot include matrices in a Workgroup variable.

kvark commented 2 years ago

Something doesn't seem right here. I can use matrices in theradgroup memory in Metal, e.g.

#include <metal_stdlib>
#include <simd/simd.h>

struct Foo {
    metal::float2x2 m;
};

kernel void main1(
  threadgroup Foo const& foo
) {
    float x;
    float _e6 = foo.m[0].x;
    x = _e6;
    return;
}
cdavis5e commented 2 years ago

But you can't do this:

#include <metal_stdlib>
#include <simd/simd.h>

struct Foo {
    metal::float2x2 m;
};

kernel void main1() {
    threadgroup Foo foo;
    float x;
    float _e6 = foo.m[0].x;
    x = _e6;
    return;
}

In practice, we're more likely to see this latter pattern, because AFAIK SPIR-V has no equivalent for passing the TGSM block as an argument like Metal.

cdavis5e commented 2 years ago

Another thing about that TGSM parameter is that you have to set it up from the MTLComputeCommandEncoder with -[MTLComputeCommandEncoder setThreadgroupMemoryLength:atIndex:]. Again, Vulkan has no equivalent.

billhollings commented 2 years ago

But you can't do this:

    threadgroup Foo foo;

And specifically, the CTS tests under review set values within the threadgroup matrix:

     threadgroup Foo foo;
     foo.m[0].x = 1.0;
kvark commented 2 years ago

What is GLSL/SPIR-V code used by these tests? I'm not getting why Metal code would need to declare "threadgroup Foo foo;" as a local variable (since it's not local, it's group-shared).

billhollings commented 2 years ago

What is GLSL/SPIR-V code used by these tests? I'm not getting why Metal code would need to declare "threadgroup Foo foo;" as a local variable (since it's not local, it's group-shared).

The catalyst for this (including relevant SPIR-V) is in the back-link mention I made above (just above your question). That back-link is actually there to reference the thought you made above as leading to a potential solution.

Happy to have any suggestions about how that can be dealt with.

kvark commented 2 years ago

I'm just trying to understand the problem. It would help to first provide a piece of GLSL code that, when translated to SPIR-V and then to Metal, causes a problem. I tried converting the SPIR-V to GLSL and saw this:

struct _2
{
    vec4 _m0;
    mat3x2 _m1;
    bvec4 _m2;
};
shared _2 _154;

So there is a global variable with shared qualifier, and it contains a matrix. Metal doesn't have global variables, so this one is expected to go the same route as all the resources (e.g. uniform buffers) - via function parameters. Doing this doesn't require any default constructors.

So where does the problem come from?

billhollings commented 2 years ago

So where does the problem come from?

SPIRV-Cross converts the SPIR-V to include the following GLSL:

struct _152
{
    mediump vec4 _m0;
    mat3x2 _m1;
    bvec4 _m2;
};

shared _152 _154;

void main()
{
    _154._m0 = vec4(1.0, -5.0, -9.0, -5.0);
    _154._m1 = mat3x2(vec2(1.0, -7.0), vec2(1.0, 2.0), vec2(8.0, 7.0));
    _154._m2 = bvec4(false, true, false, false);
    ...
    vec4 _184 = _154._m0;

and the corresponding converted MSL includes:

struct _152
{
    float4 _m0;
    float3x2 _m1;
    bool4 _m2;
};

kernel void main0(device _210& _212 [[buffer(1)]])
{
    threadgroup _152 _154;
    _154._m0 = float4(1.0, -5.0, -9.0, -5.0);
    _154._m1 = float3x2(float2(1.0, -7.0), float2(1.0, 2.0), float2(8.0, 7.0));
    _154._m2 = short4(bool4(false, true, false, false));
    ...
    float4 _184 = _154._m0;

The MSL error results from the declaration of _154. It requires a default constructor, because MSL threadgroup local variables are initialized. Unfortunately, the default constructor of _152 is deleted because float3x2 does not have a default constructor.

Your focus on the threadgroup variable being in a buffer does offer one possible way through this, but it would require elevating the local variable to an input/output variable, and coordinating through the app to provide that buffer.

kvark commented 2 years ago

Yeah, I'm totally missing why SPIRV-Cross tries to make it a local variable. All global declarations in SPIR-V pretty much need to go as arguments for MSL, I don't understand why threadgroup should be an exception.

HansKristian-Work commented 2 years ago

I don't understand why threadgroup should be an exception

Because threadgroup is not part of the interface with host. Similar to how we declare Private variables. You don't pass Private variables by main parameter. Only global variables which are IO or descriptors are main() parameters. Sure, Metal has this weird threadgroup-by-pointer-allocated-from-host thing, but that's not mandatory and kinda awkward. AFAIK, it's a hack to workaround the lack of spec constant array size of threadgroup storage when making the workgroup size a spec constant.

kvark commented 2 years ago

Not sure where "threadgroup-by-pointer-allocated-from-host" comes from. Are you talking about setThreadgroupMemoryLength(_:index:)? There is nothing about the host allocation here that I can see.

Can MoltenVK use this for all threadgroup memory? Would there be any downside for doing so?

cdavis5e commented 1 year ago

I'm not certain this is needed anymore. I actually had to implement a workaround to solve shaderSharedCompositeVariables--and then Apple finally did what I asked them to and added default matrix constructors in the threadgroup AS.

billhollings commented 1 year ago

Apple finally did what I asked them to

Would we still need the flag for older OS's then?

cdavis5e commented 1 year ago

Would we still need the flag for older OS's then?

From the prior message:

I actually had to implement a workaround to solve shaderSharedCompositeVariables

So probably not.