EmbarkStudios / rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
https://shader.rs
Apache License 2.0

MCP: Bindless support #389

Open Jasper-Bekkers opened 3 years ago

Jasper-Bekkers commented 3 years ago

Let's start with the simplest HLSL shader I can come up with that uses bindless resources. This is the shader I would like to port to rust-gpu.

Note that this example uses ByteAddressBuffer, but the full feature should also support (RW)Texture2D and other resource types.

ByteAddressBuffer g_byteAddressBuffer[] : register(t0, space3);
RWByteAddressBuffer g_rwByteAddressBuffer[] : register(u0, space4);

[numthreads(64, 1, 1)]
void main(int threadId: SV_DispatchThreadID)
{
    g_rwByteAddressBuffer[0].Store(threadId, g_byteAddressBuffer[0].Load(threadId));
}
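To make the semantics of this shader concrete, here is a small CPU-side model in plain Rust. It is not rust-gpu code. The bindless arrays are modelled as slices of descriptors, and the struct and function names are hypothetical. `Load`/`Store` take *byte* addresses, and the model turns them into word indices with `>> 2`, which matches the `OpShiftRightLogical ... %uint_2` in the DXC output below. Because the shader passes the raw thread ID as a byte address, threads 0 to 3 all touch word 0, so one 64-thread group copies only the first 16 words.

```rust
/// Minimal CPU model of HLSL `ByteAddressBuffer`: 32-bit words addressed by byte offset.
struct ByteAddressBuffer {
    words: Vec<u32>,
}

/// Minimal CPU model of HLSL `RWByteAddressBuffer`.
struct RWByteAddressBuffer {
    words: Vec<u32>,
}

impl ByteAddressBuffer {
    /// `Load(address)`: byte address -> word index via `>> 2`.
    fn load(&self, address: u32) -> u32 {
        self.words[(address >> 2) as usize]
    }
}

impl RWByteAddressBuffer {
    /// `Store(address, value)`, using the same byte-to-word conversion.
    fn store(&mut self, address: u32, value: u32) {
        self.words[(address >> 2) as usize] = value;
    }
}

/// One invocation of the shader body. `[0]` is the (here constant)
/// index into each bindless descriptor array.
fn shader_main(
    thread_id: u32,
    g_byte_address_buffer: &[ByteAddressBuffer],
    g_rw_byte_address_buffer: &mut [RWByteAddressBuffer],
) {
    let value = g_byte_address_buffer[0].load(thread_id);
    g_rw_byte_address_buffer[0].store(thread_id, value);
}

/// Runs one [numthreads(64, 1, 1)] group over a source buffer holding 0..64
/// and returns the words of the destination buffer.
fn dispatch_one_group() -> Vec<u32> {
    let src = vec![ByteAddressBuffer { words: (0..64).collect() }];
    let mut dst = vec![RWByteAddressBuffer { words: vec![0; 64] }];
    for thread_id in 0..64 {
        shader_main(thread_id, &src, &mut dst);
    }
    dst.remove(0).words
}

fn main() {
    let out = dispatch_one_group();
    // Thread IDs are used as byte addresses, so only words 0..16 get copied.
    println!("{:?}", &out[..20]);
}
```

Multiplying the thread ID by 4 before calling `Load`/`Store` would instead copy one word per thread.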

Emitted SPIR-V from DXC:

; SPIR-V
; Version: 1.0
; Generator: Google spiregg; 0
; Bound: 30
; Schema: 0
               OpCapability Shader
               OpCapability RuntimeDescriptorArray
               OpExtension "SPV_EXT_descriptor_indexing"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main LocalSize 64 1 1
               OpSource HLSL 660
               OpName %type_ByteAddressBuffer "type.ByteAddressBuffer"
               OpName %g_byteAddressBuffer "g_byteAddressBuffer"
               OpName %type_RWByteAddressBuffer "type.RWByteAddressBuffer"
               OpName %g_rwByteAddressBuffer "g_rwByteAddressBuffer"
               OpName %main "main"
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %g_byteAddressBuffer DescriptorSet 3
               OpDecorate %g_byteAddressBuffer Binding 0
               OpDecorate %g_rwByteAddressBuffer DescriptorSet 4
               OpDecorate %g_rwByteAddressBuffer Binding 0
               OpDecorate %_runtimearr_uint ArrayStride 4
               OpMemberDecorate %type_ByteAddressBuffer 0 Offset 0
               OpMemberDecorate %type_ByteAddressBuffer 0 NonWritable
               OpDecorate %type_ByteAddressBuffer BufferBlock
               OpMemberDecorate %type_RWByteAddressBuffer 0 Offset 0
               OpDecorate %type_RWByteAddressBuffer BufferBlock
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
       %uint = OpTypeInt 32 0
     %uint_2 = OpConstant %uint 2
     %uint_0 = OpConstant %uint 0
%_runtimearr_uint = OpTypeRuntimeArray %uint
%type_ByteAddressBuffer = OpTypeStruct %_runtimearr_uint
%_runtimearr_type_ByteAddressBuffer = OpTypeRuntimeArray %type_ByteAddressBuffer
%_ptr_Uniform__runtimearr_type_ByteAddressBuffer = OpTypePointer Uniform %_runtimearr_type_ByteAddressBuffer
%type_RWByteAddressBuffer = OpTypeStruct %_runtimearr_uint
%_runtimearr_type_RWByteAddressBuffer = OpTypeRuntimeArray %type_RWByteAddressBuffer
%_ptr_Uniform__runtimearr_type_RWByteAddressBuffer = OpTypePointer Uniform %_runtimearr_type_RWByteAddressBuffer
      %v3int = OpTypeVector %int 3
%_ptr_Input_v3int = OpTypePointer Input %v3int
       %void = OpTypeVoid
         %20 = OpTypeFunction %void
%_ptr_Uniform_uint = OpTypePointer Uniform %uint
%g_byteAddressBuffer = OpVariable %_ptr_Uniform__runtimearr_type_ByteAddressBuffer Uniform
%g_rwByteAddressBuffer = OpVariable %_ptr_Uniform__runtimearr_type_RWByteAddressBuffer Uniform
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3int Input
       %main = OpFunction %void None %20
         %22 = OpLabel
         %23 = OpLoad %v3int %gl_GlobalInvocationID
         %24 = OpCompositeExtract %int %23 0
         %25 = OpBitcast %uint %24
         %26 = OpShiftRightLogical %uint %25 %uint_2
         %27 = OpAccessChain %_ptr_Uniform_uint %g_byteAddressBuffer %int_0 %uint_0 %26
         %28 = OpLoad %uint %27
         %29 = OpAccessChain %_ptr_Uniform_uint %g_rwByteAddressBuffer %int_0 %uint_0 %26
               OpStore %29 %28
               OpReturn
               OpFunctionEnd
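The key instruction above is `OpAccessChain ... %g_byteAddressBuffer %int_0 %uint_0 %26`. It indexes first into the runtime array of descriptors, then into member 0 of the `BufferBlock` struct (the `_runtimearr_uint`), and finally into the word at `byte address >> 2`. The small helper below reproduces that index list, assuming the `ArrayStride 4` decoration shown in the listing. The function names are hypothetical and exist only for illustration.

```rust
/// `ArrayStride 4` on `%_runtimearr_uint`: each element is one 32-bit word.
const ARRAY_STRIDE: u32 = 4;

/// Index list used by `OpAccessChain` for `buffers[descriptor].Load(address)`:
/// [descriptor index (`%int_0` above), struct member 0 (`%uint_0`),
///  word index = byte address >> 2 (`%26`)].
fn load_access_chain(descriptor: u32, byte_address: u32) -> [u32; 3] {
    [descriptor, 0, byte_address >> 2]
}

/// Byte offset actually read: the logical shift truncates, so the low two
/// bits of an unaligned address are silently dropped.
fn effective_byte_offset(byte_address: u32) -> u32 {
    (byte_address >> 2) * ARRAY_STRIDE
}

fn main() {
    for address in [0, 4, 13] {
        println!(
            "{:>2} -> indices {:?}, reads byte {}",
            address,
            load_access_chain(0, address),
            effective_byte_offset(address)
        );
    }
}
```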

First attempt:

#[allow(unused_attributes)]
#[spirv(gl_compute)]
pub fn main_cs(
    #[spirv(descriptor_set = 0, binding = 0)] img: Uniform<&[u32]>,
) {
    let img = img.load();
    let stuff = &img[0];
}

This seems to emit an OpTypeRuntimeArray, but then proceeds to do some things wrong. I had even more trouble making the slice mutable (for storing data), since that leads to a bunch of compiler errors further down the line about .load not being available, etc.

Ideally we would also be able to declare the bindless arrays as globals. That would make it easier to create our own wrapper types around Image2d and other resource types, so that those wrappers can do the indirection through the bindless array.
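A rough CPU-side sketch of the kind of wrapper types described here follows. Per-draw data stores only small typed handles, and every access goes through one bindless array. `Texture2d`, `Handle`, `BindlessArray` and `Material` are all hypothetical names, not rust-gpu or spirv-std APIs. The array is passed by reference only because plain Rust has no equivalent of a global descriptor array, and making it a true global is the part that needs compiler support.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for an `Image2d` descriptor: a tiny RGBA texture.
struct Texture2d {
    width: u32,
    texels: Vec<[f32; 4]>,
}

impl Texture2d {
    /// Unfiltered texel fetch at integer coordinates.
    fn fetch(&self, x: u32, y: u32) -> [f32; 4] {
        self.texels[(y * self.width + x) as usize]
    }
}

/// Hypothetical typed handle: just an index into a bindless array of `T`.
struct Handle<T> {
    index: u32,
    _marker: PhantomData<T>,
}

// Implemented by hand so `Handle<T>` is `Copy` even when `T` is not.
impl<T> Clone for Handle<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Handle<T> {}

impl<T> Handle<T> {
    fn new(index: u32) -> Self {
        Handle { index, _marker: PhantomData }
    }
}

/// Stand-in for a runtime-sized descriptor array such as `Texture2D g_textures[]`.
struct BindlessArray<T> {
    descriptors: Vec<T>,
}

impl<T> BindlessArray<T> {
    /// The single indirection every wrapper type performs.
    fn get(&self, handle: Handle<T>) -> &T {
        &self.descriptors[handle.index as usize]
    }
}

/// Per-draw data holds handles, not descriptors.
struct Material {
    albedo: Handle<Texture2d>,
}

fn shade(textures: &BindlessArray<Texture2d>, material: &Material, x: u32, y: u32) -> [f32; 4] {
    textures.get(material.albedo).fetch(x, y)
}

fn main() {
    let textures = BindlessArray {
        descriptors: vec![
            Texture2d { width: 1, texels: vec![[1.0, 0.0, 0.0, 1.0]] },
            Texture2d { width: 2, texels: vec![[0.0; 4], [0.0, 1.0, 0.0, 1.0]] },
        ],
    };
    let material = Material { albedo: Handle::new(1) };
    println!("{:?}", shade(&textures, &material, 1, 0));
}
```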

Jasper-Bekkers commented 3 years ago

Trying to re-create the needed SPIR-V with inline assembly also fails, because an OpVariable with the Uniform storage class has to be declared at global (module) scope, not inside the function.

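#[allow(unused_attributes)]
#[spirv(gl_compute)]
pub fn main_cs() {
    unsafe {
        asm!(
            // OpTypeInt's last operand is signedness: 1 = signed, 0 = unsigned.
            "%int = OpTypeInt 32 1",
            "%uint = OpTypeInt 32 0",
            "%int_0 = OpConstant %int 0",
            "%uint_0 = OpConstant %uint 0",
            "%int_array = OpTypeRuntimeArray %int",
            "%ptr_int = OpTypePointer Uniform %int",
            "%ptr_int_array = OpTypePointer Uniform %int_array",
            // This is what fails: a Uniform OpVariable must live at module
            // scope, but here it ends up inside the function body.
            "%array = OpVariable %ptr_int_array Uniform",
            // Indexing a runtime array yields a pointer to a single element.
            "%stuff = OpAccessChain %ptr_int %array %uint_0"
        );
    }
}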
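The placement rule behind this failure can be stated compactly. As I read the SPIR-V specification's validation rules, an OpVariable inside a function body must use the Function storage class, and every other storage class (Uniform, StorageBuffer, Input, Private, ...) must be declared at module scope. The enums below are a hypothetical subset written only to express that rule. They are not the real SPIR-V enum definitions.

```rust
/// A few SPIR-V storage classes relevant here (illustrative subset only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum StorageClass {
    Function,
    Input,
    Uniform,
    StorageBuffer,
    Private,
}

/// Where an `OpVariable` instruction appears in the module.
enum Scope {
    Module,
    Function,
}

/// `Function` storage class only inside functions; everything else only
/// at module (global) scope.
fn variable_placement_ok(storage: StorageClass, scope: Scope) -> bool {
    match scope {
        Scope::Function => storage == StorageClass::Function,
        Scope::Module => storage != StorageClass::Function,
    }
}

fn main() {
    // The asm! above emits a Uniform variable inside `main_cs`: rejected.
    assert!(!variable_placement_ok(StorageClass::Uniform, Scope::Function));
    for storage in [
        StorageClass::Input,
        StorageClass::Uniform,
        StorageClass::StorageBuffer,
        StorageClass::Private,
    ] {
        println!("{:?} at module scope: {}", storage, variable_placement_ok(storage, Scope::Module));
    }
}
```

This is why the bindless arrays need to be expressible as globals (or entry-point parameters that the compiler lifts to module scope), as in the DXC output above.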
Jasper-Bekkers commented 3 years ago

From internal discussions, it seems that this depends on https://github.com/EmbarkStudios/rust-gpu/issues/300

repi commented 3 years ago

I would really like to have bindless support, as I believe it can simplify a lot of how we build our renderer. So I approve of the idea :) But it is not that clear or concrete (to me) how it would be implemented and what the options are, so maybe a lighter RFC and/or some prototypes are needed to map it out?

Jasper-Bekkers commented 3 years ago

@repi This issue is just about getting over the initial hurdle: getting rust-gpu to emit the correct SPIR-V incantation so that we can even begin prototyping this.