KhronosGroup / SPIRV-Reflect

SPIRV-Reflect is a lightweight library that provides a C/C++ reflection API for SPIR-V shader bytecode in Vulkan applications.
Apache License 2.0
701 stars 148 forks source link

[Bug] spirv-reflect doesn't understand how struct strides work #280

Open Nielsbishere opened 2 months ago

Nielsbishere commented 2 months ago

The following HLSL (compiled with DXC's flags for dx buffer layouts) produces incorrect spirv-reflect behavior:

struct Nicu {
    float a, b;
    uint64_t c;
    uint16_t d;
    float16_t e;
    int16_t f;
};

RWStructuredBuffer<Nicu> _sbuffer0;

[[oxc::stage("compute")]]
[[oxc::extension("16BitTypes", "I64")]]
[numthreads(1, 1, 1)]
void main(uint id : SV_DispatchThreadID) {
    _sbuffer0[id].a = 123;
}

Turns into:

; SPIR-V
; Version: 1.5
; Generator: Google spiregg; 0
; Bound: 26
; Schema: 0
               OpCapability Shader                  ; 0x00000014
               OpCapability Int64                   ; 0x0000001c
               OpCapability Int16                   ; 0x00000024
               OpCapability Float16                 ; 0x0000002c
               OpMemoryModel Logical GLSL450        ; 0x00000034
               OpEntryPoint GLCompute %1 "main" %gl_GlobalInvocationID %3   ; 0x00000040
               OpExecutionMode %1 LocalSize 1 1 1                           ; 0x0000005c

               ; Annotations
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId     ; 0x00000074
               OpDecorate %3 DescriptorSet 0                                    ; 0x00000084
               OpDecorate %3 Binding 0                                          ; 0x00000094
               OpMemberDecorate %_struct_5 0 Offset 0                           ; 0x000000a4
               OpMemberDecorate %_struct_5 1 Offset 4                           ; 0x000000b8
               OpMemberDecorate %_struct_5 2 Offset 8                           ; 0x000000cc
               OpMemberDecorate %_struct_5 3 Offset 16                          ; 0x000000e0
               OpMemberDecorate %_struct_5 4 Offset 18                          ; 0x000000f4
               OpMemberDecorate %_struct_5 5 Offset 20                          ; 0x00000108
               OpDecorate %_runtimearr__struct_5 ArrayStride 24                 ; 0x0000011c
               OpMemberDecorate %_struct_4 0 Offset 0                           ; 0x0000012c
               OpDecorate %_struct_4 Block                                      ; 0x00000140

               ; Types, variables and constants
      %float = OpTypeFloat 32                                                   ; 0x0000014c
  %float_123 = OpConstant %float 123                                            ; 0x00000158
        %int = OpTypeInt 32 1                                                   ; 0x00000168
      %int_0 = OpConstant %int 0                                                ; 0x00000178
      %ulong = OpTypeInt 64 0                                                   ; 0x00000188
     %ushort = OpTypeInt 16 0                                                   ; 0x00000198
       %half = OpTypeFloat 16                                                   ; 0x000001a8
      %short = OpTypeInt 16 1                                                   ; 0x000001b4
  %_struct_5 = OpTypeStruct %float %float %ulong %ushort %half %short           ; 0x000001c4
%_runtimearr__struct_5 = OpTypeRuntimeArray %_struct_5                          ; 0x000001e4, ArrayStride 24
  %_struct_4 = OpTypeStruct %_runtimearr__struct_5                              ; 0x000001f0, Block
%_ptr_StorageBuffer__struct_4 = OpTypePointer StorageBuffer %_struct_4          ; 0x000001fc
       %uint = OpTypeInt 32 0                                                   ; 0x0000020c
     %v3uint = OpTypeVector %uint 3                                             ; 0x0000021c
%_ptr_Input_v3uint = OpTypePointer Input %v3uint                                ; 0x0000022c
       %void = OpTypeVoid                                                       ; 0x0000023c
         %20 = OpTypeFunction %void                                             ; 0x00000244
%_ptr_StorageBuffer_float = OpTypePointer StorageBuffer %float                  ; 0x00000250
          %3 = OpVariable %_ptr_StorageBuffer__struct_4 StorageBuffer           ; 0x00000260, DescriptorSet 0, Binding 0
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input                    ; 0x00000270, BuiltIn GlobalInvocationId

               ; Function 1
          %1 = OpFunction %void None %20                                        ; 0x00000280

         %22 = OpLabel                                                          ; 0x00000294
         %23 =   OpLoad %v3uint %gl_GlobalInvocationID                          ; 0x0000029c
         %24 =   OpCompositeExtract %uint %23 0                                 ; 0x000002ac
         %25 =   OpAccessChain %_ptr_StorageBuffer_float %3 %int_0 %24 %int_0   ; 0x000002c0
                 OpStore %25 %float_123                                         ; 0x000002dc
                 OpReturn                                                       ; 0x000002e8
               OpFunctionEnd                                                    ; 0x000002ec

As the U32 at offset 0x0000011c says; the stride of _runtimearr__struct_5 is 24. However, when running this with spirv-reflect it will gaslight me that binding->block.members[0]->padded_size == 22:

SH Binary (compute): main
        [[oxc::model(6.5)]]
        [[oxc::extension( "I64" , "16BitTypes" )]]
        [[oxc::uniforms()]
        [[oxc::vendor()]] //(any vendor)
        Binaries:
                SPV: 752
        Registers:
                _sbuffer0  (SPV: Used)
                        RWStructuredBuffer
                        [[vk::binding(0, 0)]]
                        0x00000000: $Element (SPIRV: Used): Nicu (Stride: 22)
                                0x00000000: a (SPIRV & DXIL: Unused): F32
                                0x00000004: b (SPIRV & DXIL: Unused): F32
                                0x00000008: c (SPIRV & DXIL: Unused): U64
                                0x00000010: d (SPIRV & DXIL: Unused): U16
                                0x00000012: e (SPIRV & DXIL: Unused): F16
                                0x00000014: f (SPIRV & DXIL: Unused): I16

That doesn't match expected behavior and is likely because 22 = 0x14 + sizeof(I16) = 0x16. My best guess is it calculates the struct size rather than reflects it. If it does calculate it, then it should calculate it by taking the largest member (U64) and align the struct size to it. Doing that manually seems very confusing.

spencer-lunarg commented 2 months ago

OpMemberDecorate %_struct_5 5 Offset 20 (which is int16_t f;)

I guess I can't find any spec text (i assume it might be somewhere deep down in it) that says that the struct size is 22 or 24 in this case

I can confirm the logic of SPIRV-Reflect is it grabs the highest offset value in the struct and then proceeds to calculate that last element.

it should calculate it by taking the largest member (U64)

what if your largest member is a struct, like

struct Nicu {
    SomeStruct, a;
    int16_t b;
};

I am not sure how it would calculate it there

(basically will bring up in spec WG to confirm what the "right" answer should be here and how it should be calculated, because teh Vulkan Validation Layers are doing the same logic as well currently)

Nielsbishere commented 2 months ago

@spencer-lunarg I'm more specifically talking about the HLSL structured buffer packing rules and there it 100% mismatches the reflection that can be obtained from DXIL vs the reflection that can be obtained from SPIRV (24 would be the correct answer). Maraneshi has a good writeup about these rules: https://maraneshi.github.io/HLSL-ConstantBufferLayoutVisualizer/ (see Addendum). Essentially it has similar rules to C/C++ where the largest data type is the alignment the struct should use. With nested structs they are aligned to their largest size as well. So if SomeStruct has a U64 in it it would basically force Nicu to become 16 bytes. If this mismatches with SSBO layouts, I'm not 100% sure, as I've not been using GLSL for a while now.

See the output of

struct PSInput
{
    float4 color : COLOR;
};

struct Nest {
  uint64_t a;  
};

struct Test {
    Nest a;
    float b;
};

StructuredBuffer<Test> test;

float4 PSMain(PSInput input) : SV_TARGET
{
    return input.color * test[0].b.xxxx;
}

On https://shader-playground.timjones.io/ where it produces:


;   struct struct.Test
;   {
;
;       struct struct.Nest
;       {
;
;           uint64_t a;                               ; Offset:    0
;       
;       } a;                                          ; Offset:    0
;
;       float b;                                      ; Offset:    8
;   
;   } $Element;                                       ; Offset:    0 Size:    16
spencer-lunarg commented 2 months ago

@Nielsbishere ok, I know glsl/hlsl have various memory layouts, (https://github.com/KhronosGroup/Vulkan-Guide/blob/main/chapters/shader_memory_layout.adoc) but I need to make sure I get it right for how SPIR-V defines it

... basically I have had this on my todo list to get better documented in the spec (or somewhere) and a set of SPIR-V to test things with as there are clearly many edge cases to test for

spencer-lunarg commented 2 months ago

@Nielsbishere I just realize I have also been mixing up stride with size :facepalm:

Nielsbishere commented 2 months ago

No problem. For me I also don't care too much about size, I only care about stride. Because the DXIL backend can't accurately figure out the size of a struct for nested structs.

spencer-lunarg commented 2 months ago

@Nielsbishere can I ask where the stride of 22 blows up, like I guess want to also better understand who is expecting this to be 24, because if you had an array of Nicu SPIR-V would have an ArrayStride for it

Nielsbishere commented 2 months ago

This data is used for reflection purposes and is used for both analysis and to send data to the shader. The two should match because otherwise DXIL and SPIRV have mismatching behavior (I unify both binaries into a single reflection format for ease of use).

chaoticbob commented 3 weeks ago

spriv-reflect isn't gaslighting you. When this code was written, there was not a clear definition on size vs stride of a struct in a StructuredBuffer. Both were reversed engineered from the behavior of the drivers at the time because there was no documentation published on HLSL buffer packing. There's a good chance the driver that we leaned on the hardest to for the calculations didn't quite get it right either. There may have been something like the stride needs to align to the size of the smallest machine type (4 bytes) but oh...well for StructuredBuffer the driver might ignore that.

The code was originally written to line up with DXBC and not DXIL. DXIL wasn't in widespread usage at the time and wouldn't be for almost another two years.

It's been a while since I've written that code and I haven't had the time to revisit in now that DXIL is more common than DXBC.

The rules for packing ConstantBuffer and StructuredBuffer are a little bit different and also have changed over time with the inclusion of types that are smaller than 4 bytes. We need to revisit these with an updated test to line up with DXIL.