KhronosGroup / glslang

Khronos-reference front end for GLSL/ESSL, partial front end for HLSL, and a SPIR-V generator.
Other
2.9k stars 815 forks source link

Incorrect SPIRV codegen for 8bit/16bit variables in buffers #3607

Open alelenv opened 1 month ago

alelenv commented 1 month ago

For the following simple shader

#version 460

#extension GL_EXT_shader_8bit_storage : require
#extension GL_EXT_shader_16bit_storage : require
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

layout(binding = 1 ) uniform _16bit_storage
{
        i16vec4 i16v4;
};

// This is read back and checked on the CPU side to verify the converions
layout(binding = 2 ) writeonly buffer ConversionOutBuffer
{
        i8vec4 i16v4_to_i8v4;
} cob;

out vec4 fcolor;

void main()
{
        // Conversions
        {
                cob.i16v4_to_i8v4   = i8vec4(i16v4);
        }

        bool RED = true;
        bool GREEN = false;

        fcolor = vec4( (RED) ? 1.0f : 0.0f,
                                   (GREEN) ? 1.0f : 0.0f,
                                   0.0f, 1.0f);
}

We see some weird codegen when loading members from UBO and doing a convert

         %19 = OpAccessChain %_ptr_Uniform_v4short %_ %int_0
         %20 = OpLoad %v4short %19
         %22 = OpCompositeExtract %int %20 0
         %23 = OpCompositeExtract %int %20 1
         %24 = OpCompositeExtract %int %20 2
         %25 = OpCompositeExtract %int %20 3
         %26 = OpCompositeConstruct %v4int %22 %23 %24 %25
         %27 = OpSConvert %v4char %26

OpCompositeExtract tried to extract as 'int' when %20 is a vector of i16/shorts

arcady-lunarg commented 1 month ago

Yeah that definitely isn't valid SPIR-V, I'll take a look. Interestingly this seems to be the one cast that isn't currently covered by our unit tests.

ravi688 commented 2 weeks ago

@alelenv, how did you compiled your shader? Trying to execute the following command reports an error for me:

$ glslang -G test.frag -o test.frag.spv
test.frag
ERROR: test.frag:20: 'location' : SPIR-V requires location for user input/output 
ERROR: 1 compilation errors.  No code generated.

ERROR: Linking fragment stage: Missing entry point: Each stage requires one entry point

SPIR-V is not generated for failed compile or link

The test.frag contains the shader you embedded into your first messaged in this issue.

arcady-lunarg commented 2 weeks ago

@ravi688 You can use the --aml option to automatically assign locations when you get that error message.

ravi688 commented 1 week ago

Inspecting the code in ParseHelper.cpp, I think we are first trying to extract the components of i16v4 (i16vec4) and then convert each component, of type i16, separately to scalar type of i8, after that construct the target vector type of i8vec4 from the final scalar components of type i8.

Why can't we just use the following set of instructions?

%x = OpSConvert from i16vec4 to ivec4
OpSConvert from %x to i8vec4

I believe OpSConvert works on vector types also.

Also, the following (vulkan) GLSL code:

#version 450

layout(location = 0) in vec4 in_color;

layout(location = 0) out vec4 out_color;

void main()
{
    out_color = vec4(in_color);
}

translates to the following SPIR-V instructions:

...
 %12 = OpLoad %v4float %in_color
         %13 = OpCompositeExtract %float %12 0
         %14 = OpCompositeExtract %float %12 1
         %15 = OpCompositeExtract %float %12 2
         %16 = OpCompositeExtract %float %12 3
         %17 = OpCompositeConstruct %v4float %13 %14 %15 %16
 OpStore %out_color %17
...

Why are we using OpCompositeExtract for the case when number of components are the same for source and target both and no mixing of scalar or differently sized vectors is there? we could have directly used %12, i.e. ignored the identity conversion.

Is there anything which I'm not aware of just in case it is all right extracting and constructing again?

arcady-lunarg commented 1 week ago

@ravi688 take a look at https://github.com/KhronosGroup/glslang/pull/3628 which does exactly what you suggest.

ravi688 commented 1 week ago

Raise a PR #3631 which fixes this issue.