Incorrect codegen when there is redundant float16_t conversion

FanShupei commented 4 weeks ago

For background, you may see https://github.com/google/shaderc/issues/1418

I just understand this is actually a glslang bug, so I raise the issue here.

For the compute shader, glslang generates ill-formed spirv binary, demonstrated by the following commands

$ glslang --target-env vulkan1.3 issue.comp
$ spirv-val comp.spv
error: line 65: Expected input to have different bit width from Result Type: FConvert
  %33 = OpFConvert %half %31

#version 450
#extension GL_EXT_shader_16bit_storage : require
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

#define TYPE float16_t // or 'int64_t', 'uint16_t', also generated ill-formed spirv
// #define TYPE float, int, uint, these are fine
layout(binding = 0) readonly buffer A { TYPE data_a[]; };
layout(binding = 1) writeonly buffer D { TYPE data_d[]; };

void main() {
    const uint i = gl_GlobalInvocationID.x;
    data_d[i] = TYPE(data_a[i]);
}

I take a look at the disassembly, it is indeed incorrect. The spec on OPFConvert says that "The component width must not equal the component width in Result Type.", so the op actually should be eliminated.

; SPIR-V
; Version: 1.6
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 37
; Schema: 0
               OpCapability Shader
               OpCapability StorageBuffer16BitAccess
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID %_ %__0
               OpExecutionMode %main LocalSize 1 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_16bit_storage"
               OpName %main "main"
               OpName %i "i"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %D "D"
               OpMemberName %D 0 "data_d"
               OpName %_ ""
               OpName %A "A"
               OpMemberName %A 0 "data_a"
               OpName %__0 ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %_runtimearr_half ArrayStride 2
               OpMemberDecorate %D 0 NonReadable
               OpMemberDecorate %D 0 Offset 0
               OpDecorate %D Block
               OpDecorate %_ DescriptorSet 0
               OpDecorate %_ Binding 1
               OpDecorate %_runtimearr_half_0 ArrayStride 2
               OpMemberDecorate %A 0 NonWritable
               OpMemberDecorate %A 0 Offset 0
               OpDecorate %A Block
               OpDecorate %__0 DescriptorSet 0
               OpDecorate %__0 Binding 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
       %half = OpTypeFloat 16
%_runtimearr_half = OpTypeRuntimeArray %half
          %D = OpTypeStruct %_runtimearr_half
%_ptr_StorageBuffer_D = OpTypePointer StorageBuffer %D
          %_ = OpVariable %_ptr_StorageBuffer_D StorageBuffer
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_runtimearr_half_0 = OpTypeRuntimeArray %half
          %A = OpTypeStruct %_runtimearr_half_0
%_ptr_StorageBuffer_A = OpTypePointer StorageBuffer %A
        %__0 = OpVariable %_ptr_StorageBuffer_A StorageBuffer
%_ptr_StorageBuffer_half = OpTypePointer StorageBuffer %half
      %float = OpTypeFloat 32
     %uint_1 = OpConstant %uint 1
         %36 = OpConstantComposite %v3uint %uint_1 %uint_1 %uint_1
       %main = OpFunction %void None %3
          %5 = OpLabel
          %i = OpVariable %_ptr_Function_uint Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %i %15
         %23 = OpLoad %uint %i
         %28 = OpLoad %uint %i
         %30 = OpAccessChain %_ptr_StorageBuffer_half %__0 %int_0 %28
         %31 = OpLoad %half %30
         %33 = OpFConvert %half %31
         %34 = OpAccessChain %_ptr_StorageBuffer_half %_ %int_0 %23
               OpStore %34 %33
               OpReturn
               OpFunctionEnd

Version

I also test the bug also exists on current master.

OS: ArchLinux

$ glslang --version
Glslang Version: 11:14.1.0
ESSL Version: OpenGL ES GLSL 3.20 glslang Khronos. 14.1.0
GLSL Version: 4.60 glslang Khronos. 14.1.0
SPIR-V Version 0x00010600, Revision 1
GLSL.std.450 Version 100, Revision 1
Khronos Tool ID 8
SPIR-V Generator Version 11
GL_KHR_vulkan_glsl version 100
ARB_GL_gl_spirv version 100

arcady-lunarg commented 4 weeks ago

Thanks for the detailed bug report, this is definitely an issue that needs to be fixed, probably in the core of glslang as apparently the way it generates these conversions is wrong.

ravi688 commented 2 weeks ago

I read the GL_EXT_shader_16bit_storage extension spec, here: https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GL_EXT_shader_16bit_storage.txt

It says the following about float16_t and types alike: "16-bit scalar or vector types can only be loaded from uniform or buffer memory, stored to buffer memory, or converted to and from 32-bit types via constructors "

This means we can do the following things to avoid generating invalid SPIR-V bitcode:

Make expressions like float16_t(float16_t var) a conversion error.
Skip the expressions like float16_t(float16_t var), since they are redundant.
First convert the float16_t var into float_t, and then float_t to float16_t, as this would comply with the extension specification, i.e. 16-bit scalar types can only be converted to and from 32-bit types.

I've a applied a temporary fix for the solution no 3, i.e. which now generates the following instructions:

; SPIR-V
; Version: 1.0
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 38
; Schema: 0
               OpCapability Shader
               OpCapability StorageBuffer16BitAccess
               OpExtension "SPV_KHR_16bit_storage"
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main LocalSize 1 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_16bit_storage"
               OpName %main "main"
               OpName %i "i"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %D "D"
               OpMemberName %D 0 "data_d"
               OpName %_ ""
               OpName %A "A"
               OpMemberName %A 0 "data_a"
               OpName %__0 ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %_runtimearr_half ArrayStride 2
               OpMemberDecorate %D 0 NonReadable
               OpMemberDecorate %D 0 Offset 0
               OpDecorate %D BufferBlock
               OpDecorate %_ DescriptorSet 0
               OpDecorate %_ Binding 1
               OpDecorate %_runtimearr_half_0 ArrayStride 2
               OpMemberDecorate %A 0 NonWritable
               OpMemberDecorate %A 0 Offset 0
               OpDecorate %A BufferBlock
               OpDecorate %__0 DescriptorSet 0
               OpDecorate %__0 Binding 0
               OpDecorate %gl_WorkGroupSize BuiltIn WorkgroupSize
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
       %half = OpTypeFloat 16
%_runtimearr_half = OpTypeRuntimeArray %half
          %D = OpTypeStruct %_runtimearr_half
%_ptr_Uniform_D = OpTypePointer Uniform %D
          %_ = OpVariable %_ptr_Uniform_D Uniform
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_runtimearr_half_0 = OpTypeRuntimeArray %half
          %A = OpTypeStruct %_runtimearr_half_0
%_ptr_Uniform_A = OpTypePointer Uniform %A
        %__0 = OpVariable %_ptr_Uniform_A Uniform
%_ptr_Uniform_half = OpTypePointer Uniform %half
      %float = OpTypeFloat 32
     %uint_1 = OpConstant %uint 1
%gl_WorkGroupSize = OpConstantComposite %v3uint %uint_1 %uint_1 %uint_1
       %main = OpFunction %void None %3
          %5 = OpLabel
          %i = OpVariable %_ptr_Function_uint Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %i %15
         %23 = OpLoad %uint %i
         %28 = OpLoad %uint %i
         %30 = OpAccessChain %_ptr_Uniform_half %__0 %int_0 %28
         %31 = OpLoad %half %30
         %33 = OpFConvert %float %31
         %34 = OpFConvert %half %33
         %35 = OpAccessChain %_ptr_Uniform_half %_ %int_0 %23
               OpStore %35 %34
               OpReturn
               OpFunctionEnd

Which also passes the spir-v validator.

I'll raise a pull request once I've verified the existing test case to make sure my changes doesn't break anything.

FanShupei commented 2 weeks ago

Thanks for your work. I'm not an expert in GLSL spec, I just feel the method 3 is too wired. Personally I think the method 2 (skip it since it's redundant) is actually what the standard implies.

Although the GL_EXT_shader_16bit_storage extension spec does not explicit list float16_t(float16_t) in the conversion table (I believe you interpret it as the spec disallows identity conversions of 16bit types). However, I notice that the Chapter 5 of GLSL spec does not explicit list other identity conversions (like float(float) ) either.

Instead, Chapter 5.4 (Operators and Expressions/Conversion) says "Identity constructors, like float(float) are also legal, but of little use.".

In summary, GL_EXT_shader_16bit_storage'x text should be read together with Chapter 5.4. Although the text does not explicitly says identity conversions are noop, float(float) and float16_t(float16_t) are not treat differently. If we have have consensus float(float) is a noop, float16_t(float16_t) should be also regarded as a noop.

ravi688 commented 2 weeks ago

Somebody has added the following comment when generating intermediate representation for float16_t constructor in ParserHelper.cpp:TParseContext::constructBuiltIn:

// 8/16-bit storage extensions don't support constructing composites of 8/16-bit types,
// so construct a 32-bit type and convert

Is that true? I see that constructing composites (vec4 etc.) of float16_t is completely possible.

ravi688 commented 2 weeks ago

I think the following section should be removed from ParserHelper.cpp:

As that code is trying to generate float16_t to float conversion. Removing this code generates correct code also.

ravi688 commented 2 weeks ago

I've applied a correct fix for it and all tests are passing: ravi@devmachine:~/OpenSource/glslang/build$ ctest --output-on-failure Test project /home/ravi/OpenSource/glslang/build Start 1: glslang-testsuite 1/2 Test #1: glslang-testsuite ................ Passed 30.53 sec Start 2: glslang-gtests 2/2 Test #2: glslang-gtests ................... Passed 15.38 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) = 45.92 sec

However, I suspect we do not check the generated SPIR-V using FileCheck like tools or do we?

FanShupei commented 2 weeks ago

Disclaimer: I'm not an expert of GLSL expert and I'm not familiar with glslang, I just happened to write some GLSL code before. All the below are my personal understandings.

Is that true? I see that constructing composites (vec4 etc.) of float16_t is completely possible.

Yes for GL_EXT_shader_16bit_storage. The ext spec text explicit says, "Constructors of or using 8-bit and 16-bit vector types must construct from a type with the same number of components. The following is a complete list of such constructors"

I interpret it as GL_EXT_shader_16bit_storage does not allow constucting a f16vec from float16_t scalar compoents.

However, EXT_shader_explicit_arithmetic_types does allow constructing f16vec from float16_t scalar components.

As that code is trying to generate float16_t to float conversion. Removing this code generates correct code also.

The code seems somewhat reasonable? However, I think it should only be applied to composite types (Vec and Mat), not be applied to 8/16-bit scalar types.

ravi688 commented 2 weeks ago

Applied the fix and raised a pull request here: https://github.com/KhronosGroup/glslang/pull/3622

Yes, you're correct we do not need to remove the code.

KhronosGroup / glslang

Incorrect codegen when there is redundant float16_t conversion #3616

Version