Updating value based on OpCompositeInsert or OpCompositeConstruct

jaebaek commented 5 years ago

This issue was originally reported as a DXC issue.

When we compile the following HLSL with dxc -Tvs_6_0 -Emain -spirv,

struct VtxInput {
   float3 nrm :ATTR2;
   float4 tan :ATTR3;
};

float4 main(VtxInput vtx, out float3x3 mtrx:MTRX):SV_Position
{
    mtrx[0] = vtx.nrm;
    mtrx[1] = vtx.tan.xyz;
    mtrx[2] = 0;
    return 0;
}

it generates the following SPIR-V (which is the result of spirv-opt):

; SPIR-V
; Version: 1.0
; Generator: Google spiregg; 0
; Bound: 27
; Schema: 0
               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint Vertex %main "main" %in_var_ATTR2 %in_var_ATTR3 %gl_Position %out_var_MTRX
               OpSource HLSL 600
               OpName %in_var_ATTR2 "in.var.ATTR2"
               OpName %in_var_ATTR3 "in.var.ATTR3"
               OpName %out_var_MTRX "out.var.MTRX"
               OpName %main "main"
               OpDecorate %gl_Position BuiltIn Position
               OpDecorate %in_var_ATTR2 Location 0
               OpDecorate %in_var_ATTR3 Location 1
               OpDecorate %out_var_MTRX Location 0
      %float = OpTypeFloat 32
    %float_0 = OpConstant %float 0
    %v3float = OpTypeVector %float 3
          %9 = OpConstantComposite %v3float %float_0 %float_0 %float_0
    %v4float = OpTypeVector %float 4
         %11 = OpConstantComposite %v4float %float_0 %float_0 %float_0 %float_0
%_ptr_Input_v3float = OpTypePointer Input %v3float
%_ptr_Input_v4float = OpTypePointer Input %v4float
%_ptr_Output_v4float = OpTypePointer Output %v4float
%mat3v3float = OpTypeMatrix %v3float 3
%_ptr_Output_mat3v3float = OpTypePointer Output %mat3v3float
       %void = OpTypeVoid
         %18 = OpTypeFunction %void
%in_var_ATTR2 = OpVariable %_ptr_Input_v3float Input
%in_var_ATTR3 = OpVariable %_ptr_Input_v4float Input
%gl_Position = OpVariable %_ptr_Output_v4float Output
%out_var_MTRX = OpVariable %_ptr_Output_mat3v3float Output
         %19 = OpUndef %mat3v3float
       %main = OpFunction %void None %18
         %20 = OpLabel
         %21 = OpLoad %v3float %in_var_ATTR2
         %22 = OpLoad %v4float %in_var_ATTR3
         %23 = OpCompositeInsert %mat3v3float %21 %19 0
         %24 = OpVectorShuffle %v3float %22 %22 0 1 2
         %25 = OpCompositeInsert %mat3v3float %24 %23 1
         %26 = OpCompositeInsert %mat3v3float %9 %25 2
               OpStore %gl_Position %11
               OpStore %out_var_MTRX %26
               OpReturn
               OpFunctionEnd

Based on what SPIR-V spec says, "OpCompositeInsert Make a copy of a composite object, while modifying one part of it.". In my opinion, we can change the following part

         %19 = OpUndef %mat3v3float
...
         %21 = OpLoad %v3float %in_var_ATTR2
         %22 = OpLoad %v4float %in_var_ATTR3
         %23 = OpCompositeInsert %mat3v3float %21 %19 0
         %24 = OpVectorShuffle %v3float %22 %22 0 1 2
         %25 = OpCompositeInsert %mat3v3float %24 %23 1
         %26 = OpCompositeInsert %mat3v3float %9 %25 2
...
               OpStore %out_var_MTRX %26

to

         %21 = OpLoad %v3float %in_var_ATTR2
         %22 = OpLoad %v4float %in_var_ATTR3
         %24 = OpVectorShuffle %v3float %22 %22 0 1 2
         %26 = OpCompositeConstruct %mat3v3float %21 %24 %9
               OpStore %out_var_MTRX %26

may have the better performance. What do you think?

not_optimized.txt is the one generated by DXC without optimization i.e., dxc -Tvs_6_0 -Emain -spirv -fcgl.

dneto0 commented 5 years ago

It's impossible to tell if there is a performance difference just by staring at the code. The driver's shader compiler will process the code, and may collapse those insertions anyway. The sequence of inserts in the unoptimized code above is actually typical for code coming from an LLVM-based compiler.

We should only do this transformation if we have evidence it actually does affect performance, or if there is another compelling reason to do so.

It turns out that, to work around a driver issue, I wrote an LLVM pass in Clspv to do this kind of rewriting. See https://github.com/google/clspv/blob/master/lib/RewriteInsertsPass.cpp#L265 But I did not do it for compactness or performance.

GregSlazinski commented 2 years ago

Vec4 Test_VS(VtxInput vtx, out MatrixH3 mtrx:MTRX):SV_Position
{
   mtrx[0]=vtx.nrm();
   mtrx[1]=vtx.tan();
   mtrx[2]=vtx.bin(0, 0);
   return 0;
}

When converting to GLSL results in extremely unoptimized code

mat3 _43;

void main()
{
    mat3 _45 = _43;
    _45[0] = ATTR2;
    mat3 _46 = _45;
    _46[1] = ATTR3.xyz;
    mediump vec3 _23 = cross(vec3(0.0), vec3(0.0)) * ATTR3.w;
    mat3 _47 = _46;
    _47[2] = vec3(_23.x, _23.y, _23.z);
    gl_Position = vec4(0.0);
    IO0 = _47;
}

Problem: in HLSL I setup only 1 matrix, on GLSL modifying it results in a new matrix object being created every time. Also it's initialized at the start from an invalid value mat3 _45 = _43;

sudonatalie commented 2 years ago

Thanks for following up @GregSlazinski . To clarify, as @dneto0 mentioned above, performance improvements to spirv-opt are made with the goal of producing SPIR-V with better performance on the drivers that run SPIR-V. No guarantees can be made about how other tools (I assume SPIRV-Cross in this case) will translate that SPIR-V into higher level languages. When compiling down to a low-level language and disassembling back up to a high-level language, it's very common to produce more verbose code, but that isn't evidence in itself that the compiled code would have had poor performance.

KhronosGroup / SPIRV-Tools

Updating value based on OpCompositeInsert or OpCompositeConstruct #2816