KhronosGroup / glslang

Khronos-reference front end for GLSL/ESSL, partial front end for HLSL, and a SPIR-V generator.
Other
3.01k stars 829 forks source link

HLSL: loop unrolled constant offsets generate Offset instead of ConstOffset to OpImageFetch #2877

Open danginsburg opened 2 years ago

danginsburg commented 2 years ago

The following shader will produce a runtime validation error (note: spirv-val does not catch this, but Vulkan validation does):

[VUID-StandaloneSpirv-Offset-04663] Image Operand Offset can only be used with OpImage*Gather operations"

Shader:

Texture2D<float4>             g_inoutColorReadonly;
struct PS_INPUT
{
   uint2 vTexCoord : TEXCOORD0;
};

float4 LoadSourceColor( uint2 pixelPos, int2 offset )
{
    float4 color = g_inoutColorReadonly.Load( int3( pixelPos, 0 ), offset );
    return color;
}
float4 MainPs( PS_INPUT ps_input ) : SV_TARGET0
{
    float4 color = float4( 0, 0, 0, 0 );
    [unroll]
    for ( int i = 0; i < 2; i++ )
    {
        color += LoadSourceColor( ps_input.vTexCoord.xy, uint2( i, i ) );
    }
   return color;
}

Compiled with: glslangValidator -D -e MainPs -S frag -V test.hlsl -o test.spv

The SPIR-V output contains the following instruction:

%138 = OpImageFetch %v4float %124 %130 Lod|Offset %int_0 %110

I believe the issue is that the compiler isn't able to figure out that uint( i, i ) in an unrolled loop could be a ConstOffset (if you change it to pass in a hardcoded constant, it will change from Offset to ConstOffset). I tested this shader on dxc and it produces the same output for that instruction. I guess it's a little questionable whether this is something glslang should handle, but it is accepted by fxc.

danginsburg commented 2 years ago

Note that if I add a spirv-opt loop unroll pass to the generated SPIR-V then it does generate ConstOffset.

greg-lunarg commented 2 years ago

This appears to be another case where optimization/legalization is required to generate valid code. In this case, the optimization is unrolling.

glslang does not do unrolling as part of legalization, partly it didn't exist when I first put together legalization, partly because I was always a little worried about the effect on shader size once it showed up, and partly because the issue never was forced. Now I think it is forced.

For legalization, I would only do unrolling of loops that are marked for unrolling in the HLSL, as this one is.

Do you use the glslang legalization/optimization, or do you do your own? If on your own, then you will need to add it to your recipe. I still need to do a little diligence to get this right for glslang. I will let you know what I come up with for a final solution, particularly what the final legalization recipe looks like.

danginsburg commented 2 years ago

Do you use the glslang legalization/optimization, or do you do your own?

We do our own. I've worked around this problem locally by exposing the ability in our shader toolchain to opt-in to an unroll pass and opting into unrolling for the offending shader. While I agree this might not be possible to solve without force unrolling, I'm not sure it makes sense to always force unroll loops because as you said it will bloat code size and could also be harder for IHVs to optimize. Maybe the middle ground would be having an option to glslang to unroll loops during legalization that defaults off?

greg-lunarg commented 2 years ago

Maybe glslang only enables unrolling if it detects an offset image operand applied to a non-OpImageGather instruction?

greg-lunarg commented 2 years ago

FWIW, I believe that this is a pretty reasonable recipe with unrolling. It is just the current glslang size/legalization recipe with CreateLocalMultiStoreElimPass() moved up (not sure how it got so low in the recipe) and CreateLoopUnrollPass(true) placed right after it:

if (options->stripDebugInfo) {
        optimizer.RegisterPass(spvtools::CreateStripDebugInfoPass());
    }
    optimizer.RegisterPass(spvtools::CreateWrapOpKillPass());
    optimizer.RegisterPass(spvtools::CreateDeadBranchElimPass());
    optimizer.RegisterPass(spvtools::CreateMergeReturnPass());
    optimizer.RegisterPass(spvtools::CreateInlineExhaustivePass());
    optimizer.RegisterPass(spvtools::CreateEliminateDeadFunctionsPass());
    optimizer.RegisterPass(spvtools::CreateScalarReplacementPass());
    optimizer.RegisterPass(spvtools::CreateLocalAccessChainConvertPass());
    optimizer.RegisterPass(spvtools::CreateLocalSingleBlockLoadStoreElimPass());
    optimizer.RegisterPass(spvtools::CreateLocalSingleStoreElimPass());
    optimizer.RegisterPass(spvtools::CreateSimplificationPass());
    optimizer.RegisterPass(spvtools::CreateLocalMultiStoreElimPass());
    optimizer.RegisterPass(spvtools::CreateLoopUnrollPass(true));
    optimizer.RegisterPass(spvtools::CreateAggressiveDCEPass());
    optimizer.RegisterPass(spvtools::CreateVectorDCEPass());
    optimizer.RegisterPass(spvtools::CreateDeadInsertElimPass());
    optimizer.RegisterPass(spvtools::CreateAggressiveDCEPass());
    optimizer.RegisterPass(spvtools::CreateDeadBranchElimPass());
    optimizer.RegisterPass(spvtools::CreateBlockMergePass());
    optimizer.RegisterPass(spvtools::CreateIfConversionPass());
    optimizer.RegisterPass(spvtools::CreateSimplificationPass());
    optimizer.RegisterPass(spvtools::CreateAggressiveDCEPass());
    optimizer.RegisterPass(spvtools::CreateVectorDCEPass());
    optimizer.RegisterPass(spvtools::CreateDeadInsertElimPass());
    optimizer.RegisterPass(spvtools::CreateInterpolateFixupPass());
    if (options->optimizeSize) {
        optimizer.RegisterPass(spvtools::CreateRedundancyEliminationPass());
    }
    optimizer.RegisterPass(spvtools::CreateAggressiveDCEPass());
    optimizer.RegisterPass(spvtools::CreateCFGCleanupPass());
greg-lunarg commented 2 years ago

BTW, are using CreateEliminateDeadMembersPass()? This gives a pretty nice size savings, around 5%. I need to add it to the glslang size optimization (-Os).

greg-lunarg commented 2 years ago

It seems to work pretty well near the end. I would put it 2nd to last.

greg-lunarg commented 2 years ago

Re the original issue, the solution that we decided on was to create an option to glslang to include unrolling as part of legalization in case the shader source is, as in this case, depending on unrolling to generate valid code. If a user gets invalid code from glslang, they would recompile with this option enabled.