Looooong / Unity-SRP-VXGI

Voxel-based Global Illumination using Unity Scriptable Render Pipeline
MIT License

Optimize and refactor voxel shader #39

Closed Looooong closed 4 years ago

Looooong commented 4 years ago

Today, I found out that writing to three RWTexture3Ds is faster than writing to a single RWByteAddressBuffer. The VoxelShader aggregation step goes from 47 ms down to 3 ms (using the High resolution setting on my crappy laptop's GeForce 920MX). That is almost 16 times faster.
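For context, here is a minimal sketch of the two write strategies. The names, resolution, and buffer layout are assumptions for illustration, not the actual VoxelShader code:

```hlsl
// Hypothetical sketch; names and layout are assumptions, not the repo's VoxelShader.
RWTexture3D<float> VoxelR;
RWTexture3D<float> VoxelG;
RWTexture3D<float> VoxelB;
RWByteAddressBuffer VoxelBuffer; // the slower single-buffer alternative
static const uint RES = 256;     // assumed voxel grid resolution

[numthreads(4,4,4)]
void Aggregate (uint3 id : SV_DispatchThreadID)
{
    float3 color = float3(1.0, 0.5, 0.25); // placeholder for the aggregated radiance

    // Fast path: one typed image store per channel.
    VoxelR[id] = color.r;
    VoxelG[id] = color.g;
    VoxelB[id] = color.b;

    // Slow path being replaced: linearize the coordinate and store raw bytes.
    // uint address = (id.x + RES * (id.y + RES * id.z)) * 12; // 3 floats * 4 bytes
    // VoxelBuffer.Store3(address, asuint(color));
}
```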

I also discovered a bug in the HLSL-to-GLSL translator: it doesn't properly translate reads from multiple single-channel RWTextures (Unity case 1241093). This HLSL code:

RWTexture2D<float> ColorA;
RWTexture2D<float> ColorB;
RWTexture2D<float> ColorG;
RWTexture2D<float> ColorR;
RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
  Result[id.xy] = float4(ColorR[id.xy], ColorG[id.xy], ColorB[id.xy], ColorA[id.xy]);
}

gets translated into this GLSL code:

readonly layout(binding=0, r32f) highp uniform image2D ColorA;
readonly layout(binding=1, r32f) highp uniform image2D ColorB;
readonly layout(binding=2, r32f) highp uniform image2D ColorG;
readonly layout(binding=3, r32f) highp uniform image2D ColorR;
writeonly layout(binding=4) uniform image2D Result;
ivec4 u_xlati0;
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
void main()
{
    u_xlati0.x = floatBitsToInt(imageLoad(ColorR, ivec2(gl_GlobalInvocationID.xy)).x);
    u_xlati0.y = floatBitsToInt(imageLoad(ColorG, ivec2(gl_GlobalInvocationID.xy)).y);
    u_xlati0.z = floatBitsToInt(imageLoad(ColorB, ivec2(gl_GlobalInvocationID.xy)).z);
    u_xlati0.w = floatBitsToInt(imageLoad(ColorA, ivec2(gl_GlobalInvocationID.xy)).w);
    imageStore(Result, ivec2(gl_GlobalInvocationID.xy), intBitsToFloat(u_xlati0));
    return;
}

The expected GLSL code should be (imageLoad on an r32f image returns the stored value in the .x component, so every load must swizzle .x — the .y/.z/.w swizzles above read filler values instead of the texel data):

    // ...
    u_xlati0.x = floatBitsToInt(imageLoad(ColorR, ivec2(gl_GlobalInvocationID.xy)).x);
    u_xlati0.y = floatBitsToInt(imageLoad(ColorG, ivec2(gl_GlobalInvocationID.xy)).x);
    u_xlati0.z = floatBitsToInt(imageLoad(ColorB, ivec2(gl_GlobalInvocationID.xy)).x);
    u_xlati0.w = floatBitsToInt(imageLoad(ColorA, ivec2(gl_GlobalInvocationID.xy)).x);
    // ...
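Until the translator is fixed, one possible workaround is to assign each load to a scalar temporary before building the float4, which may keep the compiler from folding the loads into the swizzled form. This is an untested assumption, not a verified fix:

```hlsl
// Untested workaround sketch: force a scalar temporary per channel.
float r = ColorR[id.xy];
float g = ColorG[id.xy];
float b = ColorB[id.xy];
float a = ColorA[id.xy];
Result[id.xy] = float4(r, g, b, a);
```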