darksylinc / betsy

Betsy GPU compressor

Texture swizzling optimization for the BC4 shader #14

Open nanley opened 2 years ago

nanley commented 2 years ago

The BC4 shader takes a uniform that acts as a channel selector. With a couple of ternaries, the shader uses it to decide which channel to compress. Instead of relying on this uniform, it's possible to set a texture swizzle on the texture object so that the channel to compress is remapped to red/x. The ternaries can then be dropped. On some drivers, this optimization can save a bit of math when executing the shader.
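To illustrate why the two approaches should read the same value, here is a small CPU-side model in C++. It is only a sketch: `Texel`, `Channel`, `selectWithUniform`, `fetchWithRedSwizzle`, `selectWithSwizzle` and `checkEquivalence` are hypothetical names made up for this example, not part of betsy or OpenGL, and the swizzle is modelled as a remap of the red component only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side model; none of these names exist in betsy or OpenGL.
enum class Channel { R = 0, G = 1, B = 2, A = 3 };

using Texel = std::array<float, 4>;  // x, y, z, w

// Current shader: the channel is picked with ternaries driven by a uniform.
// Mirrors: p_channelIdx == 0 ? value.x : ( p_channelIdx == 1 ? value.y : value.w )
float selectWithUniform( const Texel &value, float channelIdx )
{
    return channelIdx == 0.0f ? value[0] : ( channelIdx == 1.0f ? value[1] : value[3] );
}

// Models a texture fetch where only the red component is swizzled,
// i.e. red is sourced from another stored channel.
Texel fetchWithRedSwizzle( const Texel &stored, Channel redSource )
{
    Texel out = stored;
    out[0] = stored[static_cast<std::size_t>( redSource )];
    return out;
}

// Proposed shader: always read .x and let the swizzle choose the channel.
float selectWithSwizzle( const Texel &stored, Channel redSource )
{
    return fetchWithRedSwizzle( stored, redSource )[0];
}

// Checks that both paths agree for the channels the ternaries can select.
void checkEquivalence( const Texel &t )
{
    assert( selectWithUniform( t, 0.0f ) == selectWithSwizzle( t, Channel::R ) );
    assert( selectWithUniform( t, 1.0f ) == selectWithSwizzle( t, Channel::G ) );
    assert( selectWithUniform( t, 3.0f ) == selectWithSwizzle( t, Channel::A ) );
}
```

In this model the per-texel branch disappears from the "shader" side, and the channel choice becomes a property of the fetch, which is the same trade the swizzle makes on the GPU.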

darksylinc commented 2 years ago

Originally I used RG_32 because it is easy to reinterpret as other formats in UAVs (all formats can be reinterpreted as 32-bit uint) and because I was unsure whether RGBA16 can be copied to BC4 in OpenGL ES. It certainly can be in DX11/12, but I was unsure about the other APIs.

nanley commented 2 years ago

Maybe I'm misunderstanding, but there shouldn't be a problematic interaction with the BC4/RG_32 destination texture. The swizzling would only impact the RGBA8 source texture.

darksylinc commented 2 years ago

Nope, I think I'm the one who misunderstood.

Could you post snippets of code of what/where you mean?

nanley commented 2 years ago

Sure:

diff --git a/bin/Data/bc4.glsl b/bin/Data/bc4.glsl
index 1c8cbe3..717ff82 100644
--- a/bin/Data/bc4.glsl
+++ b/bin/Data/bc4.glsl
@@ -8,10 +8,7 @@
 shared float2 g_minMaxValues[4u * 4u * 4u];
 shared uint2 g_mask[4u * 4u];

-layout( location = 0 ) uniform float2 params;
-
-#define p_channelIdx params.x
-#define p_useSNorm params.y
+layout( location = 0 ) uniform float p_useSNorm;

 uniform sampler2D srcTex;

@@ -45,10 +42,7 @@ void main()
        for( uint i = 0u; i < 4u; ++i )
        {
                const uint2 pixelsToLoad = pixelsToLoadBase + uint2( i, blockThreadId );
-
-               const float4 value = OGRE_Load2D( srcTex, int2( pixelsToLoad ), 0 ).xyzw;
-               srcPixel[i] = p_channelIdx == 0 ? value.x : ( p_channelIdx == 1 ? value.y : value.w );
-               srcPixel[i] *= 255.0f;
+               srcPixel[i] = OGRE_Load2D( srcTex, int2( pixelsToLoad ), 0 ).x * 255.0f;
        }

        minVal = min3( srcPixel.x, srcPixel.y, srcPixel.z );
diff --git a/src/betsy/EncoderBC1.cpp b/src/betsy/EncoderBC1.cpp
index 4cdccd7..93b0f79 100644
--- a/src/betsy/EncoderBC1.cpp
+++ b/src/betsy/EncoderBC1.cpp
@@ -127,11 +127,12 @@ namespace betsy
                if( m_bc4TargetRes )
                {
                        // Compress Alpha too (using BC4)
+                       glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_R, GL_ALPHA );
                        bindComputePso( m_bc4Pso );
                        bindUav( 0u, m_bc4TargetRes, PFG_RG32_UINT, ResourceAccess::Write );

-                       // p_channelIdx, p_useSNorm
-                       glUniform2f( 0, 3.0f, 0.0f );
+                       // p_useSNorm
+                       glUniform1f( 0, 0.0f );

                        glDispatchCompute( 1u,  //
                                                           alignToNextMultiple( m_width, 16u ) / 16u,

darksylinc commented 2 years ago

Ahh!! Now I see what you mean.

I'm used to coding API-agnostically, and D3D11 does not have swizzling. D3D12 added it again with D3D12_SHADER_COMPONENT_MAPPING.

In Vulkan, swizzling means creating a different VkImageView.

@reduz what's easier for you? Creating a new VkImageView may yield slightly better performance, while keeping a uniform (p_channelIdx) means there is no need for a second VkImageView.