KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

Copying between different image aspects #2079

Open turanszkij opened 1 year ago

turanszkij commented 1 year ago

Hello,

When copying images, it would be really useful to allow copying between different image aspects, namely copying from a color image to the depth or stencil plane of a depth buffer. This is supported by DirectX 12, and Vulkan currently has no good substitute, especially for stencil, because stencil cannot be exported per pixel on many GPUs.

The use case is that a depth or stencil color texture is created in a compute pipeline with an appropriate format (e.g. R32_FLOAT for depth, R8_UINT for stencil). Then the depth and stencil color textures would be copied to the corresponding planes of a D32S8 format depth buffer. Example:

VkImageCopy copy = {};
copy.dstSubresource.aspectMask = VK_IMAGE_ASPECT_STENCIL_BIT;
copy.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
// ...
vkCmdCopyImage(
    commandbuffer,
    src_resource, // source image in VK_FORMAT_R8_UINT
    VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    dst_resource, // destination image in VK_FORMAT_D32_SFLOAT_S8_UINT
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1,
    &copy
);

Apologies if this is not an appropriate issue.

Best regards, Janos

spencer-lunarg commented 1 year ago

(adding for ease of looking this up later) This would violate

VUID-vkCmdCopyImage-srcImage-01551 If neither srcImage nor dstImage has a multi-planar image format then for each element of pRegions, srcSubresource.aspectMask and dstSubresource.aspectMask must match

HansKristian-Work commented 1 year ago

Came up on call and we think it's a useful feature. It's on our radar at least.

turanszkij commented 1 year ago

If anyone encounters this, it is possible to work around it by manually copying stencil bit by bit. This is done with 8 full-screen passes using the following pipeline setup (pseudo-code):

BeginRenderPass(depthStencil_dst)
SetShaderResource(stencil_src)
for(int bit_index = 0; bit_index < 8; ++bit_index)
{
  uint bit = 1u << bit_index;
  SetPipelineState(depth=off, stencilwrite=on, stencilcompare=always, stencilPassOp=replace, stencilWriteMask=bit);
  PushConstant(bit);
  SetStencilRef(bit);
  Draw(3); // full screen triangle
}
EndRenderPass()

The vertex shader is a simple full screen triangle:

float4 main(uint vertexID : SV_VertexID) : SV_Position
{
    float4 pos;
    pos.x = (float)(vertexID / 2) * 4.0 - 1.0;
    pos.y = (float)(vertexID % 2) * 4.0 - 1.0;
    pos.z = 0;
    pos.w = 1;
    return pos;
}
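
To see why three vertices are enough, the same arithmetic can be evaluated on the CPU. This is only a sketch in C; the `ClipPos` type and the `fullscreen_triangle_vertex` name are made up for illustration and are not part of the original post. The three vertices land at (-1, -1), (-1, 3) and (3, -1), and the edge between the last two passes through (1, 1), so the triangle covers the whole [-1, 1] clip-space square.

```c
#include <assert.h>

/* Hypothetical helper type: clip-space x/y of one vertex. */
typedef struct {
    float x;
    float y;
} ClipPos;

/* Mirrors the vertex shader arithmetic for vertex IDs 0, 1 and 2. */
ClipPos fullscreen_triangle_vertex(unsigned vertex_id)
{
    assert(vertex_id < 3u);
    ClipPos p;
    p.x = (float)(vertex_id / 2u) * 4.0f - 1.0f;
    p.y = (float)(vertex_id % 2u) * 4.0f - 1.0f;
    return p;
}
```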

The pixel shader outputs nothing. It discards the pixel when the source texture does not have the current bit set at that pixel:

Texture2D<uint> stencil_src;

struct StencilBitPush
{
    uint bit;
};
[[vk::push_constant]] StencilBitPush push;

void main(float4 pos : SV_Position)
{
    if ((stencil_src[uint2(pos.xy)] & push.bit) == 0)
        discard;
}
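
The effect of the eight passes can be checked with a small CPU-side emulation. This is a sketch in C under stated assumptions, not GPU code: the function names are hypothetical, and the stencil test is modeled as "always pass, replace with ref, restricted by the write mask", matching the pipeline state in the pseudo-code. One thing it makes visible is that discarded pixels keep their previous stencil value, so the result only equals the source if the destination stencil starts out cleared to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates one full-screen pass for a single stencil bit. */
void stencil_bit_pass(uint8_t *dst, const uint8_t *src,
                      size_t texel_count, unsigned bit_index)
{
    assert(bit_index < 8u);
    const uint8_t bit = (uint8_t)(1u << bit_index);
    const uint8_t ref = bit;        /* SetStencilRef(bit) */
    const uint8_t write_mask = bit; /* stencilWriteMask = bit */

    for (size_t i = 0; i < texel_count; ++i) {
        if ((src[i] & bit) == 0)
            continue; /* pixel shader "discard": stencil is left untouched */
        /* stencilPassOp = replace, limited to the bits in write_mask */
        dst[i] = (uint8_t)((dst[i] & (uint8_t)~write_mask) | (ref & write_mask));
    }
}

/* Runs all eight passes, one per stencil bit. */
void copy_stencil_bitwise(uint8_t *dst, const uint8_t *src, size_t texel_count)
{
    for (unsigned bit_index = 0; bit_index < 8u; ++bit_index)
        stencil_bit_pass(dst, src, texel_count, bit_index);
}
```

With a zero-initialized destination, `copy_stencil_bitwise` reproduces the source values exactly. With a non-zero destination, any bits that are 0 in the source are never written, so stale bits survive.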

I expect this issue could be pretty common with GPU-driven renderers, where the whole scene is rendered in as few indirect calls as possible and breaking this up for every possible stencil ref is not feasible. Instead, the stencil mask can be generated in compute and this method can be used to fill the hardware stencil buffer at the end, which further post-processing can use.

For depth it is much simpler, as a pixel shader can simply read from the color texture and write to depth with just one full-screen draw:

Texture2D<float> depth_src;

float main(float4 pos : SV_Position) : SV_Depth
{
    return depth_src[uint2(pos.xy)]; // sample the texture declared above
}

ShabbyX commented 8 months ago

I'm not disputing the usefulness of this as was previously established (https://github.com/KhronosGroup/Vulkan-Docs/issues/2079#issuecomment-1488797625), but for this particular purpose:

The use case is that a depth or stencil color texture is created in a compute pipeline with an appropriate format (e.g. R32_FLOAT for depth, R8_UINT for stencil). Then the depth and stencil color textures would be copied to the corresponding planes of a D32S8 format depth buffer.

What's the benefit of this over having the compute pipeline fill in a buffer and use vkCmdCopyBufferToImage instead?
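
For anyone weighing the buffer route, here is a rough sketch in C of how large the intermediate buffer would need to be. It assumes, based on my reading of the Vulkan specification, that vkCmdCopyBufferToImage copies one aspect per region and that the buffer data for a D32_SFLOAT_S8_UINT image is tightly packed as one 4-byte value per texel for the depth aspect and one 1-byte value per texel for the stencil aspect. The `CopyAspect` enum and both function names are hypothetical and only serve the illustration. No Vulkan calls are made.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the aspect being copied. */
typedef enum {
    ASPECT_DEPTH,
    ASPECT_STENCIL
} CopyAspect;

/* Assumed buffer layout: 4 bytes per depth texel, 1 byte per stencil texel. */
size_t staging_bytes_per_texel(CopyAspect aspect)
{
    return aspect == ASPECT_DEPTH ? 4u : 1u;
}

/* Size of a tightly packed buffer covering a width x height region. */
size_t staging_buffer_size(uint32_t width, uint32_t height, CopyAspect aspect)
{
    return (size_t)width * (size_t)height * staging_bytes_per_texel(aspect);
}
```

For a 1920x1080 target this gives 2,073,600 bytes for stencil and 8,294,400 bytes for depth, which is extra memory on top of whatever the compute pass writes.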