NVIDIAGameWorks / NRI

MIT License
173 stars 26 forks source link

[RFE] Instancing problem in GPU-driven rendering #61

Closed vertver closed 2 months ago

vertver commented 2 months ago

I'm trying to implement GPU-driven rendering with bindless descriptors and instances, and I have a problem with instance (specifically on D3D12). SV_InstanceID on D3D11/D3D12 starts from 0. However, in Vulkan it starts from firstInstance in draw indexed instanced command. That means that I can't use instance id in D3D12 or D3D11 because it always starts from 0. I can hack this and use additional vertex instance buffer with instance index for each draw, but this method adds complexity. So what do you think, how can we fix this problem?

https://github.com/gpuweb/gpuweb/issues/901 - Problem description

dzhdanNV commented 2 months ago

Probably that's the solution: https://github.com/microsoft/DirectXShaderCompiler/pull/5770 Already implemented: https://microsoft.github.io/DirectX-Specs/d3d/HLSL_ShaderModel6_8.html#extended-command-information

vertver commented 2 months ago

Does this semantic works only on newer GPUs? Or is this semantic platform-agnostic?

dzhdanNV commented 2 months ago

A takeaway from your post (thanks!) could be the fact that currently NRI doesn't include a compatibility HLSL include file offering some standardization bits between C++/D3D11/D3D12/VK. Here is what I have been using:

// TODO: rename to Compatibility.hlsli?

#define NRI_MERGE_TOKENS(a, b) a##b

#if (defined(__cplusplus))
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) struct resourceName
#elif (defined(COMPILER_DXC))
    #if (defined(VULKAN))
        #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
            resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

        #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) [[vk::push_constant]] structName constantBufferName
        #define PRINTF_AVAILABLE
    #else
        #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
            resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

        #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) ConstantBuffer<structName> constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)
    #endif
#else
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        cbuffer structName##_##constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex)) { \
            structName constantBufferName; \
        }
#endif

// TODO: move to a better place
#define NRI_UV_TO_CLIP(uv) (uv * float2(2, -2) + float2(-1, 1))
#define NRI_CLIP_TO_UV(clip) (clip * float2(0.5, -0.5) + 0.5)

This file can be improved to offer compatibility in this area.

dzhdanNV commented 2 months ago

Does this semantic works only on newer GPUs? Or is this semantic platform-agnostic?

It's a good question. I don't see a reason why SM6.8 shouldn't work on older HW. But since the support comes with new drivers... if a HW has been deprecated in drivers, than, yes, no support definitely. An experiment is needed...

dzhdanNV commented 2 months ago

A starting point: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_feature_data_shader_model

BTW, just realized that no D3D11 support, which is stuck in FXC world :(

vertver commented 2 months ago

This also seems to require the Agility SDK on older Windows, so I think it might be a hack for older GPUs to emulate instance id via extra vertex instance buffer. You can just fill an entire vertex buffer with incremented integers to pass them directly to GPU to use them as instance ids.

BTW, just realized that no D3D11 support, which is stuck in FXC world :(

Yes, that's why I disabled the D3D11 backend, because I can't use spaces for arrays of resources. Also, using root constant emulation on D3D11 for SRV indexes has a big impact on performance.

example from my ImGui Shader:

struct ImGuiConstants
{
    float2 InvScreenSize;
    float IsImage;
    float Padding0;
    float Padding1;
    float Padding2;
    uint TextureIdx;
    uint SamplerIdx;
};

#ifdef VULKAN
[[vk::push_constant]] ImGuiConstants Constants;
#else
ConstantBuffer<ImGuiConstants> Constants : register(b0, space0); // Root Constant
#endif

struct ImPixel
{
    float4 pos : SV_POSITION;
    float2 uv : TEXCOORD0;
    float4 col : COLOR0;
};

float4 ps_main(ImPixel Pixel) : SV_Target
{
    SamplerState Sampler = Samplers[Constants.SamplerIdx];
    Texture2D Tex = Tex2DTable[Constants.TextureIdx];
    float4 color = Pixel.col;

    if (Constants.IsImage == 1.0f) {
        color = Tex.Sample(Sampler, Pixel.uv);
        color.xyz = lin2srgb(color.xyz);
    } else {
        color.w *= Tex.Sample(Sampler, Pixel.uv).r;
    }

    return color;
}
dzhdanNV commented 2 months ago

Added this to NRI:

    D3D12_FEATURE_DATA_SHADER_MODEL shaderModel = {D3D_SHADER_MODEL_6_9};
    HRESULT hr = m_Device->CheckFeatureSupport(D3D12_FEATURE_SHADER_MODEL, &shaderModel, sizeof(shaderModel));
    if (FAILED(hr))
        REPORT_WARNING(this, "ID3D12Device::CheckFeatureSupport(shaderModel) failed, result = 0x%08X!", hr);

RTX 4080: 6.8 supported Intel integrated GPU (13th Gen Intel(R) Core(TM) i9-13900K): 6.7 supported => not as bad, I believe I just need to update drivers!

vertver commented 2 months ago

not as bad, I believe I just need to update drivers!

I'm targeting SM6_0 as a minimum target, so I'm not using any new features to support all the GPUs. Also, I just checked - AMD RX 4xx and RX 5xx (GCN 3.0) doesn't support SM6_8

vertver commented 2 months ago

I just found that dawn does the same hack I suggested. So maybe it's not as bad as I thought. Another one

dzhdanNV commented 2 months ago

Thanks. Looks interesting. Interestingly what does VKD3D use? ;)

P.S. I updated Intel drivers, same 6.7 support :(

vertver commented 2 months ago

Thanks. Looks interesting. Interestingly what does VKD3D use? ;)

Since D3D11 and D3D12 don't support instance index starting from firstInstance, they just haven't implemented the feature. There is a couple of ways to implement this feature, and the variant with root constants only works for non-indirect solutions (since we don't have methods to dynamically offset descriptors for each draw or we don't have access to indirect arguments from the vertex shader). The second solution is simply offsetting the instance index in the vertex instance buffer via StartInstanceIndex, so I'll implement this solution in my project first and then try to explore more.

References: https://github.com/google/dawn/blob/ddd45561455cda0def30fcabf6748a89a44fd23c/src/tint/lang/wgsl/ast/transform/offset_first_index.h#L35

vertver commented 2 months ago

Hm, I might be wrong, but it seems possible to use root constants for indirect draw argument. Since we easily can write values from compute shader, I don't see any problems to specifically emulate first instance and vertex instance via 2 root constants for D3D12

vertver commented 2 months ago

So, my proposal for features baseArguments and baseArgumentsIndirect:

Vulkan: enable baseArguments function if VK_KHR_shader_draw_parameters is supported or if core version 1.1 is supported. If drawIndirectFirstInstance is supported - allow user to enable this feature. If drawIndirectFirstInstance is enabled - use [[vk::builtin(‘BaseInstance’)]] in HLSL shaders.

D3D12: enable baseArguments and baseArgumentsIndirect if user set this option. If baseArguments is enabled, the NRI will use tthe first or last 8 bytes of the vertex root constants for the BaseVertex and BaseInstance parameters if the specified command was called. Otherwise, do nothing. If baseArgumentsIndirect is activated, the NRI will use the first or last 8 bytes of indirect drawing arguments for the vertex root constants for the BaseVertex and BaseInstance parameters if the specified command was called. If baseArguments or baseArgumentsIndirect are activated - add an additional 2 32-bit constants to the size and offset root constants and write these two arguments after or before all root constants.

D3D11: enable baseArguments feature if user setted this option, baseArgumentsIndirect is not supported. TODO

dzhdanNV commented 2 months ago

Thanks for digging. An MR is needed to wrap up the functionality + a comment clarifying HLSL additions. Some comments:

If drawIndirectFirstInstance is supported - allow user to enable this feature

Always enabled if supported. Not exposed in DeviceDesc.

[[vk::builtin(‘BaseInstance’)]] in HLSL shaders

And [[vk::builtin(‘BaseVertex’)]]...

D3D12: enable baseArguments and baseArgumentsIndirect if user set this option

It needs to be exposed in DeviceCreationDesc (or wrapper), right?

the NRI will use the first or last 8 bytes of the vertex root constants for the BaseVertex and BaseInstance parameters if the specified command was called...

I follow. But that's where I would like to see changes in an MR.

P.S. In D3D12 push constants mambo-jumbo is not needed if SM6.8 is supported, right?

vertver commented 2 months ago

It needs to be exposed in DeviceCreationDesc (or wrapper), right?

Yes, right.

I follow. But that's where I would like to see changes in an MR.

I'll start working on this tomorrow.

P.S. In D3D12 push constants mambo-jumbo is not needed if SM6.8 is supported, right?

Yes, right.

dzhdanNV commented 2 months ago

I'll start working on this tomorrow.

Thanks! Very appreciated!

P.S. Я тут покумекал... Here is the massaged HLSL frontend, containing compatibility features, which will be released alonside with the functionality requested by you:

#ifndef NRI_COMPATIBILITY_HLSLI
#define NRI_COMPATIBILITY_HLSLI

#ifndef __cplusplus
    #define NRI_MERGE_TOKENS(a, b) a##b
#endif

#define NRI_UV_TO_CLIP(uv) (uv * float2(2, -2) + float2(-1, 1))
#define NRI_CLIP_TO_UV(clip) (clip * float2(0.5, -0.5) + 0.5)

// Container detection
#ifdef __hlsl_dx_compiler
    #ifdef __spirv__
        #define NRI_SPIRV
        #define NRI_PRINTF_AVAILABLE
    #else
        #define NRI_DXIL
    #endif
#else
    #ifndef __cplusplus
        #define NRI_DXBC
    #endif
#endif

// Portability features
#ifdef NRI_DXBC
    #define NRI_SHADER_MODEL 50
#else
    #define NRI_SHADER_MODEL (__SHADER_TARGET_MAJOR * 10 + __SHADER_TARGET_MINOR)
#endif

#ifdef NRI_SPIRV
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        [[vk::push_constant]] structName constantBufferName

    // Requires SPV_KHR_shader_draw_parameters
    #define NRI_BASE_VERTEX \
        [[vk::builtin("BaseVertex")]] int nriBaseVertex : _SV_Nothing

    #define NRI_BASE_INSTANCE \
        [[vk::builtin("BaseInstance")]] uint nriBaseInstance : _SV_Nothing
#endif

#ifdef NRI_DXIL
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        ConstantBuffer<structName> constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

    #if (NRI_SHADER_MODEL >= 68)
        #define NRI_BASE_VERTEX \
            int nriBaseVertex : SV_StartVertexLocation

        #define NRI_BASE_INSTANCE \
            uint nriBaseInstance : SV_StartInstanceLocation
    #else
        // TODO
        #define NRI_BASE_VERTEX uint nriBaseVertex : SV_VertexID
        #define NRI_BASE_INSTANCE uint nriBaseInstance : SV_InstanceID
    #endif
#endif

#ifdef NRI_DXBC
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        cbuffer structName##_##constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex)) { \
            structName constantBufferName; \
        }

    // Unsupported
    #define NRI_BASE_VERTEX uint nriUnsupportedBaseVertex : SV_VertexID
    #define NRI_BASE_INSTANCE uint nriUnsupportedBaseInstance : SV_InstanceID
#endif

#ifdef __cplusplus
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        struct resourceName
#endif

#endif

It's WIP:

vertver commented 2 months ago

Since SV_VertexID / SV_InstanceID always start from BASE in VK, somehow we will have to replace their explicit usage with simple functions performing summation under the hood for D3D12.

I think to merge Vulkan and D3D12 into one codebase, we should add define NRI_ENABLE_BASE_ATTRIBUTES_SUPPORT to declare root constants for D3D12 and do nothing for Vulkan. We should also define NRI_PLATFORM_BASE_VERTEX and NRI_PLATFORM_BASE_INSTANCE, and use code like RootConstants.BaseInstance + NRI_PLATFORM_BASE_INSTANCE for D3D12.

dzhdanNV commented 2 months ago

(Updated NRICompatibility.hlsli in the comment above, still WIP for sure)

vertver commented 2 months ago

Some updates for compatibility layer:

// © 2024 NVIDIA Corporation

#ifndef NRI_COMPATIBILITY_HLSLI
#define NRI_COMPATIBILITY_HLSLI

#ifndef __cplusplus
    #define NRI_MERGE_TOKENS(a, b) a##b
#endif

#define NRI_UV_TO_CLIP(uv) (uv * float2(2, -2) + float2(-1, 1))
#define NRI_CLIP_TO_UV(clip) (clip * float2(0.5, -0.5) + 0.5)

// Container detection
#ifdef __hlsl_dx_compiler
    #ifdef __spirv__
        #define NRI_SPIRV
        #define NRI_PRINTF_AVAILABLE
    #else
        #define NRI_DXIL
    #endif
#else
    #ifndef __cplusplus
        #define NRI_DXBC
    #endif
#endif

// Portability features
#ifdef NRI_DXBC
    #define NRI_SHADER_MODEL 50
#else
    #define NRI_SHADER_MODEL (__SHADER_TARGET_MAJOR * 10 + __SHADER_TARGET_MINOR)
#endif

#ifdef NRI_SPIRV
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        [[vk::push_constant]] structName constantBufferName

    // Requires SPV_KHR_shader_draw_parameters
    #define NRI_ENABLE_BASE_ATTRIBUTES(bindingIndex)
    #define NRI_DECLARE_BASE_VERTEX \
        [[vk::builtin("BaseVertex")]] int nriBaseVertex : _SV_Nothing

    #define NRI_DECLARE_BASE_INSTANCE \
        [[vk::builtin("BaseInstance")]] uint nriBaseInstance : _SV_Nothing

    #define NRI_BASE_VERTEX(input) input.nriBaseVertex;
    #define NRI_BASE_INSTANCE(input) input.nriBaseInstance;
#endif

#ifdef NRI_DXIL
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        ConstantBuffer<structName> constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

    #if (NRI_SHADER_MODEL >= 68)
    #define NRI_ENABLE_BASE_ATTRIBUTES(bindingIndex)
        #define NRI_DECLARE_BASE_VERTEX \
            int nriBaseVertex : SV_StartVertexLocation

        #define NRI_DECLARE_BASE_INSTANCE \
            uint nriBaseInstance : SV_StartInstanceLocation

        #define NRI_BASE_VERTEX(input) input.nriBaseVertex
        #define NRI_BASE_INSTANCE(input) input.nriBaseInstance
    #else
        #define NRI_ENABLE_BASE_ATTRIBUTES(bindingIndex) \
            struct BaseAttributeConstants { \
                uint BaseVertex; \
                uint BaseInstance; \
            }; \
            ConstantBuffer<BaseAttributeConstants> __BaseAttributes : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

        // TODO
        #define NRI_DECLARE_BASE_VERTEX uint nriBaseVertex : SV_VertexID
        #define NRI_DECLARE_BASE_INSTANCE uint nriBaseInstance : SV_InstanceID
        #define NRI_BASE_VERTEX(input) (__BaseAttributes.BaseVertex + input.nriBaseVertex)
        #define NRI_BASE_INSTANCE(input) (__BaseAttributes.BaseInstance + input.nriBaseInstance)
    #endif
#endif

#ifdef NRI_DXBC
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        cbuffer structName##_##constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex)) { \
            structName constantBufferName; \
        }

    // Unsupported
    #define NRI_ENABLE_BASE_ATTRIBUTES(bindingIndex)
    #define NRI_DECLARE_BASE_VERTEX uint nriUnsupportedBaseVertex : SV_VertexID
    #define NRI_DECLARE_BASE_INSTANCE uint nriUnsupportedBaseInstance : SV_InstanceID
    #define NRI_BASE_VERTEX(input) input.nriBaseVertex
    #define NRI_BASE_INSTANCE(input) input.nriBaseInstance
#endif

#ifdef __cplusplus
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        struct resourceName
#endif

#endif
dzhdanNV commented 2 months ago

Thanks! Here is the reworked variant (most important bits):

SPIRV:

    // Draw parameters (requires SPV_KHR_shader_draw_parameters)
    #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

    #define NRI_DECLARE_DRAW_PARAMETERS \
        int NRI_VERTEX_ID_OFFSET : SV_VertexID, \
        uint NRI_INSTANCE_ID_OFFSET : SV_InstanceID, \
        [[vk::builtin("BaseVertex")]] int NRI_BASE_VERTEX : _SV_Nothing, \
        [[vk::builtin("BaseInstance")]] uint NRI_BASE_INSTANCE : _SV_Nothing

    #define NRI_VERTEX_ID (NRI_VERTEX_ID_OFFSET - NRI_BASE_VERTEX)
    #define NRI_INSTANCE_ID (NRI_INSTANCE_ID_OFFSET - NRI_BASE_INSTANCE)

DXIL:

    // Draw parameters
    #if (NRI_SHADER_MODEL >= 68)
        #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

        #define NRI_DECLARE_DRAW_PARAMETERS \
            int NRI_VERTEX_ID : SV_VertexID, \
            uint NRI_INSTANCE_ID : SV_InstanceID, \
            uint NRI_BASE_VERTEX : SV_StartVertexLocation, \
            uint NRI_BASE_INSTANCE : SV_StartInstanceLocation
    #else
        #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex) \
            struct _BaseAttributeConstants { \
                uint BaseVertex; \
                uint BaseInstance; \
            }; \
            ConstantBuffer<_BaseAttributeConstants> _BaseAttributes : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

        #define NRI_BASE_VERTEX _BaseAttributes.BaseVertex
        #define NRI_BASE_INSTANCE _BaseAttributes.BaseInstance

        #define NRI_DECLARE_DRAW_PARAMETERS \
            uint NRI_VERTEX_ID : SV_VertexID, \
            uint NRI_INSTANCE_ID : SV_InstanceID
    #endif

    #define NRI_VERTEX_ID_OFFSET (NRI_BASE_VERTEX + NRI_VERTEX_ID)
    #define NRI_INSTANCE_ID_OFFSET (NRI_BASE_INSTANCE + NRI_INSTANCE_ID)

DXBC:

    // Draw parameters (partially supported)
    #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

    #define NRI_DECLARE_DRAW_PARAMETERS \
        uint NRI_VERTEX_ID : SV_VertexID, \
        uint NRI_INSTANCE_ID : SV_InstanceID

    #define NRI_BASE_VERTEX NRI_BASE_VERTEX_is_unsupported
    #define NRI_BASE_INSTANCE NRI_BASE_INSTANCE_is_unsupported
    #define NRI_VERTEX_ID_OFFSET NRI_VERTEX_ID_OFFSET_is_unsupported
    #define NRI_INSTANCE_ID_OFFSET NRI_INSTANCE_ID_OFFSET_is_unsupported

Usage:

Notes:

_OFFSET - does it sound good? Rename?

dzhdanNV commented 2 months ago

Full file: NRICompatibility.zip

vertver commented 2 months ago

_OFFSET - does it sound good? Rename?

Yes, it works fine for me. I'd also prefer to do some wrapper around the draw commands for indirect drawing, but that's the next step. Right now I'm making a basic implementation of the example using baseInstance.

vertver commented 2 months ago

I'm trying to visualise the instance id through the shader, but it seems very broken. What am I doing wrong?

struct Input
{
    float3 Position : POSITION;
    float2 TexCoord : TEXCOORD0;
    float3 Normal : NORMAL;
    float4 Tangent : TANGENT;
};

struct Attributes
{
    float4 Position : SV_Position;
    float4 Normal : TEXCOORD0; //.w = TexCoord.x
    float4 View : TEXCOORD1; //.w = TexCoord.y
    float4 Tangent : TEXCOORD2;
    float4 BaseAttributes : TEXCOORD3;  // .x = firstInstance, .y = firstVertex, .z = instanceId, .w - vertexId
};

NRI_RESOURCE( cbuffer, Global, b, 0, 0 )
{
    float4x4 gWorldToClip;
    float3 gCameraPos;
};

#ifdef NRI_DXBC
Attributes main( in Input input )
#else
Attributes main( in Input input, NRI_DECLARE_DRAW_PARAMETERS )
#endif
{
    Attributes output;

    float3 N = input.Normal * 2.0 - 1.0;
    float4 T = input.Tangent * 2.0 - 1.0;
    float3 V = gCameraPos - input.Position;

    output.Position = mul( gWorldToClip, float4( input.Position, 1 ) );
    output.Normal = float4( N, input.TexCoord.x );
    output.View = float4( V, input.TexCoord.y );
    output.Tangent = T;
#ifndef NRI_DXBC
    output.BaseAttributes.x = (NRI_INSTANCE_ID_OFFSET);
    output.BaseAttributes.y = (NRI_VERTEX_ID_OFFSET);
    output.BaseAttributes.z = (NRI_INSTANCE_ID);
    output.BaseAttributes.w = (NRI_VERTEX_ID);
#else
    output.BaseAttributes = float4(0,0,0,0);
#endif

    return output;
}
    uint instanceIndex = (uint)(input.BaseAttributes.x) + (uint)(input.BaseAttributes.z);
    uint vertexIndex = (uint)(input.BaseAttributes.y) + (uint)(input.BaseAttributes.w);
    uint materialIndex = Instances[instanceIndex].materialIndex;
    return VisualizeId((uint)(input.BaseAttributes.x));
vertver commented 2 months ago

Wow, it was really strange. The problem was with uint cast.

vertver commented 2 months ago

So, the problem at the moment: I am using the standard command signature for Vulkan, but on D3D12 I can't use it because I also need to use 2 constants. That means that stride of every indirect command will be different on each platform. The same problem occurs in the shader, as we also need to define the draw command stride and fields to fill them correctly. What solution can we propose?

vertver commented 2 months ago

Some updates for compatibility header:

// © 2024 NVIDIA Corporation

#ifndef NRI_COMPATIBILITY_HLSLI
#define NRI_COMPATIBILITY_HLSLI

#ifndef __cplusplus
    #define NRI_MERGE_TOKENS(a, b) a##b
#endif

#define NRI_UV_TO_CLIP(uv) (uv * float2(2, -2) + float2(-1, 1))
#define NRI_CLIP_TO_UV(clip) (clip * float2(0.5, -0.5) + 0.5)

// Container detection
#ifdef __hlsl_dx_compiler
    #ifdef __spirv__
        #define NRI_SPIRV
        #define NRI_PRINTF_AVAILABLE
    #else
        #define NRI_DXIL
    #endif
#else
    #ifndef __cplusplus
        #define NRI_DXBC
    #endif
#endif

// Shader model
#ifdef NRI_DXBC
    #define NRI_SHADER_MODEL 50
#else
    #define NRI_SHADER_MODEL (__SHADER_TARGET_MAJOR * 10 + __SHADER_TARGET_MINOR)
#endif

// SPIRV
#ifdef NRI_SPIRV
    #define NRI_RESOURCE_ARRAY(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName[] : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    // Resources
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        [[vk::push_constant]] structName constantBufferName

    // Draw parameters (requires SPV_KHR_shader_draw_parameters)
    #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

    #define NRI_DECLARE_DRAW_PARAMETERS \
        int NRI_VERTEX_ID_OFFSET : SV_VertexID, \
        uint NRI_INSTANCE_ID_OFFSET : SV_InstanceID, \
        [[vk::builtin("BaseVertex")]] int NRI_BASE_VERTEX : _SV_Nothing, \
        [[vk::builtin("BaseInstance")]] uint NRI_BASE_INSTANCE : _SV_Nothing

    #define NRI_DRAW_INDEXED_COMMAND_SIZE 5 * 4
    #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, indexCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, instanceCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, startIndex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, baseVertex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startInstance)

    #define NRI_VERTEX_ID (NRI_VERTEX_ID_OFFSET - NRI_BASE_VERTEX)
    #define NRI_INSTANCE_ID (NRI_INSTANCE_ID_OFFSET - NRI_BASE_INSTANCE)
#endif

// DXIL
#ifdef NRI_DXIL
    #define NRI_RESOURCE_ARRAY(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName[] : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    // Resources
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        ConstantBuffer<structName> constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

    // Draw parameters
    #if (NRI_SHADER_MODEL >= 68)
        #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

        #define NRI_DECLARE_DRAW_PARAMETERS \
            uint NRI_VERTEX_ID : SV_VertexID, \
            uint NRI_INSTANCE_ID : SV_InstanceID, \
            int NRI_BASE_VERTEX : SV_StartVertexLocation, \
            uint NRI_BASE_INSTANCE : SV_StartInstanceLocation

        #define NRI_DRAW_INDEXED_COMMAND_SIZE 5 * 4
        #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, indexCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, instanceCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, startIndex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, baseVertex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startInstance)
    #else
        #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex) \
            struct _BaseAttributeConstants { \
                uint BaseVertex; \
                uint BaseInstance; \
            }; \
            ConstantBuffer<_BaseAttributeConstants> _BaseAttributes : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

        #define NRI_BASE_VERTEX _BaseAttributes.BaseVertex
        #define NRI_BASE_INSTANCE _BaseAttributes.BaseInstance

        #define NRI_DECLARE_DRAW_PARAMETERS \
            uint NRI_VERTEX_ID : SV_VertexID, \
            uint NRI_INSTANCE_ID : SV_InstanceID

        #define NRI_DRAW_INDEXED_COMMAND_SIZE 7 * 4
        #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, baseVertex);     /* root constant */ \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, startInstance);  /* root constant */ \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, indexCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, instanceCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startIndex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 20, baseVertex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 24, startInstance)
    #endif

    #define NRI_VERTEX_ID_OFFSET (NRI_BASE_VERTEX + NRI_VERTEX_ID)
    #define NRI_INSTANCE_ID_OFFSET (NRI_BASE_INSTANCE + NRI_INSTANCE_ID)
#endif

// DXBC
#ifdef NRI_DXBC
    // Resources
    #define NRI_RESOURCE_ARRAY(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex))

    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        cbuffer structName##_##constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex)) { \
            structName constantBufferName; \
        }

    // Draw parameters (partially supported)
    #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

    #define NRI_DECLARE_DRAW_PARAMETERS \
        uint NRI_VERTEX_ID : SV_VertexID, \
        uint NRI_INSTANCE_ID : SV_InstanceID

    #define NRI_DRAW_INDEXED_COMMAND_SIZE 5 * 4
    #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, indexCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, instanceCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, startIndex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, baseVertex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startInstance)

    #define NRI_BASE_VERTEX NRI_BASE_VERTEX_is_unsupported
    #define NRI_BASE_INSTANCE NRI_BASE_INSTANCE_is_unsupported
    #define NRI_VERTEX_ID_OFFSET NRI_VERTEX_ID_OFFSET_is_unsupported
    #define NRI_INSTANCE_ID_OFFSET NRI_INSTANCE_ID_OFFSET_is_unsupported
#endif

// C/C++
#ifdef __cplusplus
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        struct resourceName
#endif

#endif

and compute shader:


groupshared uint DrawCount;

[numthreads(1024, 1, 1)]
void main(uint ThreadID : SV_DispatchThreadId)
{
    uint InstanceIndex = ThreadID.x;
    if (InstanceIndex >= Constants.DrawCount) {
        return;
    }

    uint DrawIndex = 0;
    uint MeshIndex = Instances[InstanceIndex].meshIndex;
    InterlockedAdd(DrawCount, 1, DrawIndex);

    NRI_FILL_DRAW_INDEXED_COMMAND(Commands, DrawIndex, 
        Meshes[MeshIndex].idxCount,
        1,  // TODO: batch draw instances with same mesh into one draw call
        Meshes[MeshIndex].idxOffset,
        Meshes[MeshIndex].vtxOffset,
        InstanceIndex
    );
}
vertver commented 2 months ago

Just uploaded my indirect bindless sample for NRI.

dzhdanNV commented 2 months ago

Just uploaded my indirect bindless sample

Thanks for the titanic work. Of course, at the end I will accept related NRI changes and the new sample and include it into the official code.

But questions / comments first:

Fixed header:

// © 2024 NVIDIA Corporation

#ifndef NRI_COMPATIBILITY_HLSLI
#define NRI_COMPATIBILITY_HLSLI

/*
USAGE:

Textures, buffers, samplers and acceleration structures:
    NRI_RESOURCE(Texture2D<float4>, gInput, t, 0, 2);
    NRI_RESOURCE(RWTexture2D<float>, gOutputs, u, 0, 0);
    NRI_RESOURCE(StructuredBuffer<InstanceData>, gInstanceData, t, 2, 2);
    NRI_RESOURCE(RaytracingAccelerationStructure, gTlas, t, 1, 2);
    NRI_RESOURCE(SamplerState, gLinearMipmapLinearSampler, s, 0, 0);

Texture and buffer arrays:
    NRI_RESOURCE(Texture2D<float3>, gInputs[], t, 0, 1); // DXIL/SPIRV only
    NRI_RESOURCE(Texture2D<float>, gInputs[8], t, 0, 0); // DXBC compatible

Constants:
    NRI_RESOURCE(cbuffer, Constants, b, 0, 3) {
        uint32_t gConst1;
        uint32_t gConst2;
        uint32_t gConst3;
        uint32_t gConst4;
    };

Push constants:
    struct PushConstants {
        float const1;
        uint32_t const2;
    };

    NRI_PUSH_CONSTANTS(PushConstants, gPushConstants, 7); // a constant buffer in DXBC

Draw parameters:
    - Add to global scope:
        NRI_ENABLE_DRAW_PARAMETERS;
    - Add to the entry point or a function input parameters list:
        void main(..., NRI_DECLARE_DRAW_PARAMETERS) {
            ...
        }
    - Use the following macros in the code:
        NRI_VERTEX_ID, NRI_INSTANCE_ID - start from 0
        NRI_BASE_VERTEX, NRI_BASE_INSTANCE - base vertex / instance from a "Draw" call
        NRI_VERTEX_ID_OFFSET, NRI_INSTANCE_ID_OFFSET - start from "base" vertex / instance

IndirectDraw parameters:
    TODO: add description
*/

#ifndef __cplusplus
    #define NRI_MERGE_TOKENS(a, b) a##b
#endif

// Container detection
#ifdef __hlsl_dx_compiler
    #ifdef __spirv__
        #define NRI_SPIRV
        #define NRI_PRINTF_AVAILABLE
    #else
        #define NRI_DXIL
    #endif
#else
    #ifndef __cplusplus
        #define NRI_DXBC
    #endif
#endif

// Shader model
#ifdef NRI_DXBC
    #define NRI_SHADER_MODEL 50
#else
    #define NRI_SHADER_MODEL (__SHADER_TARGET_MAJOR * 10 + __SHADER_TARGET_MINOR)
#endif

// SPIRV
#ifdef NRI_SPIRV
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        [[vk::push_constant]] structName constantBufferName

    // Draw parameters (requires SPV_KHR_shader_draw_parameters)
    #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

    #define NRI_DECLARE_DRAW_PARAMETERS \
        int NRI_VERTEX_ID_OFFSET : SV_VertexID, \
        uint NRI_INSTANCE_ID_OFFSET : SV_InstanceID, \
        [[vk::builtin("BaseVertex")]] int NRI_BASE_VERTEX : _SV_Nothing, \
        [[vk::builtin("BaseInstance")]] uint NRI_BASE_INSTANCE : _SV_Nothing

    #define NRI_VERTEX_ID (NRI_VERTEX_ID_OFFSET - NRI_BASE_VERTEX)
    #define NRI_INSTANCE_ID (NRI_INSTANCE_ID_OFFSET - NRI_BASE_INSTANCE)

    // IndirectDraw parameters
    #define NRI_DRAW_INDEXED_COMMAND_SIZE (5 * 4)
    #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, indexCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, instanceCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, startIndex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, baseVertex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startInstance)
#endif

// DXIL
#ifdef NRI_DXIL
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex), NRI_MERGE_TOKENS(space, setIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        ConstantBuffer<structName> constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

    // Draw parameters
    #if (NRI_SHADER_MODEL >= 68)
        #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

        #define NRI_DECLARE_DRAW_PARAMETERS \
            uint NRI_VERTEX_ID : SV_VertexID, \
            uint NRI_INSTANCE_ID : SV_InstanceID, \
            int NRI_BASE_VERTEX : SV_StartVertexLocation, \
            uint NRI_BASE_INSTANCE : SV_StartInstanceLocation
    #else
        #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex) \
            struct _BaseAttributeConstants { \
                uint BaseVertex; \
                uint BaseInstance; \
            }; \
            ConstantBuffer<_BaseAttributeConstants> _BaseAttributes : register(NRI_MERGE_TOKENS(b, bindingIndex), space0)

        #define NRI_BASE_VERTEX _BaseAttributes.BaseVertex
        #define NRI_BASE_INSTANCE _BaseAttributes.BaseInstance

        #define NRI_DECLARE_DRAW_PARAMETERS \
            uint NRI_VERTEX_ID : SV_VertexID, \
            uint NRI_INSTANCE_ID : SV_InstanceID
    #endif

    #define NRI_VERTEX_ID_OFFSET (NRI_BASE_VERTEX + NRI_VERTEX_ID)
    #define NRI_INSTANCE_ID_OFFSET (NRI_BASE_INSTANCE + NRI_INSTANCE_ID)

    // IndirectDraw parameters
    #if (NRI_SHADER_MODEL >= 68)
        #define NRI_DRAW_INDEXED_COMMAND_SIZE (5 * 4)
        #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, indexCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, instanceCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, startIndex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, baseVertex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startInstance)
    #else
        #define NRI_DRAW_INDEXED_COMMAND_SIZE (7 * 4)
        #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, baseVertex);     /* root constant */ \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, startInstance);  /* root constant */ \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, indexCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, instanceCount); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startIndex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 20, baseVertex); \
            buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 24, startInstance)
    #endif
#endif

// DXBC
#ifdef NRI_DXBC
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        resourceType resourceName : register(NRI_MERGE_TOKENS(regName, bindingIndex))

    #define NRI_PUSH_CONSTANTS(structName, constantBufferName, bindingIndex) \
        cbuffer structName##_##constantBufferName : register(NRI_MERGE_TOKENS(b, bindingIndex)) { \
            structName constantBufferName; \
        }

    // Draw parameters (partially supported)
    #define NRI_ENABLE_DRAW_PARAMETERS(bindingIndex)

    #define NRI_DECLARE_DRAW_PARAMETERS \
        uint NRI_VERTEX_ID : SV_VertexID, \
        uint NRI_INSTANCE_ID : SV_InstanceID

    #define NRI_BASE_VERTEX NRI_BASE_VERTEX_is_unsupported
    #define NRI_BASE_INSTANCE NRI_BASE_INSTANCE_is_unsupported
    #define NRI_VERTEX_ID_OFFSET NRI_VERTEX_ID_OFFSET_is_unsupported
    #define NRI_INSTANCE_ID_OFFSET NRI_INSTANCE_ID_OFFSET_is_unsupported

    // IndirectDraw parameters
    #define NRI_DRAW_INDEXED_COMMAND_SIZE (5 * 4)
    #define NRI_FILL_DRAW_INDEXED_COMMAND(buffer, cmdIndex, indexCount, instanceCount, startIndex, baseVertex, startInstance) \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 0, indexCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 4, instanceCount); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 8, startIndex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 12, baseVertex); \
        buffer.Store(cmdIndex * NRI_DRAW_INDEXED_COMMAND_SIZE + 16, startInstance)
#endif

// C/C++
#ifdef __cplusplus
    #define NRI_RESOURCE(resourceType, resourceName, regName, bindingIndex, setIndex) \
        struct resourceName
#endif

// Misc
#define NRI_UV_TO_CLIP(uv) (uv * float2(2, -2) + float2(-1, 1))
#define NRI_CLIP_TO_UV(clip) (clip * float2(0.5, -0.5) + 0.5)

#endif
vertver commented 2 months ago

NRI_RESOURCE_ARRAY is not needed. You can use NRI_RESOURCE with resource, resource[] and even resource[8]

I'll fix it.

it's unclear how to use IndirectDraw parameters. Please, resolve TODO: add description in the updated header below

There is an example for that

for NRI_FILL_DRAW_INDEXED_COMMAND for DXIL I would suggest moving missing "base" parameters to the end of the list to be consistent with the other GAPIs and their shader containers (please, fix)

Ok, I'll fix it. It seems to be possible to offset root constants in D3D12

why do we write baseVertex and baseInstance twice for DXIL?

Since D3D12 does not support offsets from baseVertex and baseInstance in shaders, we need to provide 2 parameters per draw call via root constants. D3D12 allows you to add root parameters directly to the indirect draw argument, which is what I'll do.

I understand that you are mostly interested in DrawIndirect, but since we have started... I would like to support draw parameters for Draw and DrawIndirect. Wanna look at complete code NRISamples + NRI. Please-please-please ^_^

Draw will also be supported, since it uses the same root constants for obtaining baseVertex and baseInstance. The only question I have is how can we provide a custom stride for a command to support this behaviour? Especially here and here (right now it uses default indirect draw argument)

dzhdanNV commented 2 months ago

Thanks:

Since D3D12 does not support offsets from baseVertex and baseInstance in shaders, we need to provide 2 parameters per draw call via root constants. D3D12 allows you to add root parameters directly to the indirect draw argument, which is what I'll do.

Got it. One pair is for a draw command, another pair is hooked up by unreleased NRI code, which maps them to _BaseAttributeConstants making the rest of the code work.

The only question I have is how can we provide a custom stride for a command to support this behavior?

I think we can extend command signatures in NRIDescs and add (WIP, see TODOs):

// See NRI_FILL_DRAW_COMMAND // TODO: add
NRI_STRUCT(DrawBaseDesc)
{
    uint32_t vertexNum;
    uint32_t instanceNum;
    uint32_t baseVertex; // vertex buffer offset = CmdSetVertexBuffers.offset + baseVertex * VertexStreamDesc::stride
    uint32_t baseInstance;
    uint32_t shaderEmulatedBaseVertex; // TODO: can we map root constants to "baseVertex"?
    uint32_t shaderEmulatedBaseInstance; // TODO: can we map root constants to "baseInstance"?
};

// See NRI_FILL_DRAW_INDEXED_COMMAND
NRI_STRUCT(DrawIndexedBaseDesc)
{
    uint32_t indexNum;
    uint32_t instanceNum;
    uint32_t baseIndex; // index buffer offset = CmdSetIndexBuffer.offset + baseIndex * sizeof(CmdSetIndexBuffer.indexType)
    int32_t baseVertex; // index += baseVertex
    uint32_t baseInstance;
    uint32_t shaderEmulatedBaseVertex; // TODO: can we map root constants to "baseVertex"?
    uint32_t shaderEmulatedBaseInstance; // TODO: can we map root constants to "baseInstance"?
};

Also expose DeviceDesc::isDrawParametersBaseEnabled (should we separate into indirect and non-indirect?), which is:

Additionally, we can introduce an inline function to handle selection more gracefully. But in any case user will have to adjust the code in several places and manually select one struct or another depending on this boolean.

(right now it uses default indirect draw argument)

It means that D3D12 doesn't work currently, because 7 dwords needed instead of 5. Right?

dzhdanNV commented 2 months ago

(edited comment above)

vertver commented 2 months ago

I'm trying to implement root constants in a command signature and have run into the problem that the draw command needs to be at the end of all indirect arguments. This means that I need different draw desc declarations for each platform. How can we manage this in NRI API? Maybe create a separate method for stride and arguments identification?

It means that D3D12 doesn't work currently, because 7 dwords needed instead of 5. Right?

Yeah.

dzhdanNV commented 2 months ago

I'm trying to implement root constants in a command signature and have run into the problem that the draw command needs to be at the end of all indirect arguments.

Understood.

This means that I need different draw desc declarations for each platform. How can we manage this in NRI API? Maybe create a separate method for stride and arguments identification?

I tried to answer this question. Here https://github.com/NVIDIAGameWorks/NRI/issues/61#issuecomment-2078673966

vertver commented 2 months ago

I tried to answer this question. Here https://github.com/NVIDIAGameWorks/NRI/issues/61#issuecomment-2078673966

Vulkan requires a fixed initial layout for each DrawIndexed and Draw indirect command, but D3D12 doesn't. At the same time, Vulkan can't shift the initial offset of a command, which is a requirement for D3D12, so instead I need separate structures for each platform. So, to recap:

NRI_STRUCT(DrawDesc)
{
    uint32_t indirectBaseVertex;       // This is a requirement for D3D12, but this will not work on Vulkan.
    uint32_t indirectBaseInstance;     // This is a requirement for D3D12, but this will not work on Vulkan.
    uint32_t vertexNum;
    uint32_t instanceNum;
    uint32_t baseVertex; 
    uint32_t baseInstance;
};

So, instead, we need to create 2 separate structs:

NRI_STRUCT(DrawDescD3D12)
{
    uint32_t indirectBaseVertex;
    uint32_t indirectBaseInstance;
    uint32_t vertexNum;
    uint32_t instanceNum;
    uint32_t baseVertex; 
    uint32_t baseInstance;
};

NRI_STRUCT(DrawDescVulkan)
{
    uint32_t vertexNum;
    uint32_t instanceNum;
    uint32_t baseVertex; 
    uint32_t baseInstance;
};
vertver commented 2 months ago

Hm, I just thought that I can fix this problem via offseting initial buffer by one dummy argument.

Vulkan:
    # padding
    # padding
    0 vertexNum
    0 instanceNum
    0 baseVertex
    0 baseInstance
    # padding
    # padding
    1 vertexNum
    1 instanceNum
    1 baseVertex
    1 baseInstance
    # padding
    # padding
D3D12:
    C padding
    C padding
    0 vertexNum
    0 instanceNum
    0 baseVertex
    0 baseInstance
    C padding
    C padding
    1 vertexNum
    1 instanceNum
    1 baseVertex
    1 baseInstance
    # padding
    # padding

But this will require extra management for such cases from user side.

dzhdanNV commented 2 months ago

I'm OK with this (massaged). We will explicitly state that this struct is needed only for D3D12 if:

NRI_STRUCT(DrawDesc) { uint32_t vertexNum; uint32_t instanceNum; uint32_t baseVertex; uint32_t baseInstance; };

vertver commented 2 months ago

I'm facing with a problem. On Vulkan is successfully runs and works, however, on D3D12 this code doesn't work.

        // Culling
        NRI_ABORT_ON_FAILURE(NRI.AllocateDescriptorSets(*m_DescriptorPool, *m_ComputePipelineLayout, 0,
            &m_DescriptorSets[BUFFERED_FRAME_MAX_NUM + 1], 1, 0));
        nri::DescriptorRangeUpdateDesc rangeUpdateDescs[2] = {};
        rangeUpdateDescs[0].descriptorNum = 1;
        rangeUpdateDescs[0].descriptors = &m_IndirectBufferStorageAttachement;
        rangeUpdateDescs[1].descriptorNum = BUFFER_COUNT;
        rangeUpdateDescs[1].descriptors = resourceViews;
        NRI.UpdateDescriptorRanges(*m_DescriptorSets[BUFFERED_FRAME_MAX_NUM + 1], 0, 2, rangeUpdateDescs);

m_IndirectBufferStorageAttachement is UAV, so maybe there is a problem with UAV. What do you think?

D3D12 ERROR: ID3D12Device::CopyDescriptorsSimple: Specified CPU descriptor handle ptr=0x0000029F4905C100 does not refer to a location in a descriptor heap. DestDescriptorRangeStart is the issue. [ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]

NRI Branch Commit

dzhdanNV commented 2 months ago

I will try to clone your NRI samples on Monday. Hope all modified dependencies will be correctly pulled.

dzhdanNV commented 2 months ago

IMPORTANT: What if we overcomplicate? NRI is a common denominator for D3D11, D3D12 and VK. And in this particular case instead of trying to reach VK functionality, we can demote VK functionality a bit to stay on par with D3D. DrawParameters are always supported in VK (except probably some Android devices), so we can stick only with 0-based IDs (a subtraction is needed for VK) and postulate:

=> this is much easier to support => D3D11 works => yes, minor inconvenience and overhead for VK, but it's really a minor and VK-only thing

vertver commented 2 months ago

I'm trying to reach Vulkan behaviour because it allows me to use 'baseInstance' as global instance id as is. You can check my sample and see that it works even in indirect calls, and it was a reason why I'm asking for fixing a problem.

vertver commented 2 months ago

There is definitely something wrong with descriptors and descriptor sets... I just checked my code with other samples on D3D12 and it works perfectly. However, in my sample it works really strange. Now it crashes on SetDescriptorSet function.

D3D12 ERROR: CGraphicsCommandList::SetGraphicsRootDescriptorTable: Specified GPU Descriptor Handle (ptr = 0x45678a00110100 at 36 offsetInDescriptorsFromDescriptorHeapStart), for Root Signature (0x000001B4E314D350:'Unnamed ID3D12RootSignature Object')'s Descriptor Table (at Parameter Index [4])'s Descriptor Range (at Range Index [0] of type D3D12_DESCRIPTOR_RANGE_TYPE_SRV) has not been initialized. All descriptors of descriptor ranges declared STATIC (not-DESCRIPTORS_VOLATILE) in a root signature must be initialized prior to being set on the command list. [ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE] 

Also, for some reason, the previous message is fixed by simply increasing the size of the descriptor pool. Maybe there is something wrong with variable descriptor sets on D3D12?

#define TEST 100

    { // Descriptor pool
        nri::DescriptorPoolDesc descriptorPoolDesc = {};
        descriptorPoolDesc.descriptorSetMaxNum = materialNum + BUFFERED_FRAME_MAX_NUM + 2;
        descriptorPoolDesc.textureMaxNum = materialNum * TEXTURES_PER_MATERIAL;
        descriptorPoolDesc.samplerMaxNum = BUFFERED_FRAME_MAX_NUM;
        descriptorPoolDesc.storageStructuredBufferMaxNum = 1 * 2 * TEST;
        descriptorPoolDesc.storageBufferMaxNum = 1 * 2 * TEST;
        descriptorPoolDesc.bufferMaxNum = 3 * 2 * TEST;
        descriptorPoolDesc.structuredBufferMaxNum = 4 * 2 * TEST;
        descriptorPoolDesc.constantBufferMaxNum = BUFFERED_FRAME_MAX_NUM;

        NRI_ABORT_ON_FAILURE( NRI.CreateDescriptorPool(*m_Device, descriptorPoolDesc, m_DescriptorPool) );
    }
dzhdanNV commented 2 months ago

Also, for some reason, the previous message is fixed by simply increasing the size of the descriptor pool

Well, everything is possible. But usually it's an indicator of wrong descriptor management on the app side.

I have just cloned vertver/NRISamples and briefly looked at BindlessSceneViewer with --debugAPI and --debugNRI:

dzhdanNV commented 2 months ago

Looks like ForwardIndirect has been replaced to ForwardBindless... I would prefer to get official fixes from your side to avoid potential divergence. I wanna see what you see...

vertver commented 2 months ago

Fixed, Vulkan is still working. image

01Pollux commented 2 months ago

I have compiled and executed your code @vertver, but there are some points to make:

1) when using vulkan, it crashes on some hardwares due to them not supporting high number of bindeable resources the error message: NRI::ERROR(DeviceVK.cpp:1006) - VK::Intel(R) UHD Graphics - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-06939 ] | MessageID = 0x609f841b | vkCreatePipelineLayout(): max per-stage sampled image bindings count (1024) exceeds device maxPerStageDescriptorSampledImages limit (200). The Vulkan spec states: The total number of descriptors in descriptor set layouts created without the VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set with a descriptorType of VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, VK_DESCRIPTOR_TYPE_SAMPLE_WEIGHT_IMAGE_QCOM, VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM, and VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER, accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceLimits::maxPerStageDescriptorSampledImages (https://vulkan.lunarg.com/doc/view/1.3.275.0/windows/1.3-extensions/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-06939) A breakpoint instruction (__debugbreak() statement or a similar call) was executed in BindlessSceneViewer.exe. the line in question:

nri::DescriptorRangeDesc textureDescriptorRange[1] = {};
textureDescriptorRange[0] = { 0, 1024, nri::DescriptorType::TEXTURE, nri::StageBits::FRAGMENT_SHADER, true, true };

Changing it to lower value will fix it, aswell as modifying how NRI creates descriptor pool and set by adding VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT to the pool and VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT to the set.

2) in D3D12, without doing running the sample is validation/debug layer, the output is this, which is way different than the one in vulkan d3d12: image image

vulkan: image image

3) in D3D12, i get the uninitialized descriptor table error if i dont enable the partiallyBound on descriptor set. the error message: D3D12 ERROR: CGraphicsCommandList::SetGraphicsRootDescriptorTable: Specified GPU Descriptor Handle (ptr = 0x2697d9730e0 at 36 offsetInDescriptorsFromDescriptorHeapStart), for Root Signature (0x000002697CB1C350:'Unnamed ID3D12RootSignature Object')'s Descriptor Table (at Parameter Index [4])'s Descriptor Range (at Range Index [0] of type D3D12_DESCRIPTOR_RANGE_TYPE_SRV) has not been initialized. All descriptors of descriptor ranges declared STATIC (not-DESCRIPTORS_VOLATILE) in a root signature must be initialized prior to being set on the command list. [ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE] the line in question:

nri::DescriptorSetDesc descriptorSetDescs[] =
{
    {0, globalDescriptorRange, helper::GetCountOf(globalDescriptorRange)},
    {1, textureDescriptorRange, helper::GetCountOf(textureDescriptorRange)},
};

descriptorSetDescs[1].partiallyBound = true; // must be volatile

4) There is also this issue with sample position in d3d12: D3D12 ERROR: ID3D12CommandList::SetSamplePositions: NumSamplesPerPixel = 1, but can only be 2, 4 or 8 on hardware tier D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_1. [ STATE_SETTING ERROR #1049: SETSAMPLEPOSITIONS_INVALIDARGS]

PS: the images provided above are results after disabled programmable sample position, enabled partially bound descriptor set for textures, added the flags in vk's side of nri.

dzhdanNV commented 2 months ago

I fixed the validation error. Here is the diff (PARTIALLY_BOUND was missing):

diff --git a/Source/BindlessSceneViewer.cpp b/Source/BindlessSceneViewer.cpp
index da5a016..31c60ae 100644
--- a/Source/BindlessSceneViewer.cpp
+++ b/Source/BindlessSceneViewer.cpp
@@ -246,12 +246,12 @@ bool Sample::Initialize(nri::GraphicsAPI graphicsAPI)

             // Bindless descriptors
             nri::DescriptorRangeDesc textureDescriptorRange[1] = {};
-            textureDescriptorRange[0] = { 0, 1024, nri::DescriptorType::TEXTURE, nri::StageBits::FRAGMENT_SHADER, true, true };
+            textureDescriptorRange[0] = { 0, 1024, nri::DescriptorType::TEXTURE, nri::StageBits::FRAGMENT_SHADER, nri::VARIABLE_DESCRIPTOR_NUM, nri::DESCRIPTOR_ARRAY };

             nri::DescriptorSetDesc descriptorSetDescs[] =
             {
                 {0, globalDescriptorRange, helper::GetCountOf(globalDescriptorRange)},
-                {1, textureDescriptorRange, helper::GetCountOf(textureDescriptorRange)},
+                {1, textureDescriptorRange, helper::GetCountOf(textureDescriptorRange), nullptr, 0, nri::PARTIALLY_BOUND},
             };

             nri::PipelineLayoutDesc pipelineLayoutDesc = {};

In D3D12 mode the sample works... but renders only sky if bindless is enabled. So, now the ball is on your side. I have spotted some placed which I want to modify, but it's a useless work until we get NRI and the sample in the working state.

vertver commented 2 months ago

when using vulkan, it crashes on some hardwares due to them not supporting high number of bindeable resources the error message:

Yeah, that makes sense. The Vulkan drivers on Intel Graphics are not capable of handling [more than 1800 images] (https://vulkan.gpuinfo.org/displayreport.php?id=30049#properties), so that may be the reason why it doesn't work on Vulkan.

in D3D12, i get the uninitialized descriptor table error if i dont enable the partiallyBound on descriptor set. the error message:

Thanks for reporting this problem, it helped me a lot. We need to add more validation for descriptor sets so that we don't run into this kind of problem again.

in D3D12, without doing running the sample is validation/debug layer, the output is this, which is way different than the one in vulkan

Yes, I am aware of this problem. I'll fix the indirect drawing first, and then fix the regular drawing.

D3D12 ERROR: ID3D12CommandList::SetSamplePositions: NumSamplesPerPixel = 1, but can only be 2, 4 or 8 on hardware tier D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_1. [ STATE_SETTING ERROR #1049: SETSAMPLEPOSITIONS_INVALIDARGS]

It sounds like your hardware doesn't support programmable sampler positions. I see no reason (other than for VRS or conservative rendering) to use 1x samples for programmable sampler position, so I think we can easily remove that code.

vertver commented 2 months ago

In D3D12 mode the sample works... but renders only sky if bindless is enabled. So, now the ball is on your side. I have spotted some placed which I want to modify, but it's a useless work until we get NRI and the sample in the working state.

Thank you, I'll get on with my work now.

dzhdanNV commented 2 months ago

D3D12 ERROR: ID3D12CommandList::SetSamplePositions: NumSamplesPerPixel = 1, but can only be 2, 4 or 8 on hardware tier D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_1. [ STATE_SETTING ERROR #1049: SETSAMPLEPOSITIONS_INVALIDARGS]

It sounds like your hardware doesn't support programmable sampler positions. I see no reason (other than for VRS or conservative rendering) to use 1x samples for programmable sampler position, so I think we can easily remove that code.

Sorry, guys! This one is on me. Several months ago when I worked on fixing completely broken code around PSL, I decided to use SceneViewer for this purpose. I will fix behavior on HW with limited PSL support (looks like NRI massaging is needed).

dzhdanNV commented 2 months ago

@01Pollux Thanks for jumping in (surprisingly we worked on the same stuff almost at the same time :D)