Devsh-Graphics-Programming / Nabla

Vulkan, OptiX and CUDA Interoperation Modular Rendering Library and Framework for PC/Linux/Android
http://devsh.eu
Apache License 2.0

Expose Raytracing Pipeline #191

Open Erfan-Ahmadi opened 3 years ago

Erfan-Ahmadi commented 3 years ago

This documents the Vulkan objects that are part of VK_KHR_ray_tracing_pipeline (and of the VK_KHR_acceleration_structure extension it depends on) that we may want to expose and work with in Nabla.

Extension and Properties

VkPhysicalDeviceRayTracingPipelinePropertiesKHR

If this structure is included in the pNext chain of the VkPhysicalDeviceProperties2 structure passed to vkGetPhysicalDeviceProperties2, it is filled in with each corresponding implementation-dependent property.

VkPhysicalDeviceRayTracingPipelineFeaturesKHR:

If this structure is included in the pNext chain of the VkPhysicalDeviceFeatures2 structure passed to vkGetPhysicalDeviceFeatures2, it is filled in to indicate whether each corresponding feature is supported.

These properties are needed for:

  1. Memory management of the opaque shader group handles (e.g. shaderGroupHandleSize, shaderGroupBaseAlignment, maxShaderGroupStride, shaderGroupHandleAlignment)
  2. Validating values against their device limits (e.g. maxRayHitAttributeSize, maxRayRecursionDepth, maxRayDispatchInvocationCount)
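This is a minimal sketch of the validation in item 2. It assumes a hypothetical RtLimits struct that mirrors three members of VkPhysicalDeviceRayTracingPipelinePropertiesKHR, filled from the real query. It checks three things against the device limits:

  1. A requested maxPipelineRayRecursionDepth.
  2. A hit attribute size.
  3. The width * height * depth of a trace.

The helper names are illustrative, not Nabla API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of three members of VkPhysicalDeviceRayTracingPipelinePropertiesKHR. */
typedef struct RtLimits {
    uint32_t maxRayRecursionDepth;
    uint32_t maxRayDispatchInvocationCount;
    uint32_t maxRayHitAttributeSize;
} RtLimits;

/* maxPipelineRayRecursionDepth given at pipeline creation must not exceed the device limit. */
static bool rt_recursion_depth_ok(const RtLimits* limits, uint32_t requestedDepth)
{
    return requestedDepth <= limits->maxRayRecursionDepth;
}

/* The pipeline's maximum hit attribute size must not exceed maxRayHitAttributeSize. */
static bool rt_hit_attribute_size_ok(const RtLimits* limits, uint32_t attributeBytes)
{
    return attributeBytes <= limits->maxRayHitAttributeSize;
}

/* width * height * depth of one trace must not exceed maxRayDispatchInvocationCount. */
static bool rt_dispatch_size_ok(const RtLimits* limits, uint32_t width, uint32_t height, uint32_t depth)
{
    const uint64_t wh = (uint64_t)width * (uint64_t)height; /* cannot overflow 64 bits */
    if (wh == 0u || depth == 0u)
        return true; /* zero invocations always fit */
    /* wh * depth <= limit  <=>  wh <= limit / depth (integer division), so no overflow */
    return wh <= (uint64_t)limits->maxRayDispatchInvocationCount / depth;
}
```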

We must also expose some of these physical device values to the user, in case the user wants to do everything manually at a low level (e.g. creating the ShaderBindingTable buffer).

VkPhysicalDeviceAccelerationStructureFeaturesKHR:

It has accelerationStructureHostCommands, which indicates whether the implementation supports the host-side acceleration structure commands (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR and vkWriteAccelerationStructuresPropertiesKHR).

Inserting Cmd after the vk prefix gives the device (command buffer) version of each function above, e.g. vkCmdBuildAccelerationStructuresKHR.

Acceleration Structures (Creation, Build, Compaction, Copy, Related Enums and Structs...)

Geometry:

VkAccelerationStructureGeometryKHR:

typedef struct VkAccelerationStructureGeometryKHR {
    VkStructureType                           sType;
    const void*                               pNext;
    VkGeometryTypeKHR                         geometryType;
    VkAccelerationStructureGeometryDataKHR    geometry;
    VkGeometryFlagsKHR                        flags;
} VkAccelerationStructureGeometryKHR;

The geometry member is a union of three other structs:

  1. Triangles: a geometry type consisting of triangles (used when building a BLAS from vertex buffers)
  2. AABBs: a geometry type consisting of axis-aligned bounding boxes (used for custom primitives that need a custom intersection shader)
  3. Instances: a geometry type consisting of acceleration structure instances (used when building a TLAS from BLAS instances)

1. VkAccelerationStructureGeometryTrianglesDataKHR:

    VkStructureType                  sType;
    const void*                      pNext;
    VkFormat                         vertexFormat;
    VkDeviceOrHostAddressConstKHR    vertexData;
    VkDeviceSize                     vertexStride;
    uint32_t                         maxVertex;
    VkIndexType                      indexType;
    VkDeviceOrHostAddressConstKHR    indexData;
    VkDeviceOrHostAddressConstKHR    transformData;

2. VkAccelerationStructureGeometryAabbsDataKHR:

    VkStructureType                  sType;
    const void*                      pNext;
    VkDeviceOrHostAddressConstKHR    data;
    VkDeviceSize                     stride;

VkAabbPositionsKHR:

    float    minX;
    float    minY;
    float    minZ;
    float    maxX;
    float    maxY;
    float    maxZ;
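For custom primitives, the AABB array is typically generated from the primitives themselves. This is a small sketch that assumes a hypothetical Sphere primitive, intersected by a custom intersection shader.

AabbPositions mirrors VkAabbPositionsKHR (six floats, 24 bytes). A tightly packed array can therefore use stride = sizeof(AabbPositions), which satisfies the spec's requirement that the AABB stride be a multiple of 8.

```c
#include <stddef.h>

/* Same member layout as VkAabbPositionsKHR (six floats, 24 bytes). */
typedef struct AabbPositions {
    float minX, minY, minZ;
    float maxX, maxY, maxZ;
} AabbPositions;

/* Hypothetical custom primitive that an intersection shader would test. */
typedef struct Sphere {
    float center[3];
    float radius;
} Sphere;

/* Tight axis-aligned box around one sphere. */
static AabbPositions sphere_aabb(const Sphere* s)
{
    AabbPositions box;
    box.minX = s->center[0] - s->radius;
    box.minY = s->center[1] - s->radius;
    box.minZ = s->center[2] - s->radius;
    box.maxX = s->center[0] + s->radius;
    box.maxY = s->center[1] + s->radius;
    box.maxZ = s->center[2] + s->radius;
    return box;
}

/* Fills a tightly packed array; the matching AABB data stride is sizeof(AabbPositions). */
static void spheres_to_aabbs(const Sphere* spheres, size_t count, AabbPositions* out)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = sphere_aabb(&spheres[i]);
}
```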

3. VkAccelerationStructureGeometryInstancesDataKHR:

    VkStructureType                  sType;
    const void*                      pNext;
    VkBool32                         arrayOfPointers;
    VkDeviceOrHostAddressConstKHR    data;

VkAccelerationStructureInstanceKHR:

typedef struct VkAccelerationStructureInstanceKHR {
    VkTransformMatrixKHR          transform;
    uint32_t                      instanceCustomIndex:24;
    uint32_t                      mask:8;
    uint32_t                      instanceShaderBindingTableRecordOffset:24;
    VkGeometryInstanceFlagsKHR    flags:8;
    uint64_t                      accelerationStructureReference;
} VkAccelerationStructureInstanceKHR;
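The bitfields are easier to get right with explicit packing. The Vulkan spec describes the intended bit pattern:

  1. instanceCustomIndex occupies the low 24 bits, and mask the high 8 bits, of one 32-bit word.
  2. instanceShaderBindingTableRecordOffset and flags share the next 32-bit word in the same way.

This sketch uses a hypothetical PackedInstance with that layout (64 bytes total) and builds an identity-transform instance. The blasReference parameter stands in for accelerationStructureReference.

```c
#include <stdint.h>
#include <string.h>

/* Explicit-packing mirror of VkAccelerationStructureInstanceKHR (64 bytes). */
typedef struct PackedInstance {
    float    transform[3][4];     /* row-major 3x4, like VkTransformMatrixKHR */
    uint32_t customIndexAndMask;  /* bits 0..23 instanceCustomIndex, bits 24..31 mask */
    uint32_t sbtOffsetAndFlags;   /* bits 0..23 SBT record offset, bits 24..31 flags */
    uint64_t accelerationStructureReference;
} PackedInstance;

/* Packs a 24-bit value into the low bits and an 8-bit value into the high bits. */
static uint32_t pack_24_8(uint32_t low24, uint32_t high8)
{
    return (low24 & 0x00FFFFFFu) | ((high8 & 0xFFu) << 24);
}

/* Hypothetical helper: one instance with an identity transform. */
static PackedInstance make_identity_instance(uint32_t customIndex, uint32_t mask,
                                             uint32_t sbtOffset, uint32_t flags,
                                             uint64_t blasReference)
{
    PackedInstance inst;
    memset(&inst, 0, sizeof inst);
    inst.transform[0][0] = 1.0f;
    inst.transform[1][1] = 1.0f;
    inst.transform[2][2] = 1.0f;
    inst.customIndexAndMask = pack_24_8(customIndex, mask);
    inst.sbtOffsetAndFlags = pack_24_8(sbtOffset, flags);
    inst.accelerationStructureReference = blasReference;
    return inst;
}
```

A TLAS containing a single such instance is also the simplest way to make one BLAS bindable as a descriptor.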

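accelerationStructureReference is either a device address of the referenced bottom-level acceleration structure (used by device-side builds), or a VkAccelerationStructureKHR handle (used by host-side builds).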

VkGeometryTypeKHR:

    VK_GEOMETRY_TYPE_TRIANGLES_KHR = 0,
    VK_GEOMETRY_TYPE_AABBS_KHR = 1,
    VK_GEOMETRY_TYPE_INSTANCES_KHR = 2,

VkGeometryFlagBitsKHR:

    VK_GEOMETRY_OPAQUE_BIT_KHR = 0x00000001,
    VK_GEOMETRY_NO_DUPLICATE_ANY_HIT_INVOCATION_BIT_KHR = 0x00000002,

Acceleration structures are built from:

  1. One or more geometries (VkAccelerationStructureGeometryKHR) filled into a VkAccelerationStructureBuildGeometryInfoKHR (described below)
  2. A build range (VkAccelerationStructureBuildRangeInfoKHR) for each geometry

VkAccelerationStructureBuildRangeInfoKHR:

    uint32_t    primitiveCount;
    uint32_t    primitiveOffset;
    uint32_t    firstVertex;
    uint32_t    transformOffset;

For triangle geometry, primitiveCount is the number of triangles. For AABB geometry it is the number of AABBs, and for instance geometry it is the number of instances.
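This sketch shows how the range fields map onto a triangle mesh. It assumes a hypothetical BuildRangeInfo that mirrors VkAccelerationStructureBuildRangeInfoKHR.

  1. For indexed geometry, primitiveCount * 3 indices are read starting at byte offset primitiveOffset into indexData, and firstVertex is added to each index.
  2. For non-indexed geometry, vertices are read starting at primitiveOffset + vertexStride * firstVertex.

```c
#include <stdint.h>

/* Same members as VkAccelerationStructureBuildRangeInfoKHR. */
typedef struct BuildRangeInfo {
    uint32_t primitiveCount;
    uint32_t primitiveOffset;
    uint32_t firstVertex;
    uint32_t transformOffset;
} BuildRangeInfo;

/* Indexed triangle list: every 3 indices form one triangle. */
static BuildRangeInfo triangle_range_indexed(uint32_t indexCount, uint32_t firstIndex,
                                             uint32_t indexSizeBytes)
{
    BuildRangeInfo r = {0};
    r.primitiveCount = indexCount / 3u;
    r.primitiveOffset = firstIndex * indexSizeBytes; /* byte offset into indexData */
    return r;
}

/* Non-indexed triangle list: every 3 vertices form one triangle. */
static BuildRangeInfo triangle_range_non_indexed(uint32_t vertexCount, uint32_t firstVertex)
{
    BuildRangeInfo r = {0};
    r.primitiveCount = vertexCount / 3u;
    r.firstVertex = firstVertex; /* read from primitiveOffset + vertexStride * firstVertex */
    return r;
}
```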

VkAccelerationStructureBuildGeometryInfoKHR:

    VkStructureType                                     sType;
    const void*                                         pNext;
    VkAccelerationStructureTypeKHR                      type;
    VkBuildAccelerationStructureFlagsKHR                flags;
    VkBuildAccelerationStructureModeKHR                 mode;
    VkAccelerationStructureKHR                          srcAccelerationStructure;
    VkAccelerationStructureKHR                          dstAccelerationStructure;
    uint32_t                                            geometryCount;
    const VkAccelerationStructureGeometryKHR*           pGeometries;
    const VkAccelerationStructureGeometryKHR* const*    ppGeometries;
    VkDeviceOrHostAddressKHR                            scratchData;

Most of the members are self-explanatory and documented in the spec; there are only a few notes:

vkGetAccelerationStructureBuildSizesKHR:

void vkGetAccelerationStructureBuildSizesKHR(
    VkDevice                                    device,
    VkAccelerationStructureBuildTypeKHR         buildType,
    const VkAccelerationStructureBuildGeometryInfoKHR* pBuildInfo,
    const uint32_t*                             pMaxPrimitiveCounts,
    VkAccelerationStructureBuildSizesInfoKHR*   pSizeInfo);

VkAccelerationStructureBuildSizesInfoKHR:

    VkStructureType    sType;
    const void*        pNext;
    VkDeviceSize       accelerationStructureSize;
    VkDeviceSize       updateScratchSize;
    VkDeviceSize       buildScratchSize;

We should usually call this function:

  1. Before creating our AS, because sizeInfo.accelerationStructureSize gives the size of the AS storage.
  2. Before building our AS, because sizeInfo.buildScratchSize gives the scratch size needed for a build.
  3. Before updating our AS, because sizeInfo.updateScratchSize gives the scratch size needed for an update.

After querying the sizes with vkGetAccelerationStructureBuildSizesKHR we must:

  1. Create the scratch buffer with ( size = sizeInfo.buildScratchSize)
  2. Continue to fill our buildInfo: buildInfo.scratchData.deviceAddress = scratchAddress;
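This sketch places the AS storage and its scratch memory in one buffer, under stated assumptions:

  1. VkAccelerationStructureCreateInfoKHR::offset must be a multiple of 256 bytes.
  2. The scratch address must be a multiple of VkPhysicalDeviceAccelerationStructurePropertiesKHR::minAccelerationStructureScratchOffsetAlignment, passed in here as scratchAlignment.
  3. Alignments are powers of two.
  4. The buffer's own device address is aligned to both of the above.

The helper names are hypothetical, and separate buffers work just as well.

```c
#include <stdint.h>

typedef uint64_t DeviceSize;

/* Rounds value up to a power-of-two alignment. */
static DeviceSize align_up(DeviceSize value, DeviceSize alignment)
{
    return (value + alignment - 1u) & ~(alignment - 1u);
}

/* Same size members as VkAccelerationStructureBuildSizesInfoKHR. */
typedef struct BuildSizes {
    DeviceSize accelerationStructureSize;
    DeviceSize updateScratchSize;
    DeviceSize buildScratchSize;
} BuildSizes;

/* Hypothetical layout of one buffer holding both the AS storage and its scratch memory. */
typedef struct AsBufferLayout {
    DeviceSize asOffset;      /* VkAccelerationStructureCreateInfoKHR::offset */
    DeviceSize scratchOffset; /* added to the buffer device address for scratchData.deviceAddress */
    DeviceSize totalSize;     /* minimum buffer size */
} AsBufferLayout;

static AsBufferLayout layout_as_and_scratch(DeviceSize baseOffset, const BuildSizes* sizes,
                                            DeviceSize scratchAlignment, int forUpdate)
{
    AsBufferLayout l;
    l.asOffset = align_up(baseOffset, 256u); /* AS offset must be a multiple of 256 */
    const DeviceSize scratchSize = forUpdate ? sizes->updateScratchSize : sizes->buildScratchSize;
    /* scratch starts after the AS storage, so the two ranges never overlap */
    l.scratchOffset = align_up(l.asOffset + sizes->accelerationStructureSize, scratchAlignment);
    l.totalSize = l.scratchOffset + scratchSize;
    return l;
}
```

Such a shared buffer would need both the acceleration structure storage and storage buffer usage flags.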

IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Most of the functions and structs in this ray tracing extension work with device addresses rather than with buffers and ASs directly. Alternatively, we can take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.

Host/Device Commands for Build, CopyAStoAS, CopyASToMemory, CopyMemoryToAS, WriteProperties

Note for Host Commands:

Build AS

All function parameters are explained above

vkCmdBuildAccelerationStructuresKHR:

void vkCmdBuildAccelerationStructuresKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);

vkCmdBuildAccelerationStructuresIndirectKHR:

void vkCmdBuildAccelerationStructuresIndirectKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkDeviceAddress*                      pIndirectDeviceAddresses,
    const uint32_t*                             pIndirectStrides,
    const uint32_t* const*                      ppMaxPrimitiveCounts);

vkBuildAccelerationStructuresKHR:

VkResult vkBuildAccelerationStructuresKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    uint32_t                                    infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);
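ppBuildRangeInfos is an array of pointers: ppBuildRangeInfos[i] points to pInfos[i].geometryCount consecutive ranges. This sketch assumes all ranges are kept in one flat array in build-info order. BuildRangeInfo mirrors VkAccelerationStructureBuildRangeInfoKHR, and fill_range_pointers is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members as VkAccelerationStructureBuildRangeInfoKHR. */
typedef struct BuildRangeInfo {
    uint32_t primitiveCount, primitiveOffset, firstVertex, transformOffset;
} BuildRangeInfo;

/* outPointers[i] receives the address of the first range belonging to build info i;
   geometryCounts[i] is that build info's geometryCount. */
static void fill_range_pointers(const BuildRangeInfo* flatRanges,
                                const uint32_t* geometryCounts, size_t infoCount,
                                const BuildRangeInfo** outPointers)
{
    size_t offset = 0;
    for (size_t i = 0; i < infoCount; ++i) {
        outPointers[i] = flatRanges + offset;
        offset += geometryCounts[i];
    }
}
```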

Write Properties

vkCmdWriteAccelerationStructuresPropertiesKHR:

void vkCmdWriteAccelerationStructuresPropertiesKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    accelerationStructureCount,
    const VkAccelerationStructureKHR*           pAccelerationStructures,
    VkQueryType                                 queryType,
    VkQueryPool                                 queryPool,
    uint32_t                                    firstQuery);

Note for write properties:

  1. We could expose a QueryType and QueryPool interface to the user, but that would be very Vulkan-specific.
  2. We could instead have a function for each QueryType and keep the QueryPool internal, without exposing it to the user, for example: QueryAccelerationStructuresCompactionSizes(....). I suggest we go with 2.

The respective host operation needs no QueryPool; it writes the results directly into pData.

vkWriteAccelerationStructuresPropertiesKHR:

VkResult vkWriteAccelerationStructuresPropertiesKHR(
    VkDevice                                    device,
    uint32_t                                    accelerationStructureCount,
    const VkAccelerationStructureKHR*           pAccelerationStructures,
    VkQueryType                                 queryType,
    size_t                                      dataSize,
    void*                                       pData,
    size_t                                      stride);

Copy AS to AS

An example use is copying an AS into a compacted AS (VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR).

vkCmdCopyAccelerationStructureKHR:

void vkCmdCopyAccelerationStructureKHR(
    VkCommandBuffer                             commandBuffer,
    const VkCopyAccelerationStructureInfoKHR*   pInfo);

vkCopyAccelerationStructureKHR:

VkResult vkCopyAccelerationStructureKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    const VkCopyAccelerationStructureInfoKHR*   pInfo);

VkCopyAccelerationStructureInfoKHR:

    VkStructureType                       sType;
    const void*                           pNext;
    VkAccelerationStructureKHR            src;
    VkAccelerationStructureKHR            dst;
    VkCopyAccelerationStructureModeKHR    mode;

Important note for memory barriers

VkCopyAccelerationStructureModeKHR:

    VK_COPY_ACCELERATION_STRUCTURE_MODE_CLONE_KHR = 0,
    VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR = 1,
    VK_COPY_ACCELERATION_STRUCTURE_MODE_SERIALIZE_KHR = 2,
    VK_COPY_ACCELERATION_STRUCTURE_MODE_DESERIALIZE_KHR = 3,

Copy AS To Memory

vkCmdCopyAccelerationStructureToMemoryKHR:

void vkCmdCopyAccelerationStructureToMemoryKHR(
    VkCommandBuffer                             commandBuffer,
    const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);

vkCopyAccelerationStructureToMemoryKHR:

VkResult vkCopyAccelerationStructureToMemoryKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);

VkCopyAccelerationStructureToMemoryInfoKHR:

    VkStructureType                       sType;
    const void*                           pNext;
    VkAccelerationStructureKHR            src;
    VkDeviceOrHostAddressKHR              dst;
    VkCopyAccelerationStructureModeKHR    mode;

Copy Memory To AS

vkCopyMemoryToAccelerationStructureKHR:

VkResult vkCopyMemoryToAccelerationStructureKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);

vkCmdCopyMemoryToAccelerationStructureKHR:

void vkCmdCopyMemoryToAccelerationStructureKHR(
    VkCommandBuffer                             commandBuffer,
    const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);

VkCopyMemoryToAccelerationStructureInfoKHR:

    VkStructureType                       sType;
    const void*                           pNext;
    VkDeviceOrHostAddressConstKHR         src;
    VkAccelerationStructureKHR            dst;
    VkCopyAccelerationStructureModeKHR    mode;

Compatibility Check

To check whether a serialized acceleration structure is compatible with the current device, Vulkan provides a dedicated compatibility query. We need a function that uses it and its related structs for the compatibility check.

Creating AS

VkAccelerationStructureCreateInfoKHR:

    VkStructureType                          sType;
    const void*                              pNext;
    VkAccelerationStructureCreateFlagsKHR    createFlags;
    VkBuffer                                 buffer;
    VkDeviceSize                             offset;
    VkDeviceSize                             size;
    VkAccelerationStructureTypeKHR           type;
    VkDeviceAddress                          deviceAddress;

Enums Used

VkAccelerationStructureTypeKHR:

    VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR = 0,
    VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR = 1,
    VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR = 2,

VkDeviceOrHostAddressConstKHR:

typedef union VkDeviceOrHostAddressConstKHR {
    VkDeviceAddress    deviceAddress;
    const void*        hostAddress;
} VkDeviceOrHostAddressConstKHR;

Fill in hostAddress when using the host-side acceleration structure commands, and deviceAddress otherwise. Exposing this union is a matter of choice: functions could also take different inputs that don't need VkDeviceOrHostAddressConstKHR (which also has a non-const version, VkDeviceOrHostAddressKHR). I suggest exposing it as a struct.
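This C sketch shows one way to make the choice explicit. It assumes a hypothetical TaggedAddress wrapper around a mirror of VkDeviceOrHostAddressConstKHR.

The tag records which union member is valid. A wrapper can therefore reject a host pointer passed to a vkCmd* function (which needs deviceAddress), or a device address passed to a host function (which needs hostAddress).

```c
#include <stdint.h>

typedef uint64_t DeviceAddress;

/* Same shape as VkDeviceOrHostAddressConstKHR. */
typedef union DeviceOrHostAddressConst {
    DeviceAddress deviceAddress;
    const void*   hostAddress;
} DeviceOrHostAddressConst;

/* Which command family the address is meant for (hypothetical tag). */
typedef enum AddressKind { ADDRESS_KIND_DEVICE, ADDRESS_KIND_HOST } AddressKind;

/* Hypothetical tagged wrapper: remembers which union member is valid. */
typedef struct TaggedAddress {
    AddressKind kind;
    DeviceOrHostAddressConst value;
} TaggedAddress;

static TaggedAddress device_address(DeviceAddress address)
{
    TaggedAddress t;
    t.kind = ADDRESS_KIND_DEVICE;
    t.value.deviceAddress = address;
    return t;
}

static TaggedAddress host_address(const void* pointer)
{
    TaggedAddress t;
    t.kind = ADDRESS_KIND_HOST;
    t.value.hostAddress = pointer;
    return t;
}

/* vkCmd* (device) functions need deviceAddress; host functions need hostAddress. */
static int address_valid_for(const TaggedAddress* a, AddressKind commandKind)
{
    return a->kind == commandKind;
}
```

A C++ wrapper could move this check to compile time by making the address type a template parameter instead of a runtime tag.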

VkBuildAccelerationStructureModeKHR:

    VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR = 0,
    VK_BUILD_ACCELERATION_STRUCTURE_MODE_UPDATE_KHR = 1,

We could also write separate build and update AS functions instead of exposing the mode.

VkBuildAccelerationStructureFlagBitsKHR:

    VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR = 0x00000001,
    VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR = 0x00000002,
    VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR = 0x00000004,
    VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR = 0x00000008,
    VK_BUILD_ACCELERATION_STRUCTURE_LOW_MEMORY_BIT_KHR = 0x00000010,

VkAccelerationStructureBuildTypeKHR:

    VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR = 0,
    VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR = 1,
    VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_OR_DEVICE_KHR = 2,

VkAccelerationStructureCreateFlagBitsKHR:

    VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR = 0x00000001,
  // Provided by VK_NV_ray_tracing_motion_blur
    VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV = 0x00000004,

VkGeometryInstanceFlagBitsKHR:

    VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR = 0x00000001,
    VK_GEOMETRY_INSTANCE_TRIANGLE_FLIP_FACING_BIT_KHR = 0x00000002,
    VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR = 0x00000004,
    VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR = 0x00000008,
    VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_KHR = VK_GEOMETRY_INSTANCE_TRIANGLE_FLIP_FACING_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_TRIANGLE_CULL_DISABLE_BIT_NV = VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_NV = VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_NV = VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_NV = VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR,

Deferred Operations

(Fill if needed to expose)

RayTracing Pipeline

vkCreateRayTracingPipelinesKHR:

VkResult vkCreateRayTracingPipelinesKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    VkPipelineCache                             pipelineCache,
    uint32_t                                    createInfoCount,
    const VkRayTracingPipelineCreateInfoKHR*    pCreateInfos,
    const VkAllocationCallbacks*                pAllocator,
    VkPipeline*                                 pPipelines);

VkRayTracingPipelineCreateInfoKHR:

    VkStructureType                                      sType;
    const void*                                          pNext;
    VkPipelineCreateFlags                                flags;
    uint32_t                                             stageCount;
    const VkPipelineShaderStageCreateInfo*               pStages;
    uint32_t                                             groupCount;
    const VkRayTracingShaderGroupCreateInfoKHR*          pGroups;
    uint32_t                                             maxPipelineRayRecursionDepth;
    const VkPipelineLibraryCreateInfoKHR*                pLibraryInfo;
    const VkRayTracingPipelineInterfaceCreateInfoKHR*    pLibraryInterface;
    const VkPipelineDynamicStateCreateInfo*              pDynamicState;
    VkPipelineLayout                                     layout;
    VkPipeline                                           basePipelineHandle;
    int32_t                                              basePipelineIndex;

VkRayTracingShaderGroupCreateInfoKHR:

    VkStructureType                   sType;
    const void*                       pNext;
    VkRayTracingShaderGroupTypeKHR    type;
    uint32_t                          generalShader;
    uint32_t                          closestHitShader;
    uint32_t                          anyHitShader;
    uint32_t                          intersectionShader;
    const void*                       pShaderGroupCaptureReplayHandle;

VkRayTracingShaderGroupTypeKHR:

    VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR = 0,
    VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR = 1,
    VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR = 2,

VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR indicates a shader group with a single VK_SHADER_STAGE_RAYGEN_BIT_KHR, VK_SHADER_STAGE_MISS_BIT_KHR, or VK_SHADER_STAGE_CALLABLE_BIT_KHR shader in it.

VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR specifies a shader group that only hits triangles and must not contain an intersection shader, only closest hit and any-hit shaders.

VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR specifies a shader group that only intersects with custom geometry and must contain an intersection shader.
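The rules above map onto a simple validation pass. This sketch assumes:

  1. Hypothetical ShaderGroup and GroupType mirrors of VkRayTracingShaderGroupCreateInfoKHR and VkRayTracingShaderGroupTypeKHR.
  2. SHADER_UNUSED equal to VK_SHADER_UNUSED_KHR (~0U).

It only checks which slots are used. Checking that each index refers to a stage of the right kind (raygen/miss/callable, closest hit, and so on) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHADER_UNUSED (~0u) /* same value as VK_SHADER_UNUSED_KHR */

/* Values match VkRayTracingShaderGroupTypeKHR. */
typedef enum GroupType {
    GROUP_GENERAL = 0,
    GROUP_TRIANGLES_HIT = 1,
    GROUP_PROCEDURAL_HIT = 2
} GroupType;

/* Indices into pStages, as in VkRayTracingShaderGroupCreateInfoKHR. */
typedef struct ShaderGroup {
    GroupType type;
    uint32_t generalShader, closestHitShader, anyHitShader, intersectionShader;
} ShaderGroup;

static bool shader_group_valid(const ShaderGroup* g)
{
    switch (g->type) {
    case GROUP_GENERAL:
        /* exactly one raygen, miss or callable shader and nothing else */
        return g->generalShader != SHADER_UNUSED && g->closestHitShader == SHADER_UNUSED
            && g->anyHitShader == SHADER_UNUSED && g->intersectionShader == SHADER_UNUSED;
    case GROUP_TRIANGLES_HIT:
        /* optional closest-hit and any-hit, never an intersection shader */
        return g->generalShader == SHADER_UNUSED && g->intersectionShader == SHADER_UNUSED;
    case GROUP_PROCEDURAL_HIT:
        /* must contain an intersection shader */
        return g->generalShader == SHADER_UNUSED && g->intersectionShader != SHADER_UNUSED;
    }
    return false;
}
```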

Pipeline Library

Should we add and handle VK_KHR_pipeline_library extension?

A pipeline library is a special pipeline that cannot be bound, instead it defines a set of shaders and shader groups which can be linked into other pipelines. This extension defines the infrastructure for pipeline libraries, but does not specify the creation or usage of pipeline libraries. This is left to additional dependent extensions.

VK_KHR_pipeline_library is a soft requirement for VK_KHR_ray_tracing_pipeline rather than a strict requirement, so applications only need to enable it if they actually use it.

Shader Binding Table

To build the buffer of opaque shader group handles (plus optional shader record data):

vkGetRayTracingShaderGroupHandlesKHR:

VkResult vkGetRayTracingShaderGroupHandlesKHR(
    VkDevice                                    device,
    VkPipeline                                  pipeline,
    uint32_t                                    firstGroup,
    uint32_t                                    groupCount,
    size_t                                      dataSize,
    void*                                       pData);

This is the only function needed (no helper functions) to construct the ShaderBindingTable. shaderGroupHandleSize, shaderGroupHandleAlignment and shaderGroupBaseAlignment must be taken into account when constructing the SBT buffer and computing the regions passed to vkCmdTraceRaysKHR.

We could also have a wrapper/helper class for the SBT that does all the computation and construction of SBT buffers for each shader group type (raygen, miss, hit, callable), and helps with invoking vkCmdTraceRaysKHR.
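This is a sketch of the layout such a helper could compute, under these assumptions:

  1. SbtProps mirrors four members of VkPhysicalDeviceRayTracingPipelinePropertiesKHR.
  2. Each region starts at a multiple of shaderGroupBaseAlignment.
  3. Each record's stride is the handle size plus optional record data, rounded up to shaderGroupHandleAlignment.
  4. The raygen region holds exactly one record, so its size equals its stride.

All names are hypothetical. scatter_handles copies the tightly packed handles returned by vkGetRayTracingShaderGroupHandlesKHR (handleSize bytes apart) into their strided slots.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t DeviceSize;

static DeviceSize align_up(DeviceSize v, DeviceSize a) { return (v + a - 1u) & ~(a - 1u); }

/* Subset of VkPhysicalDeviceRayTracingPipelinePropertiesKHR used for the SBT layout. */
typedef struct SbtProps {
    uint32_t shaderGroupHandleSize;
    uint32_t shaderGroupHandleAlignment;
    uint32_t shaderGroupBaseAlignment;
    uint32_t maxShaderGroupStride;
} SbtProps;

/* Buffer offset, stride and size of one region (VkStridedDeviceAddressRegionKHR minus the base address). */
typedef struct SbtRegion { DeviceSize offset, stride, size; } SbtRegion;

typedef struct SbtLayout { SbtRegion raygen, miss, hit, callable; DeviceSize totalSize; } SbtLayout;

/* Places one region at the next base-aligned offset and advances the cursor. */
static SbtRegion place_region(DeviceSize* cursor, const SbtProps* p, uint32_t count, uint32_t recordBytes)
{
    SbtRegion r;
    r.stride = align_up((DeviceSize)p->shaderGroupHandleSize + recordBytes, p->shaderGroupHandleAlignment);
    r.offset = align_up(*cursor, p->shaderGroupBaseAlignment);
    r.size = r.stride * count;
    *cursor = r.offset + r.size;
    return r;
}

/* recordBytes is the hypothetical per-record user data after each handle (0 if unused). */
static SbtLayout layout_sbt(const SbtProps* p, uint32_t missCount, uint32_t hitCount,
                            uint32_t callableCount, uint32_t recordBytes)
{
    SbtLayout l;
    DeviceSize cursor = 0;
    l.raygen = place_region(&cursor, p, 1u, recordBytes); /* size == stride for raygen */
    l.miss = place_region(&cursor, p, missCount, recordBytes);
    l.hit = place_region(&cursor, p, hitCount, recordBytes);
    l.callable = place_region(&cursor, p, callableCount, recordBytes);
    l.totalSize = cursor;
    return l;
}

static int sbt_stride_ok(const SbtProps* p, const SbtRegion* r)
{
    return r->stride <= p->maxShaderGroupStride;
}

/* Copies count handles, starting at handle index firstHandle, into the region's slots. */
static void scatter_handles(uint8_t* sbt, const SbtRegion* r, const uint8_t* handles,
                            uint32_t firstHandle, uint32_t count, uint32_t handleSize)
{
    for (uint32_t i = 0; i < count; ++i)
        memcpy(sbt + r->offset + i * r->stride,
               handles + (size_t)(firstHandle + i) * handleSize, handleSize);
}
```

Each region's deviceAddress for vkCmdTraceRaysKHR would then be the SBT buffer's device address plus the region's offset. This assumes the buffer address is itself aligned to shaderGroupBaseAlignment.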

Ray Tracing Pipeline Stack

Ray tracing pipelines have a potentially large set of shaders which may be invoked in various call chain combinations to perform ray tracing. To store parameters for a given shader execution, an implementation may use a stack of data in memory. This stack must be sized to the sum of the stack sizes of all shaders in any call chain executed by the application.

For example, suppose an application has two kinds of closest hit and miss shaders. The first level of rays only uses the first kind (possibly reflection), and the second level only uses the second kind (occlusion or shadow rays, for example). The application can then compute the stack size with something similar to: rayGenStack + max(closestHit1Stack, miss1Stack) + max(closestHit2Stack, miss2Stack).
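This sketch implements the example formula. It assumes a hypothetical TwoLevelStacks struct holding per-shader sizes queried with vkGetRayTracingShaderGroupStackSizeKHR.

The result would be passed to vkCmdSetRayTracingPipelineStackSizeKHR. That function takes a uint32_t and requires the pipeline to have been created with the VK_DYNAMIC_STATE_RAY_TRACING_PIPELINE_STACK_SIZE_KHR dynamic state.

```c
#include <stdint.h>

typedef uint64_t DeviceSize;

static DeviceSize max2(DeviceSize a, DeviceSize b) { return a > b ? a : b; }

/* Per-shader stack sizes (hypothetical grouping for the two-level example). */
typedef struct TwoLevelStacks {
    DeviceSize rayGen;
    DeviceSize closestHit1, miss1; /* first level, e.g. reflection rays */
    DeviceSize closestHit2, miss2; /* second level, e.g. shadow rays */
} TwoLevelStacks;

/* rayGenStack + max(closestHit1Stack, miss1Stack) + max(closestHit2Stack, miss2Stack) */
static DeviceSize two_level_stack_size(const TwoLevelStacks* s)
{
    return s->rayGen + max2(s->closestHit1, s->miss1) + max2(s->closestHit2, s->miss2);
}
```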

In order to get/set Stack Sizes:

vkGetRayTracingShaderGroupStackSizeKHR:

VkDeviceSize vkGetRayTracingShaderGroupStackSizeKHR(
    VkDevice                                    device,
    VkPipeline                                  pipeline,
    uint32_t                                    group,
    VkShaderGroupShaderKHR                      groupShader);

vkCmdSetRayTracingPipelineStackSizeKHR:

void vkCmdSetRayTracingPipelineStackSizeKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    pipelineStackSize);

VkShaderGroupShaderKHR is just an enum:

VkShaderGroupShaderKHR:

    VK_SHADER_GROUP_SHADER_GENERAL_KHR = 0,
    VK_SHADER_GROUP_SHADER_CLOSEST_HIT_KHR = 1,
    VK_SHADER_GROUP_SHADER_ANY_HIT_KHR = 2,
    VK_SHADER_GROUP_SHADER_INTERSECTION_KHR = 3,

RayTracing Commands

vkCmdTraceRaysKHR:

void vkCmdTraceRaysKHR(
    VkCommandBuffer                             commandBuffer,
    const VkStridedDeviceAddressRegionKHR*      pRaygenShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pMissShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pHitShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pCallableShaderBindingTable,
    uint32_t                                    width,
    uint32_t                                    height,
    uint32_t                                    depth);

VkStridedDeviceAddressRegionKHR:

typedef struct VkStridedDeviceAddressRegionKHR {
    VkDeviceAddress    deviceAddress;
    VkDeviceSize       stride;
    VkDeviceSize       size;
} VkStridedDeviceAddressRegionKHR;

Indirect Trace Rays

vkCmdTraceRaysIndirectKHR:

void vkCmdTraceRaysIndirectKHR(
    VkCommandBuffer                             commandBuffer,
    const VkStridedDeviceAddressRegionKHR*      pRaygenShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pMissShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pHitShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pCallableShaderBindingTable,
    VkDeviceAddress                             indirectDeviceAddress);

VkTraceRaysIndirectCommandKHR:

typedef struct VkTraceRaysIndirectCommandKHR {
    uint32_t    width;
    uint32_t    height;
    uint32_t    depth;
} VkTraceRaysIndirectCommandKHR;

devshgraphicsprogramming commented 3 years ago

It has accelerationStructureHostCommands that indicates whether the implementation supports host side acceleration structure commands: (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR)

Support is optional for all 5 (host) or all 10 (device and host).

devshgraphicsprogramming commented 3 years ago

VkDeviceOrHostAddressConstKHR

what decides which one this is?

what function I call? (Cmd vs no Cmd)

Erfan-Ahmadi commented 3 years ago

Support is optional for all 5 (host) or all 10 (device and host).

Enabling the VK_KHR_acceleration_structure extension lets you use the device functions. To use the host functions you must also enable the accelerationStructureHostCommands feature (after checking that the physical device supports it).

VkSpec for vkCopyAccelerationStructureToMemoryKHR:

VUID-vkCopyAccelerationStructureToMemoryKHR-accelerationStructureHostCommands-03584 The VkPhysicalDeviceAccelerationStructureFeaturesKHR::accelerationStructureHostCommands feature must be enabled

devshgraphicsprogramming commented 3 years ago

weird but ok, kinda hard to extract "hard" dependencies (i.e. you just have at least one host or device thing supported)

devshgraphicsprogramming commented 3 years ago

IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Since most of the functions and struct related to this raytracing extension works with deviceAddresses and not buffers and AS's directly. I think we can also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.

So raytracing requires BDA?

Erfan-Ahmadi commented 3 years ago

VkDeviceOrHostAddressConstKHR

what decides which one this is?

what function I call? (Cmd vs no Cmd)

Yes, Vulkan takes VkDeviceOrHostAddressConstKHR for Infos like VkAccelerationStructureGeometryInstancesDataKHR or VkCopyMemoryToAccelerationStructureInfoKHR but if you're using the host function the hostAddress must be filled and if you're using device functions (with Cmd), the deviceAddress must be filled.

devshgraphicsprogramming commented 3 years ago

Vulkan takes VkDeviceOrHostAddressConstKHR for Infos like VkAccelerationStructureGeometryInstancesDataKHR or VkCopyMemoryToAccelerationStructureInfoKHR but if you're using the host function the hostAddress must be filled and if you're using device functions (with Cmd), the deviceAddress must be filled.

sounds like a thing to solve with C++ templates template<typename address_type_t>

then IGPUCommandBuffer methods would use stuff with <const buffer_device_address_t> and ILogicalDevice methods would use <const void*>

Erfan-Ahmadi commented 3 years ago

So raytracing requires BDA?

Yes. VK_KHR_ray_tracing_pipeline requires VK_KHR_acceleration_structure, and VK_KHR_acceleration_structure requires:

  1. Vulkan 1.1
  2. VK_EXT_descriptor_indexing
  3. VK_KHR_buffer_device_address
  4. VK_KHR_deferred_host_operations
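This sketch checks the extensions named above against the names reported by vkEnumerateDeviceExtensionProperties (VkExtensionProperties::extensionName). The helper names are hypothetical.

It is not a complete dependency check. VK_EXT_descriptor_indexing and VK_KHR_buffer_device_address were promoted to core in Vulkan 1.2, and the full dependency list should be taken from the registry page linked below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Extensions named in the comment above (not the complete dependency list). */
static const char* const kRayTracingExtensions[] = {
    "VK_KHR_ray_tracing_pipeline",
    "VK_KHR_acceleration_structure",
    "VK_EXT_descriptor_indexing",
    "VK_KHR_buffer_device_address",
    "VK_KHR_deferred_host_operations",
};

static bool has_extension(const char* const* available, size_t availableCount, const char* name)
{
    for (size_t i = 0; i < availableCount; ++i)
        if (strcmp(available[i], name) == 0)
            return true;
    return false;
}

/* Returns the first required extension that is missing, or NULL if all are present. */
static const char* first_missing_extension(const char* const* available, size_t availableCount)
{
    const size_t required = sizeof kRayTracingExtensions / sizeof kRayTracingExtensions[0];
    for (size_t i = 0; i < required; ++i)
        if (!has_extension(available, availableCount, kRayTracingExtensions[i]))
            return kRayTracingExtensions[i];
    return NULL;
}
```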

See https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_ray_tracing_pipeline.html

devshgraphicsprogramming commented 3 years ago

Do we have any guarantees on whether host commands or device commands will always be available?

devshgraphicsprogramming commented 3 years ago

from my reading it looks like host commands are optional, but device commands are always there

what queue do we need to dispatch the device commands?

Erfan-Ahmadi commented 3 years ago

Do we have any guarantees on whether host commands or device commands will always be available?

These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.

More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure

Erfan-Ahmadi commented 3 years ago

from my reading it looks like host commands are optional, but device commands are always there

what queue do we need to dispatch the device commands?

Good question: any queue that supports compute.

• VUID-vkCmdBuildAccelerationStructuresKHR-commandBuffer-cmdpool The VkCommandPool that commandBuffer was allocated from must support compute operations

devshgraphicsprogramming commented 3 years ago

from my reading it looks like host commands are optional, but device commands are always there

what queue do we need to dispatch the device commands?

Good question: any queue that supports compute.

• VUID-vkCmdBuildAccelerationStructuresKHR-commandBuffer-cmdpool The VkCommandPool that commandBuffer was allocated from must support compute operations

ok so it's just like computing mip-maps, just do it on the compute queue.
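This sketch picks a queue family for AS builds from the queueFlags of each VkQueueFamilyProperties entry. The flag values mirror VkQueueFlagBits.

Preferring a compute family without graphics (typically an async compute queue) is an assumption, not something the spec requires. Any family with the compute bit satisfies the VUID quoted above.

```c
#include <stdint.h>

/* Same bit values as VkQueueFlagBits. */
enum { QUEUE_GRAPHICS_BIT = 0x1u, QUEUE_COMPUTE_BIT = 0x2u, QUEUE_TRANSFER_BIT = 0x4u };

/* Returns the index of a compute-capable family, preferring one without graphics,
   or -1 if no family supports compute. */
static int pick_as_build_family(const uint32_t* queueFlags, int familyCount)
{
    int fallback = -1;
    for (int i = 0; i < familyCount; ++i) {
        if (!(queueFlags[i] & QUEUE_COMPUTE_BIT))
            continue;
        if (!(queueFlags[i] & QUEUE_GRAPHICS_BIT))
            return i; /* dedicated compute family */
        if (fallback < 0)
            fallback = i; /* first graphics+compute family */
    }
    return fallback;
}
```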

devshgraphicsprogramming commented 3 years ago

Do we have any guarantees on whether host commands or device commands will always be available?

These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.

More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure

I think the cpu2gpu object converter should try to use the host methods to build the AS (massively parallel BVH building produces BVHs faster, but of lower quality).

The initial AS should be in HOST_CACHED non device local memory, and then host-copied and compacted to unmappable DEVICE_LOCAL (copy AS to AS).

Erfan-Ahmadi commented 3 years ago

ok so it's just like computing mip-maps, just do it on the compute queue.

Shouldn't the user allocate the command buffer for a compute queue and pass it to the functions I provided as a parameter? I think I can only validate whether the cmdBuffer comes from a supported (compute) queue.

devshgraphicsprogramming commented 3 years ago

ok so it's just like computing mip-maps, just do it on the compute queue.

Shouldn't the user allocate the command buffer for a compute queue and pass it to the functions I provided as a parameter? I think I can only validate whether the cmdBuffer comes from a supported (compute) queue.

cpu2gpu converter already has this option/works this way

Erfan-Ahmadi commented 3 years ago

cpu2gpu converter already has this option/works this way

Understood

devshgraphicsprogramming commented 3 years ago

Because we are taking steps towards threading the cpu2gpu conversion and asset loading, we should expose Deferred operations.

Maybe ILogicalDevice could hand out core::smart_refctd_ptr<ILogicalDevice::IDeferredOperation> objects that are placement-new allocated on a CMemoryPool, like the one @achalpandeyy is using for command buffers (let's not murder the heap).

Then IDeferredOperation could have join and get as methods (and a wait built on top of get, which also forces at least one join). Its destructor and the refcounting could then ensure that we don't vk-destroy an incomplete operation.

devshgraphicsprogramming commented 3 years ago

deviceAddress in these function parameters is related to accelerationStructureCaptureReplay and this optional functionality is intended to be used by tools and not by applications directly.

We'll definitely be using NSight a lot, and Renderdoc whenever it starts supporting raytracing. So we need this.

devshgraphicsprogramming commented 3 years ago

We will not support serializing device- and driver-version-dependent acceleration structures any time soon (we don't really support downloading compiled shaders back from the driver for faster loading either).

So no need to worry about that.

Erfan-Ahmadi commented 3 years ago

So no need to worry about that.

I believe you're referring to the Compatibility Check section?

devshgraphicsprogramming commented 3 years ago

Serialization and deserialization in general.

devshgraphicsprogramming commented 3 years ago

All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed

The most important things for performance are the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.

Second most important is the DXR no-anyhit-shader flag.

And backface triangle culling is actually more expensive when enabled in raytracing.

There's also an important correctness (not perf) flag about whether anyhit shaders should only be called once per primitive.
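One way the flags from the comment above could be exposed, as a sketch. As far as I know, the DXR no-any-hit flag is D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE, whose Vulkan counterpart is VK_GEOMETRY_OPAQUE_BIT_KHR, and the once-per-primitive any-hit flag is VK_GEOMETRY_NO_DUPLICATE_ANY_HIT_INVOCATION_BIT_KHR. The `GeometryFlags` enum, its values and the `chooseGeometryFlags` policy are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Nabla-side geometry flags. Values are local; the comments name
// the Vulkan bits they would be translated to.
enum class GeometryFlags : uint32_t {
    NONE = 0u,
    OPAQUE_BIT = 1u << 0,                          // VK_GEOMETRY_OPAQUE_BIT_KHR
    NO_DUPLICATE_ANY_HIT_INVOCATION_BIT = 1u << 1, // VK_GEOMETRY_NO_DUPLICATE_ANY_HIT_INVOCATION_BIT_KHR
};

constexpr GeometryFlags operator|(GeometryFlags a, GeometryFlags b) {
    return static_cast<GeometryFlags>(static_cast<uint32_t>(a) | static_cast<uint32_t>(b));
}

constexpr bool hasFlag(GeometryFlags set, GeometryFlags flag) {
    return (static_cast<uint32_t>(set) & static_cast<uint32_t>(flag)) == static_cast<uint32_t>(flag);
}

// Hypothetical policy for picking per-geometry flags from material info:
// - no any-hit shader (e.g. no alpha testing): mark opaque so traversal never invokes one (performance);
// - an any-hit shader with side effects (e.g. accumulating into a buffer): forbid duplicate
//   invocations for the same primitive (correctness).
constexpr GeometryFlags chooseGeometryFlags(bool hasAnyHitShader, bool anyHitHasSideEffects) {
    if (!hasAnyHitShader)
        return GeometryFlags::OPAQUE_BIT;
    return anyHitHasSideEffects ? GeometryFlags::NO_DUPLICATE_ANY_HIT_INVOCATION_BIT : GeometryFlags::NONE;
}
```

A typed enum keeps Nabla users from mixing geometry, build and instance flags by accident, which plain uint32_t masks would allow.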

Erfan-Ahmadi commented 3 years ago

The most important things for performance are the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.

Note that from Vulkan's perspective you cannot bind a BLAS directly as a descriptor; you always have to create a TLAS.

See Vulkan Spec:

VUID-VkWriteDescriptorSetAccelerationStructureKHR-pAccelerationStructures-03579 Each acceleration structure in pAccelerationStructures must have been created with a type of VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR or VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR

You might wonder what VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR is. The Vulkan spec also answers that in its issues section:

(5) What is VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR for? RESOLVED: It is primarily intended for API layering. In DXR, the acceleration structure is basically just a buffer in a special layout, and you don’t know at creation time whether it will be used as a top or bottom level acceleration structure. We thus added a generic acceleration structure type whose type is unknown at creation time, but is specified at build time instead. Applications which are written directly for Vulkan should not use it
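A small sketch of how a descriptor-set update could enforce VUID-03579 up front instead of leaving it to the validation layers. `ASType` is a local stand-in for VkAccelerationStructureTypeKHR and the function names are hypothetical. Since Nabla does not layer on top of DXR, it could arguably refuse GENERIC at creation time altogether; that would be a design choice, the spec only says applications should not use it.

```cpp
#include <cassert>
#include <vector>

// Local stand-in for VkAccelerationStructureTypeKHR.
enum class ASType { TOP_LEVEL, BOTTOM_LEVEL, GENERIC };

// VUID-VkWriteDescriptorSetAccelerationStructureKHR-pAccelerationStructures-03579:
// every AS written to the descriptor must be TOP_LEVEL or GENERIC.
constexpr bool isDescriptorCompatible(ASType type) {
    return type == ASType::TOP_LEVEL || type == ASType::GENERIC;
}

// Validates a whole descriptor write (the pAccelerationStructures array) at once.
inline bool validateASDescriptorWrite(const std::vector<ASType>& accelerationStructures) {
    for (ASType t : accelerationStructures)
        if (!isDescriptorCompatible(t))
            return false;
    return true;
}
```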

Erfan-Ahmadi commented 3 years ago

All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed

I agree.

devshgraphicsprogramming commented 3 years ago

These should be the defaults for cpu2gpu conversion and anything else that doesn't get overridden by an explicit user choice:

VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR // if feature present, otherwise device only

If there's a sign that the geometry could be animated (such as a meshbuffer having bone or animation info), use these instead:

VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR 
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR 
// later on
VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV // if VK_NV_ray_tracing_motion_blur present

Add VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR if you detect Nsight.
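The defaults listed above, written out as a selection function. This is a sketch under assumptions: the flag constants are local stand-ins (only the names in the comments are real Vulkan enumerants), `looksAnimated` is a hypothetical hint derived from the meshbuffer, and Nsight detection is reduced to a boolean. Note that the capture-replay create bit also requires the accelerationStructureCaptureReplay feature to be enabled.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins so the sketch compiles without vulkan_core.h; the comments name the real enumerants.
namespace BuildFlags {
constexpr uint32_t ALLOW_UPDATE = 1u << 0;      // VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR
constexpr uint32_t ALLOW_COMPACTION = 1u << 1;  // VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR
constexpr uint32_t PREFER_FAST_TRACE = 1u << 2; // VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR
constexpr uint32_t PREFER_FAST_BUILD = 1u << 3; // VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR
}
namespace CreateFlags {
constexpr uint32_t DEVICE_ADDRESS_CAPTURE_REPLAY = 1u << 0; // VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR
constexpr uint32_t MOTION = 1u << 1;                        // VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV
}
enum class BuildType { HOST, DEVICE, HOST_OR_DEVICE }; // VkAccelerationStructureBuildTypeKHR

// Hypothetical summary of what the device / environment offers.
struct DeviceCaps {
    bool hostCommands;  // accelerationStructureHostCommands feature
    bool motionBlur;    // VK_NV_ray_tracing_motion_blur available
    bool captureReplay; // tool such as Nsight detected and accelerationStructureCaptureReplay enabled
};

struct ASDefaults {
    uint32_t buildFlags;
    uint32_t createFlags;
    BuildType buildType;
};

// `looksAnimated` is a hypothetical hint, e.g. the meshbuffer carries bone or animation data.
inline ASDefaults chooseASDefaults(const DeviceCaps& caps, bool looksAnimated) {
    ASDefaults d{}; // all flags start cleared
    if (looksAnimated) {
        d.buildFlags = BuildFlags::ALLOW_UPDATE | BuildFlags::PREFER_FAST_BUILD;
        d.buildType = BuildType::DEVICE;
        if (caps.motionBlur)
            d.createFlags |= CreateFlags::MOTION; // the "later on" item
    } else {
        d.buildFlags = BuildFlags::ALLOW_COMPACTION | BuildFlags::PREFER_FAST_TRACE;
        d.buildType = caps.hostCommands ? BuildType::HOST : BuildType::DEVICE;
    }
    if (caps.captureReplay)
        d.createFlags |= CreateFlags::DEVICE_ADDRESS_CAPTURE_REPLAY;
    return d;
}
```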

devshgraphicsprogramming commented 3 years ago

Note that from Vulkan's perspective you cannot bind a BLAS directly as a descriptor; you always have to create a TLAS.

Potatoe, potato

I presume there's an option to create a TLAS without any BLASes?

Erfan-Ahmadi commented 3 years ago

I presume there's an option to create a TLAS without any BLASes?

Unfortunately I don't think so.

• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03789 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, the geometryType member of elements of either pGeometries or ppGeometries must be VK_GEOMETRY_TYPE_INSTANCES_KHR
• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03790 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, geometryCount must be 1

The geometry type must be VK_GEOMETRY_TYPE_INSTANCES_KHR, i.e. instances of other acceleration structures:

https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkAccelerationStructureInstanceKHR.html
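The two VUIDs quoted above can be checked in a few lines before calling into the driver. A sketch, assuming a simplified build-info struct: `BuildGeometryInfo` and `satisfiesTopLevelRules` are hypothetical, with a vector standing in for geometryCount plus pGeometries/ppGeometries. It deliberately checks only the two top-level rules and says nothing about bottom-level builds.

```cpp
#include <cassert>
#include <vector>

// Local stand-ins for VkAccelerationStructureTypeKHR and VkGeometryTypeKHR.
enum class ASType { TOP_LEVEL, BOTTOM_LEVEL, GENERIC };
enum class GeometryType { TRIANGLES, AABBS, INSTANCES };

// Hypothetical, simplified VkAccelerationStructureBuildGeometryInfoKHR.
struct BuildGeometryInfo {
    ASType type;
    std::vector<GeometryType> geometries; // geometryCount + pGeometries/ppGeometries
};

// Checks only VUID-03789 and VUID-03790; BLAS rules are not covered.
inline bool satisfiesTopLevelRules(const BuildGeometryInfo& info) {
    if (info.type != ASType::TOP_LEVEL)
        return true;
    if (info.geometries.size() != 1u) // 03790: geometryCount must be 1
        return false;
    return info.geometries.front() == GeometryType::INSTANCES; // 03789
}
```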

devshgraphicsprogramming commented 3 years ago

hmm but I've seen and heard of people tracing just the BLAS for maximum gains in static scenes in DXR/OptiX.

So what do we do then, TLAS with a single instance? No better way to do it?

Erfan-Ahmadi commented 3 years ago

hmm but I've seen and heard of people tracing just the BLAS for maximum gains in static scenes in DXR/OptiX.

So what do we do then, TLAS with a single instance? No better way to do it?

The simplest case would be 1 BLAS and 1 TLAS with 1 instance referring to the BLAS.

Other than the Vulkan spec, you could also look at the nvpro_samples, which give a good picture of how to work with these structs and functions; I believe the projects there use all of it: https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR
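For the "TLAS with a single instance" route, the TLAS input is one instance record with an identity transform that points at the BLAS device address (as returned by vkGetAccelerationStructureDeviceAddressKHR). The struct below is a local mirror of the 64-byte VkAccelerationStructureInstanceKHR layout so the sketch compiles without the Vulkan headers; `ASInstance` and `makeIdentityInstance` are hypothetical names, and since bit-field layout is implementation-defined in C++, the static_assert guards the size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local mirror of VkTransformMatrixKHR: row-major 3x4.
struct TransformMatrix {
    float matrix[3][4];
};

// Local mirror of the VkAccelerationStructureInstanceKHR layout (not the real header type).
struct ASInstance {
    TransformMatrix transform;
    uint32_t instanceCustomIndex : 24;
    uint32_t mask : 8;
    uint32_t instanceShaderBindingTableRecordOffset : 24;
    uint32_t flags : 8;
    uint64_t accelerationStructureReference; // BLAS device address
};
static_assert(sizeof(ASInstance) == 64, "instance records are expected to be 64 bytes");

// Builds the single instance a "just trace the BLAS" TLAS would contain.
inline ASInstance makeIdentityInstance(uint64_t blasDeviceAddress) {
    ASInstance inst{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            inst.transform.matrix[r][c] = (r == c) ? 1.0f : 0.0f;
    inst.instanceCustomIndex = 0;
    inst.mask = 0xFFu; // visible to every ray with a non-zero cull mask
    inst.instanceShaderBindingTableRecordOffset = 0;
    inst.flags = 0;
    inst.accelerationStructureReference = blasDeviceAddress;
    return inst;
}
```

A mask of 0xFF makes the instance visible to every ray whose cull mask is non-zero, since the hit test is `(instance mask & cull mask) != 0`.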

devshgraphicsprogramming commented 3 years ago

cpu2gpu converter should use host commands whenever VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR or VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_OR_DEVICE_KHR are specified, unless the Vulkan implementation has no support for host commands.

In that case the type should be overwritten with VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR and the build happen on a compute queue.
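The fallback rule above, sketched as a function. `resolveBuildType` and `ResolvedBuild` are hypothetical names; `hostCommandsSupported` would come from VkPhysicalDeviceAccelerationStructureFeaturesKHR::accelerationStructureHostCommands. Since the device (vkCmd*) commands are always supported, DEVICE is a safe fallback.

```cpp
#include <cassert>

// Local stand-in for VkAccelerationStructureBuildTypeKHR.
enum class BuildType { HOST, DEVICE, HOST_OR_DEVICE };

// Hypothetical result of the decision.
struct ResolvedBuild {
    BuildType type;       // possibly overwritten with DEVICE
    bool useHostCommands; // vkBuildAccelerationStructuresKHR vs. vkCmdBuildAccelerationStructuresKHR on a compute queue
};

inline ResolvedBuild resolveBuildType(BuildType requested, bool hostCommandsSupported) {
    const bool wantsHost = requested == BuildType::HOST || requested == BuildType::HOST_OR_DEVICE;
    if (wantsHost && hostCommandsSupported)
        return {requested, true};
    // No host support (or DEVICE requested): build with device commands on a compute queue.
    return {BuildType::DEVICE, false};
}
```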

Erfan-Ahmadi commented 3 years ago

Do we have any guarantees on whether host commands or device commands will always be available?

These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported. More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure

I think the cpu2gpu object converter should try to use the host methods to build the AS (building BVHs in a massively parallel way is faster, but produces lower-quality BVHs).

The initial AS should be in HOST_CACHED, non-device-local memory, and then host-copied and compacted to unmappable DEVICE_LOCAL memory (copy AS to AS).

@devshgraphicsprogramming I just realized this might not be possible, because "building" is different from "creating". When converting ICPUAS->IGPUAS, from Vulkan's perspective it's basically just this: vkCreateAccelerationStructureKHR

And you cannot "build" an ICPUAccelerationStructure, because the accelerationStructure must always be created with an IGPUBuffer/VkBuffer (it must reside in device memory), as you can see in VulkanAccelerationStructureCreateInfo.

There are only "hostBuildCommands", which let your host or device data (meshes, instances, etc.) get built into the VkAccelerationStructureKHR.

devshgraphicsprogramming commented 3 years ago

@devshgraphicsprogramming I just realized this might not be possible, because "building" is different from "creating". When converting ICPUAS->IGPUAS, from Vulkan's perspective it's basically just this: vkCreateAccelerationStructureKHR

Yes, look at ICPUImage->IGPUImage:

  1. ICPUImage holds parameters to create and "build" (in this case copy-to/fill) an image
  2. the ICPUBuffers sourcing the "build" parameters are converted to IGPUBuffers
  3. IGPUImage discards (drops references to) the buffers which held the regions it was built/filled from

Yes, CPU2GPU will both vkCreateAccelerationStructureKHR and vkBuildAccelerationStructuresKHR/vkCmdBuildAccelerationStructuresKHR
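A sketch of the ICPUImage-style flow for acceleration structures, assuming a simplified ICPUAccelerationStructure that only holds its source buffers. All types here are hypothetical stand-ins (std::shared_ptr instead of smart_refctd_ptr), and the Vulkan calls are only named in comments. One caveat: with a device build the converted input buffers must stay alive until the command buffer has finished executing, so in practice the command buffer (or a fence-guarded release) would hold them until then, rather than them being dropped when the function returns as in this simplified version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for ICPUBuffer / IGPUBuffer.
struct CPUBuffer { std::vector<uint8_t> data; };
struct GPUBuffer { std::size_t size; };

// Hypothetical ICPUAccelerationStructure: no BVH, just the inputs needed to create and build one.
struct CPUAccelerationStructure {
    std::shared_ptr<CPUBuffer> vertices;
    std::shared_ptr<CPUBuffer> indices;
};

// Hypothetical IGPUAccelerationStructure: keeps only its backing buffer.
struct GPUAccelerationStructure {
    std::shared_ptr<GPUBuffer> backing;
};

// Placeholder for the ICPUBuffer -> IGPUBuffer part of cpu2gpu (allocation + upload).
inline std::shared_ptr<GPUBuffer> convertBuffer(const CPUBuffer& src) {
    return std::make_shared<GPUBuffer>(GPUBuffer{src.data.size()});
}

// Mirrors the ICPUImage -> IGPUImage steps: convert the source buffers, create the AS,
// build it from them, then let them go. `observedInputs` is an optional debugging hook.
inline GPUAccelerationStructure convertAS(const CPUAccelerationStructure& cpu,
                                          std::vector<std::weak_ptr<GPUBuffer>>* observedInputs = nullptr) {
    std::shared_ptr<GPUBuffer> gpuVertices = convertBuffer(*cpu.vertices);
    std::shared_ptr<GPUBuffer> gpuIndices = convertBuffer(*cpu.indices);
    if (observedInputs) {
        observedInputs->push_back(gpuVertices);
        observedInputs->push_back(gpuIndices);
    }
    GPUAccelerationStructure gpu;
    // The real size would come from vkGetAccelerationStructureBuildSizesKHR; 0 is a placeholder.
    gpu.backing = std::make_shared<GPUBuffer>(GPUBuffer{0u});
    // Here the converter would call vkCreateAccelerationStructureKHR on gpu.backing, then
    // vkBuildAccelerationStructuresKHR / vkCmdBuildAccelerationStructuresKHR reading the inputs.
    return gpu;
} // gpuVertices/gpuIndices are released here: the built AS holds no reference to its inputs
```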

And you cannot "build" an ICPUAccelerationStructure, because the accelerationStructure must always be created with an IGPUBuffer/VkBuffer (it must reside in device memory), as you can see in VulkanAccelerationStructureCreateInfo.

I never asked for an ICPUAccelerationStructure which has had any building performed on it; I asked for it to just hold the parameters required for building (much like how ICPUImage holds parameters for filling).

I know that the AS needs to reside in a VkBuffer; what I asked for is: "when you allocate the vkMemory to bind to the vkBuffer, and you're using host commands for building the AS, the heap you allocate from should not be Device Local (to make CPU access fast and easy), but then, to make GPU traversal fast and easy, the AS should be copied to another vkBuffer which is bound to vkMemory on a Device Local heap".

I dont see any constraints in VkAccelerationStructureCreateInfoKHR stating that the vkBuffer must be bound to memory from a particular heap, only that it cannot be sparse.
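A sketch of the two memory-type choices described above, using the usual "required / forbidden property flags" search over the memory types allowed by the buffer's VkMemoryRequirements::memoryTypeBits. The property bits are local stand-ins for VkMemoryPropertyFlagBits and `findMemoryType` / `planHostBuiltAS` are hypothetical names. One thing worth double-checking against the spec: as far as I can tell, the host commands (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR) need the acceleration structures' buffers bound to host-visible memory, so the copy into a non-mappable DEVICE_LOCAL buffer would have to go through vkCmdCopyAccelerationStructureKHR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Local stand-ins for VkMemoryPropertyFlagBits.
enum MemoryProps : uint32_t {
    DEVICE_LOCAL = 0x1u,
    HOST_VISIBLE = 0x2u,
    HOST_COHERENT = 0x4u,
    HOST_CACHED = 0x8u,
};

// First memory type allowed by `allowedTypeBits` that has all `required` and none of the `forbidden` properties.
inline std::optional<uint32_t> findMemoryType(const std::vector<uint32_t>& typeProps, uint32_t allowedTypeBits,
                                              uint32_t required, uint32_t forbidden) {
    for (std::size_t i = 0; i < typeProps.size() && i < 32; ++i) {
        if (!(allowedTypeBits & (1u << i)))
            continue; // not allowed by the buffer's memory requirements
        if ((typeProps[i] & required) != required)
            continue;
        if (typeProps[i] & forbidden)
            continue;
        return static_cast<uint32_t>(i);
    }
    return std::nullopt;
}

struct ASMemoryPlan {
    uint32_t buildMemoryType;     // where the host build happens
    uint32_t traversalMemoryType; // where the compacted copy lives
};

inline std::optional<ASMemoryPlan> planHostBuiltAS(const std::vector<uint32_t>& typeProps, uint32_t allowedTypeBits) {
    // Build target: mappable and cached, preferably not device-local; otherwise any host-visible type.
    std::optional<uint32_t> build = findMemoryType(typeProps, allowedTypeBits, HOST_VISIBLE | HOST_CACHED, DEVICE_LOCAL);
    if (!build)
        build = findMemoryType(typeProps, allowedTypeBits, HOST_VISIBLE, 0u);
    // Traversal copy: device-local, preferably not mappable; otherwise any device-local type.
    std::optional<uint32_t> traversal = findMemoryType(typeProps, allowedTypeBits, DEVICE_LOCAL, HOST_VISIBLE);
    if (!traversal)
        traversal = findMemoryType(typeProps, allowedTypeBits, DEVICE_LOCAL, 0u);
    if (!build || !traversal)
        return std::nullopt;
    return ASMemoryPlan{*build, *traversal};
}
```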