Open Erfan-Ahmadi opened 3 years ago
It has accelerationStructureHostCommands that indicates whether the implementation supports host side acceleration structure commands: (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR)
Support is optional for all 5 (host) or all 10 (device and host).
VkDeviceOrHostAddressConstKHR
what decides which one this is?
what function I call? (Cmd vs no Cmd)
If you enable the VK_KHR_acceleration_structure extension, it enables you to use the device functions.
But in order to use the host functions you must enable the accelerationStructureHostCommands feature (after checking that the physical device supports it).
VkSpec for vkCopyAccelerationStructureToMemoryKHR:
VUID-vkCopyAccelerationStructureToMemoryKHR-accelerationStructureHostCommands-03584 The VkPhysicalDeviceAccelerationStructureFeaturesKHR::accelerationStructureHostCommands feature must be enabled
weird but ok, kinda hard to extract "hard" dependencies (i.e. you just have at least one host or device thing supported)
IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Since most of the functions and struct related to this raytracing extension works with deviceAddresses and not buffers and AS's directly. I think we can also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.
So raytracing requires BDA?
Yes, Vulkan takes VkDeviceOrHostAddressConstKHR for Infos like VkAccelerationStructureGeometryInstancesDataKHR or VkCopyMemoryToAccelerationStructureInfoKHR, but if you're using the host function the hostAddress must be filled, and if you're using device functions (with Cmd) the deviceAddress must be filled.
sounds like a thing to solve with C++ templates
template<typename address_type_t>
then IGPUCommandBuffer methods would use stuff with <const buffer_device_address_t> and ILogicalDevice methods would use <const void*>
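A minimal sketch of that template idea, under stated assumptions: `buffer_device_address_t`, `GeometryInstancesData` and `toVulkanAddress` are hypothetical names, and `DeviceOrHostAddressConst` is a local stand-in for `VkDeviceOrHostAddressConstKHR` so the snippet compiles without the Vulkan headers. It is not the Nabla API, only an illustration of picking the union member from the address type at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper for a buffer device address (not an existing Nabla type).
struct buffer_device_address_t { std::uint64_t value; };

// Local stand-in for VkDeviceOrHostAddressConstKHR: a union of a 64-bit
// device address and a const host pointer.
union DeviceOrHostAddressConst
{
    std::uint64_t deviceAddress;
    const void*   hostAddress;
};

// Hypothetical build-input struct parameterised on the address type.
template<typename address_type_t>
struct GeometryInstancesData
{
    static_assert(std::is_same_v<address_type_t, buffer_device_address_t> ||
                  std::is_same_v<address_type_t, const void*>,
                  "address must be a device address or a host pointer");
    address_type_t data;
};

// Fills the union member that matches the address type, so a command-buffer
// (device) path can never hand a host pointer to Vulkan and vice versa.
template<typename address_type_t>
DeviceOrHostAddressConst toVulkanAddress(const GeometryInstancesData<address_type_t>& in)
{
    DeviceOrHostAddressConst out{};
    if constexpr (std::is_same_v<address_type_t, buffer_device_address_t>)
        out.deviceAddress = in.data.value;
    else
        out.hostAddress = in.data;
    return out;
}

// What IGPUCommandBuffer methods would take (device builds) ...
using DeviceGeometryInstancesData = GeometryInstancesData<buffer_device_address_t>;
// ... and what ILogicalDevice methods would take (host builds).
using HostGeometryInstancesData = GeometryInstancesData<const void*>;
```

With this shape, passing a `HostGeometryInstancesData` to a function that expects the device variant fails to compile, which is the point of the suggestion.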
Yes, VK_KHR_ray_tracing_pipeline requires VK_KHR_acceleration_structure, and VK_KHR_acceleration_structure:
• Requires Vulkan 1.1
• Requires VK_EXT_descriptor_indexing
• Requires VK_KHR_buffer_device_address
• Requires VK_KHR_deferred_host_operations
Do we have any guarantees on whether host commands or device commands will always be available?
from my reading it looks like host commands are optional, but device commands are always there
what queue do we need to dispatch the device commands?
These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.
More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure
Good question, Any queue that supports compute
• VUID-vkCmdBuildAccelerationStructuresKHR-commandBuffer-cmdpool The VkCommandPool that commandBuffer was allocated from must support compute operations
ok so its just like computing mip-maps, just do it on the compute queue.
I think the cpu2gpu object converter should try and use the host methods to build the AS (massively parallel building BVHs produces them faster but they're lower quality)
The initial AS should be in HOST_CACHED non device local memory, and then host-copied and compacted to unmappable DEVICE_LOCAL (copy AS to AS).
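The memory placement described above can be sketched as a memory-type search. This is only an illustration: the flag constants copy the values of `VkMemoryPropertyFlagBits`, while `pickMemoryType`, `pickHostBuildMemory` and `pickTraversalMemory` are hypothetical helpers, and the per-type flags are passed as a plain vector instead of `VkPhysicalDeviceMemoryProperties` so the snippet compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Same values as VkMemoryPropertyFlagBits in the Vulkan headers.
constexpr std::uint32_t DEVICE_LOCAL_BIT  = 0x1;
constexpr std::uint32_t HOST_VISIBLE_BIT  = 0x2;
constexpr std::uint32_t HOST_COHERENT_BIT = 0x4;
constexpr std::uint32_t HOST_CACHED_BIT   = 0x8;

// Returns the first memory type that is allowed by the buffer's
// memoryTypeBits, has all `required` flags and none of the `forbidden` ones.
std::optional<std::uint32_t> pickMemoryType(const std::vector<std::uint32_t>& typeFlags,
                                            std::uint32_t allowedTypeBits,
                                            std::uint32_t required,
                                            std::uint32_t forbidden)
{
    for (std::uint32_t i = 0; i < typeFlags.size() && i < 32u; ++i)
    {
        const bool allowed      = ((allowedTypeBits >> i) & 1u) != 0u;
        const bool hasRequired  = (typeFlags[i] & required) == required;
        const bool hasForbidden = (typeFlags[i] & forbidden) != 0u;
        if (allowed && hasRequired && !hasForbidden)
            return i;
    }
    return std::nullopt;
}

// Memory for the initial, host-built AS: HOST_CACHED and not device local.
std::optional<std::uint32_t> pickHostBuildMemory(const std::vector<std::uint32_t>& typeFlags,
                                                 std::uint32_t allowedTypeBits)
{
    return pickMemoryType(typeFlags, allowedTypeBits,
                          HOST_VISIBLE_BIT | HOST_CACHED_BIT, DEVICE_LOCAL_BIT);
}

// Memory for the compacted copy used for traversal: unmappable DEVICE_LOCAL.
std::optional<std::uint32_t> pickTraversalMemory(const std::vector<std::uint32_t>& typeFlags,
                                                 std::uint32_t allowedTypeBits)
{
    return pickMemoryType(typeFlags, allowedTypeBits, DEVICE_LOCAL_BIT, HOST_VISIBLE_BIT);
}
```

If `pickHostBuildMemory` returns no type, a caller would presumably fall back to a device build; that policy is left out of the sketch.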
Shouldn't user allocate the command buffer from compute queue and give it to the functions I provided as a parameter? I think I only can validate if the cmdBuffer is from a supported (compute) queue
cpu2gpu converter already has this option/works this way
Understood
Because we are taking steps towards threading the cpu2gpu conversion and asset loading, we should expose Deferred operations.
Maybe ILogicalDevice could hand out core::smart_refctd_ptr<ILogicalDevice::IDeferredOperation> which are placement-new allocated on a CMemoryPool like the one @achalpandeyy is using for command buffers (let's not murder the heap).
Then IDeferredOperation could have join and get as methods (and a wait built on top of get which also forces at least one join), then its destructor and the refcounting could ensure that we don't vk-destroy an incomplete operation.
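A rough sketch of what such an interface could look like, assuming nothing about the real Nabla classes: `DeferredResult`, the standalone `IDeferredOperation` below and `CFakeDeferredOperation` are all hypothetical. A real implementation would forward `join` and `get` to vkDeferredOperationJoinKHR and vkGetDeferredOperationResultKHR; here a fake backend counts down work steps so the control flow can be exercised without a Vulkan device, and it is not thread safe.

```cpp
#include <cassert>
#include <thread>

// Hypothetical result codes, loosely mirroring VK_SUCCESS, VK_NOT_READY,
// VK_THREAD_DONE_KHR and VK_THREAD_IDLE_KHR.
enum class DeferredResult { Success, NotReady, ThreadDone, ThreadIdle };

class IDeferredOperation
{
public:
    virtual ~IDeferredOperation() = default;

    // Contribute the calling thread to the operation.
    virtual DeferredResult join() = 0;
    // Poll the result without contributing work.
    virtual DeferredResult get() const = 0;

    // Built on top of get(), but always joins at least once.
    DeferredResult wait()
    {
        join();
        DeferredResult result = get();
        while (result == DeferredResult::NotReady)
        {
            join();
            std::this_thread::yield();
            result = get();
        }
        return result;
    }
};

// Fake backend: every join() completes one unit of work.
class CFakeDeferredOperation final : public IDeferredOperation
{
public:
    explicit CFakeDeferredOperation(int steps) : m_remaining(steps) {}
    // Never destroy an incomplete operation: finish it first.
    ~CFakeDeferredOperation() override { wait(); }

    DeferredResult join() override
    {
        ++m_joins;
        if (m_remaining > 0)
            --m_remaining;
        return m_remaining == 0 ? DeferredResult::Success : DeferredResult::ThreadIdle;
    }
    DeferredResult get() const override
    {
        return m_remaining == 0 ? DeferredResult::Success : DeferredResult::NotReady;
    }
    int joinCount() const { return m_joins; }

private:
    int m_remaining;
    int m_joins = 0;
};
```

The wait lives in the derived destructor rather than the base one because the base destructor can no longer call the overridden `join` and `get`.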
deviceAddress in these function parameters is related to accelerationStructureCaptureReplay and this optional functionality is intended to be used by tools and not by applications directly.
We'll definitely be using NSight a lot, and Renderdoc whenever it starts supporting raytracing. So we need this.
We will not support serializing Device & Driver Version dependent Acceleration Structures (we dont really support downloading compiled shaders back from the driver for faster loading either), any time soon....
So no need to worry about that.
I believe you're referring to the Compatibility Check section?
serialization and deserialization in general.
The most important things for performance are the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.
Second most important is the DXR no-anyhit-shader flag.
And backface triangle culling is actually more expensive when enabled in raytracing.
There's also an important correctness (not perf) flag about whether anyhit shaders should only be called once per primitive.
Note that in Vulkan's Perspective you cannot bind BLAS directly as a descriptor, you should always create a TLAS.
See Vulkan Spec:
VUID-VkWriteDescriptorSetAccelerationStructureKHR-pAccelerationStructures-03579 Each acceleration structure in pAccelerationStructures must have been created with a type of VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR or VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR
You might wonder what VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR is.
The Vulkan Spec also answers that in the issues section:
(5) What is VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR for? RESOLVED: It is primarily intended for API layering. In DXR, the acceleration structure is basically just a buffer in a special layout, and you don’t know at creation time whether it will be used as a top or bottom level acceleration structure. We thus added a generic acceleration structure type whose type is unknown at creation time, but is specified at build time instead. Applications which are written directly for Vulkan should not use it
All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed
I agree.
These should be the defaults for cpu2gpu conversion and anything else that doesn't get overridden by explicit user choice:
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR // if feature present, otherwise device only
If there's a sign that the geometry could be animated (such as a meshbuffer having bone or animation info), use these instead:
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR
// later on
VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV // if VK_NV_ray_tracing_motion_blur present
add VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR if you detect Nsight.
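The defaults above could be expressed as a small selection function. This is a sketch only: the constants copy the values of `VkBuildAccelerationStructureFlagBitsKHR` and `VkAccelerationStructureBuildTypeKHR` so the snippet compiles without the Vulkan headers, and `BuildDefaults` and `defaultBuildParams` are hypothetical names. The motion blur and capture-replay create flags are left out.

```cpp
#include <cassert>
#include <cstdint>

// Same values as VkBuildAccelerationStructureFlagBitsKHR.
constexpr std::uint32_t ALLOW_UPDATE_BIT      = 0x1;
constexpr std::uint32_t ALLOW_COMPACTION_BIT  = 0x2;
constexpr std::uint32_t PREFER_FAST_TRACE_BIT = 0x4;
constexpr std::uint32_t PREFER_FAST_BUILD_BIT = 0x8;

// Same values as VkAccelerationStructureBuildTypeKHR.
enum class BuildType : std::uint32_t { Host = 0, Device = 1, HostOrDevice = 2 };

// Hypothetical bundle of the defaults the converter would start from.
struct BuildDefaults
{
    std::uint32_t flags;
    BuildType     type;
};

// mayBeAnimated: e.g. the meshbuffer carries bone or animation info.
// hostCommandsSupported: accelerationStructureHostCommands is enabled.
constexpr BuildDefaults defaultBuildParams(bool mayBeAnimated, bool hostCommandsSupported)
{
    if (mayBeAnimated)
        return { ALLOW_UPDATE_BIT | PREFER_FAST_BUILD_BIT, BuildType::Device };
    // Static geometry: favour trace speed and compaction, host build if available.
    return { ALLOW_COMPACTION_BIT | PREFER_FAST_TRACE_BIT,
             hostCommandsSupported ? BuildType::Host : BuildType::Device };
}
```

An explicit user choice would simply replace the returned value before the build is recorded.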
Potatoe, potato
I presume there's an option to create a TLAS without any BLASes?
Unfortunately I don't think so.
• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03789 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, the geometryType member of elements of either pGeometries or ppGeometries must be VK_GEOMETRY_TYPE_INSTANCES_KHR
• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03790 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, geometryCount must be 1
The geometry type must be VK_GEOMETRY_TYPE_INSTANCES_KHR, which is instances of other Acceleration Structures.
hmm but I've seen and heard of people tracing just the BLAS for maximum gains in static scenes in DXR/OptiX.
So what do we do then, TLAS with a single instance? No better way to do it?
The simplest case would be 1 BLAS and 1 TLAS with 1 instance referring to the BLAS.
Other than the Vulkan Spec you could also see the nvpro_samples, which give a good picture of how one must work with these structs and functions: https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR
I believe the projects there use all of it.
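For the single-instance case, the only instance record the TLAS build needs could look roughly like this. `AccelerationStructureInstance` is a local stand-in that mirrors the member order and bit widths of `VkAccelerationStructureInstanceKHR` so the snippet compiles without the Vulkan headers, and `makeSingleInstance` is a hypothetical helper. The `static_assert` relies on the bitfield packing that common compilers use.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for VkTransformMatrixKHR: a row-major 3x4 matrix.
struct TransformMatrix { float matrix[3][4]; };

// Stand-in for VkAccelerationStructureInstanceKHR (same member order and widths).
struct AccelerationStructureInstance
{
    TransformMatrix transform;
    std::uint32_t   instanceCustomIndex : 24;
    std::uint32_t   mask : 8;
    std::uint32_t   instanceShaderBindingTableRecordOffset : 24;
    std::uint32_t   flags : 8;
    std::uint64_t   accelerationStructureReference;
};
static_assert(sizeof(AccelerationStructureInstance) == 64,
              "instances are expected to be tightly packed 64-byte records");

// One instance with an identity transform that is visible to every ray mask.
// blasReference is the BLAS device address for device builds, or the
// acceleration structure handle for host builds.
AccelerationStructureInstance makeSingleInstance(std::uint64_t blasReference)
{
    AccelerationStructureInstance inst{}; // zero everything, including flags
    inst.transform.matrix[0][0] = 1.0f;
    inst.transform.matrix[1][1] = 1.0f;
    inst.transform.matrix[2][2] = 1.0f;
    inst.mask = 0xFF;
    inst.accelerationStructureReference = blasReference;
    return inst;
}
```

The TLAS build would then use a single VK_GEOMETRY_TYPE_INSTANCES_KHR geometry with a primitiveCount of 1 pointing at this record.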
The cpu2gpu converter should use host commands whenever VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR or VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_OR_DEVICE_KHR are specified, unless the Vulkan implementation has no support for host commands.
In that case the type should be overwritten with VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR and the build should happen on a compute queue.
I think the cpu2gpu object converter should try and use the host methods to build the AS (massively parallel building BVHs produces them faster but they're lower quality)
The initial AS should be in HOST_CACHED non device local memory, and then host-copied and compacted to unmappable DEVICE_LOCAL (copy AS to AS).
@devshgraphicsprogramming I just realized this might not be possible because "Building" is different than "Creating". When ICPUAS->IGPUAS, it's just basically this in Vulkan's perspective: vkCreateAccelerationStructureKHR
And you cannot "build" an ICPUAccelerationStructure, because the acceleration structure must always be created with an IGPUBuffer/VkBuffer (must reside in device mem), as you can see in VulkanAccelerationStructureCreateInfo
There are only "hostBuildCommands", which help you get your host or device data (meshes, instances, etc.) built into the VkAccelerationStructureKHR
Yes, look at ICPUImage->IGPUImage:
Yes, CPU2GPU will both vkCreateAccelerationStructureKHR and vkBuildAccelerationStructuresKHR/vkCmdBuildAccelerationStructuresKHR
I never asked for an ICPUAccelerationStructure which has had any building performed on it, I asked for it to just hold the parameters required for building (much like how ICPUImage holds parameters for filling).
I know that the AS needs to reside in a VkBuffer, what I asked for is "when you allocate the vkMemory to bind to the vkBuffer, and you're using host commands for building the AS, the heap you allocate from should not be Device Local (to make CPU access fast and easy), but then to make GPU traversal fast and easy, the AS should be copied to another vkBuffer which is bound to vkMemory that is on a Device Local heap".
I don't see any constraints in VkAccelerationStructureCreateInfoKHR stating that the vkBuffer must be bound to memory from a particular heap, only that it cannot be sparse.
This is a documentation of the Vulkan objects that are part of VK_KHR_ray_tracing_pipeline that we may want to expose and work with in Nabla.
(vkCmdBuildAccelerationStructuresKHR and vkBuildAccelerationStructuresKHR)
Extension and Properties
VkPhysicalDeviceRayTracingPipelinePropertiesKHR, VkPhysicalDeviceRayTracingPipelineFeaturesKHR: These are needed for:
• (shaderGroupHandleSize, shaderGroupBaseAlignment, maxShaderGroupStride, shaderGroupHandleAlignment)
• (maxRayHitAttributeSize, maxRayRecursionDepth, maxRayDispatchInvocationCount)
We must also expose some of these physical device values for the user to work with, in case the user wants to do everything low-level and manually (e.g. creating the ShaderBindingTable buffer).
VkPhysicalDeviceAccelerationStructureFeaturesKHR: It has accelerationStructureHostCommands that indicates whether the implementation supports host side acceleration structure commands: (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR)
Just put Cmd after vk to make the functions above device functions instead of host ones.
Acceleration Structures (Creation, Build, Compaction, Copy, Related Enums and Structs...)
Geometry:
VkAccelerationStructureGeometryKHR: the geometry member is a union of three other structs:
• Triangles: specifies a geometry type consisting of triangles (used when building a BLAS from a vertex buffer)
• AABBs: a geometry type consisting of axis-aligned bounding boxes (used when working with custom primitives that need a custom intersection shader)
• Instances: a geometry type consisting of acceleration structure instances (used when building a TLAS from BLAS instances)
1. VkAccelerationStructureGeometryTrianglesDataKHR:
2. VkAccelerationStructureGeometryAabbsDataKHR:
VkAabbPositionsKHR:
3. VkAccelerationStructureGeometryInstancesDataKHR:
data is either the address of an array of device or host addresses referencing individual VkAccelerationStructureInstanceKHR structures or packed motion instance information as described in motion instances if arrayOfPointers is VK_TRUE, or the address of an array of VkAccelerationStructureInstanceKHR or VkAccelerationStructureMotionInstanceNV structures. Addresses and VkAccelerationStructureInstanceKHR structures are tightly packed. VkAccelerationStructureMotionInstanceNV structures have a stride of 160 bytes.
VkAccelerationStructureInstanceKHR: accelerationStructureReference is either:
• a device address obtained from vkGetAccelerationStructureDeviceAddressKHR (used by device operations which reference acceleration structures), or
• a VkAccelerationStructureKHR object (used by host operations which reference acceleration structures).
VkGeometryTypeKHR:
VkGeometryFlagBitsKHR:
Acceleration Structures are built from:
• VkAccelerationStructureGeometryKHR, filled in a VkAccelerationStructureBuildGeometryInfoKHR (referenced later in this text)
• VkAccelerationStructureBuildRangeInfoKHR
VkAccelerationStructureBuildRangeInfoKHR: The relation between geometry.triangles and BuildRangeInfo is similar to the relation between vertexBuffer+inputAttributes and the parameters of vkCmdDraw.
In the case of triangle geometry, primitiveCount is the number of triangles.
VkAccelerationStructureBuildGeometryInfoKHR: Most of the members are clear enough and explained in the spec; there are only a few notes:
• vkGetAccelerationStructureBuildSizesKHR
vkGetAccelerationStructureBuildSizesKHR:
• VkAccelerationStructureBuildRangeInfoKHR's primitiveCount
VkAccelerationStructureBuildSizesInfoKHR:
• We should usually call this function before creating our AS, because sizeInfo contains sizeInfo.accelerationStructureSize
• We should usually call this function before building our AS, because sizeInfo contains sizeInfo.buildScratchSize
• We should usually call this function before updating our AS, because sizeInfo contains sizeInfo.updateScratchSize
After querying the sizes using vkGetAccelerationStructureBuildSizesKHR we must:
buildInfo.scratchData.deviceAddress = scratchAddress;
IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Most of the functions and structs related to this raytracing extension work with deviceAddresses and not with buffers and AS's directly. I think we can also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.
Host/Device Commands for Build, CopyAStoAS, CopyASToMemory, CopyMemoryToAS, WriteProperties
Note for Host Commands:
Build AS
All function parameters are explained above
vkCmdBuildAccelerationStructuresKHR:
vkCmdBuildAccelerationStructuresIndirectKHR:
pIndirectDeviceAddresses is a pointer to an array of infoCount buffer device addresses which point to pInfos[i].geometryCount VkAccelerationStructureBuildRangeInfoKHR structures defining dynamic offsets to the addresses where geometry data is stored, as defined by pInfos[i].
Accesses to any element of pIndirectDeviceAddresses must be synchronized with the VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR pipeline stage and an access type of VK_ACCESS_INDIRECT_COMMAND_READ_BIT.
vkBuildAccelerationStructuresKHR:
Write Properties
vkCmdWriteAccelerationStructuresPropertiesKHR:
Note for write properties:
QueryAccelerationStructuresCompactionSizes(....)
There is no need for a QueryPool for the respective host operation.
vkWriteAccelerationStructuresPropertiesKHR:
Copy AS to AS
Example usage is when copying AS to CompactedAS
vkCmdCopyAccelerationStructureKHR:
vkCopyAccelerationStructureKHR:
VkCopyAccelerationStructureInfoKHR:
Important note for memory barriers
VkCopyAccelerationStructureModeKHR:
Copy AS To Memory
vkCmdCopyAccelerationStructureToMemoryKHR:
vkCopyAccelerationStructureToMemoryKHR:
VkCopyAccelerationStructureToMemoryInfoKHR:
Copy Memory To AS
vkCopyMemoryToAccelerationStructureKHR:
vkCmdCopyMemoryToAccelerationStructureKHR:
VkCopyMemoryToAccelerationStructureInfoKHR:
Compatibility Check
To check if a serialized acceleration structure is compatible with the current device, we need a function that uses these functions and structs:
• vkGetDeviceAccelerationStructureCompatibilityKHR
• VkAccelerationStructureCompatibilityKHR
• VkAccelerationStructureVersionInfoKHR
Creating AS
VkAccelerationStructureCreateInfoKHR:
deviceAddress in these function parameters is related to accelerationStructureCaptureReplay and this optional functionality is intended to be used by tools and not by applications directly.
createInfo.buffer is a buffer allocated most likely with a size of sizeInfo.accelerationStructureSize (see VkAccelerationStructureBuildSizesInfoKHR above).
Enums Used
VkAccelerationStructureTypeKHR:
VkDeviceOrHostAddressConstKHR: Fill hostAddress when working with host side acceleration structure commands and fill in deviceAddress otherwise. Exposing this is a matter of choice; functions could also take different inputs that might not need DeviceOrHostAddressConst (it also has a non-const version). I suggest exposing it as a struct.
VkBuildAccelerationStructureModeKHR: We could also write different build/update AS functions.
VkBuildAccelerationStructureFlagBitsKHR:
VkAccelerationStructureBuildTypeKHR:
VkAccelerationStructureCreateFlagBitsKHR:
VkGeometryInstanceFlagBitsKHR:
Deferred Operations
(Fill if needed to expose)
RayTracing Pipeline
vkCreateRayTracingPipelinesKHR:
VkRayTracingPipelineCreateInfoKHR:
VkRayTracingShaderGroupCreateInfoKHR:
VkRayTracingShaderGroupTypeKHR:
Pipeline Library
Should we add and handle VK_KHR_pipeline_library extension?
Shader Binding Table
In order to build Buffer of Opaque ShaderGroupHandles (+ probable ShaderRecordData)
vkGetRayTracingShaderGroupHandlesKHR:
This is the only function needed (with no helper functions) to construct the ShaderBindingTable.
shaderGroupHandleSize and shaderGroupBaseAlignment will be taken into consideration when constructing the SBT buffer and computing offsets for vkCmdTraceRaysKHR.
Also we could have a wrapper/helper class for the SBT that does all the computation and construction of SBT buffers for each ShaderGroupType (raygen, miss, hit, callable) and helps with the invocation of vkCmdTraceRaysKHR.
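The offset computation such a helper would do can be sketched as plain arithmetic. `SbtRegion`, `SbtLayout` and `computeSbtLayout` are hypothetical names; the three size and alignment inputs stand for the shaderGroupHandleSize, shaderGroupHandleAlignment and shaderGroupBaseAlignment device properties. The sketch assumes no shader record data after the handles, power-of-two alignments, and all four regions packed into one buffer.

```cpp
#include <cassert>
#include <cstdint>

// One region of the SBT buffer, as an offset from the buffer start; the
// device address passed to vkCmdTraceRaysKHR would be bufferAddress + offset.
struct SbtRegion { std::uint64_t offset, stride, size; };
struct SbtLayout { SbtRegion raygen, miss, hit, callable; std::uint64_t totalSize; };

// Rounds value up to a multiple of alignment (alignment must be a power of two).
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

SbtLayout computeSbtLayout(std::uint32_t handleSize, std::uint32_t handleAlignment,
                           std::uint32_t baseAlignment, std::uint32_t missCount,
                           std::uint32_t hitCount, std::uint32_t callableCount)
{
    // Every record starts on a multiple of the handle alignment.
    const std::uint64_t stride = alignUp(handleSize, handleAlignment);

    SbtLayout layout{};
    // The raygen region holds exactly one record and its size must equal its stride.
    const std::uint64_t raygenSize = alignUp(stride, baseAlignment);
    layout.raygen = { 0, raygenSize, raygenSize };

    // Each following region starts on a multiple of the base alignment.
    std::uint64_t cursor = raygenSize;
    auto place = [&](std::uint32_t count) {
        SbtRegion region{ cursor, stride, alignUp(stride * count, baseAlignment) };
        cursor += region.size;
        return region;
    };
    layout.miss     = place(missCount);
    layout.hit      = place(hitCount);
    layout.callable = place(callableCount);
    layout.totalSize = cursor;
    return layout;
}
```

For example, with a 32-byte handle, 32-byte handle alignment and 64-byte base alignment (illustrative values, not queried from a device), two miss groups and three hit groups give a 256-byte table with the hit region starting at offset 128.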
Ray Tracing Pipeline Stack
In order to get/set Stack Sizes:
vkGetRayTracingShaderGroupStackSizeKHR:
vkCmdSetRayTracingPipelineStackSizeKHR:
VkShaderGroupShaderKHR is just an enum:
VkShaderGroupShaderKHR:
RayTracing Commands
vkCmdTraceRaysKHR:
VkStridedDeviceAddressRegionKHR:
Indirect Trace Rays
vkCmdTraceRaysIndirectKHR:
indirectDeviceAddress points to a VkTraceRaysIndirectCommandKHR structure containing the trace ray parameters.
VkTraceRaysIndirectCommandKHR: