KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Implementing Vulkan Acceleration Structures #1956

Open AntarticCoder opened 1 year ago

AntarticCoder commented 1 year ago

At the moment, MoltenVK does not support ray tracing (#427). To support VK_KHR_ray_tracing_pipeline and VK_KHR_ray_query, we need to implement acceleration structures. PR #1954 (issue #1953) implemented VK_KHR_deferred_host_operations, which completes the dependencies of VK_KHR_acceleration_structure. The only thing left is to actually implement the extension itself. This issue provides a place to discuss the design decisions for acceleration structures.

I'm also planning on trying to implement this myself.

natevm commented 1 year ago

If it's any help, here is some consolidated information.

In Vulkan I have examples of AABB accels here, triangle geometry accels here, and instance accels here.

These code snippets cover building and rebuilding trees, refitting and compaction for each of the types, and details such as the alignment of the scratch space for the acceleration structure build.

I'm missing some features, like storing and loading acceleration structures from memory, but those could be added if more reference material would help.

It appears that these types roughly translate to MTLAccelerationStructureBoundingBoxGeometryDescriptor, MTLAccelerationStructureTriangleGeometryDescriptor, and MTLInstanceAccelerationStructureDescriptor

https://developer.apple.com/documentation/metal/mtlaccelerationstructure

natevm commented 1 year ago

One place that might be a good starting point is populating the structures of data for acceleration structure features and properties. In my code I do that here

In VkPhysicalDeviceAccelerationStructureFeaturesKHR, we could probably just return true for the "accelerationStructure" field, and false for all the other fields.

In VkPhysicalDeviceAccelerationStructurePropertiesKHR, we'd need to somehow figure out the various limits imposed by Metal ray tracing (max geometry count, instance and primitive counts, minimum accel scratch offset alignment, etc.). I'm not sure how these are queried in Metal RT, tbh.

AntarticCoder commented 1 year ago

VkAccelerationStructureGeometryDataKHR and MTLAccelerationStructureTriangleGeometryDescriptor seem to be almost identical, with a few minor differences.

One I noticed is in the index type for the geometry descriptor: besides the standard uint16 and uint32, Vulkan lets you pass no indices at all, whereas Metal does not seem to have a "none" option in its index type.

Vulkan Index Type

Metal Index Type
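A minimal sketch of how a layer might bridge that difference. The `SKETCH_*` constants are stand-ins that mirror the numeric values of VkIndexType and MTLIndexType, and `map_index_type` is a hypothetical helper, not part of MoltenVK. It assumes that non-indexed geometry on the Metal side is expressed by leaving the descriptor's index buffer unset, which should be checked against the Metal documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring the numeric values of VkIndexType. */
enum {
    SKETCH_VK_INDEX_TYPE_UINT16   = 0,
    SKETCH_VK_INDEX_TYPE_UINT32   = 1,
    SKETCH_VK_INDEX_TYPE_NONE_KHR = 1000165000
};

/* Stand-ins mirroring the numeric values of MTLIndexType. */
enum {
    SKETCH_MTL_INDEX_TYPE_UINT16 = 0,
    SKETCH_MTL_INDEX_TYPE_UINT32 = 1
};

/* Hypothetical helper: translate a Vulkan index type.
 * Returns false for index types this sketch does not handle.
 * On success, *useIndexBuffer says whether an index buffer should be
 * bound at all; *mtlIndexType is only meaningful when it is true. */
bool map_index_type(int32_t vkIndexType, bool *useIndexBuffer, uint32_t *mtlIndexType) {
    switch (vkIndexType) {
    case SKETCH_VK_INDEX_TYPE_UINT16:
        *useIndexBuffer = true;
        *mtlIndexType = SKETCH_MTL_INDEX_TYPE_UINT16;
        return true;
    case SKETCH_VK_INDEX_TYPE_UINT32:
        *useIndexBuffer = true;
        *mtlIndexType = SKETCH_MTL_INDEX_TYPE_UINT32;
        return true;
    case SKETCH_VK_INDEX_TYPE_NONE_KHR:
        /* Assumption: no index buffer means non-indexed geometry. */
        *useIndexBuffer = false;
        *mtlIndexType = SKETCH_MTL_INDEX_TYPE_UINT32; /* placeholder, unused */
        return true;
    default:
        return false;
    }
}
```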

AntarticCoder commented 1 year ago

@natevm I'm not sure if I'm looking in the wrong place however this link to the metal documentation seems to tell us the max count for some of these properties in standard and extended mode, iiuc.

https://developer.apple.com/documentation/metal/mtlaccelerationstructureusage/3750490-extendedlimits

natevm commented 1 year ago

> @natevm I'm not sure if I'm looking in the wrong place however this link to the metal documentation seems to tell us the max count for some of these properties in standard and extended mode, iiuc.
>
> https://developer.apple.com/documentation/metal/mtlaccelerationstructureusage/3750490-extendedlimits

Nice find. Yep, those seem like what I had in mind.

So, we know the following,

// Provided by VK_KHR_acceleration_structure
typedef struct VkPhysicalDeviceAccelerationStructureFeaturesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           accelerationStructure; // true
    VkBool32           accelerationStructureCaptureReplay; // false (for now)
    VkBool32           accelerationStructureIndirectBuild; // false (for now)
    VkBool32           accelerationStructureHostCommands; // false (for now)
    VkBool32           descriptorBindingAccelerationStructureUpdateAfterBind; // false (for now)
} VkPhysicalDeviceAccelerationStructureFeaturesKHR;

// Provided by VK_KHR_acceleration_structure
typedef struct VkPhysicalDeviceAccelerationStructurePropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    uint64_t           maxGeometryCount; // "Geometries in primitive acceleration structure", (2^24 / 2^30)
    uint64_t           maxInstanceCount; // "Instances in instance acceleration structure", (2^24 / 2^30)
    uint64_t           maxPrimitiveCount; // "Primitives in primitive acceleration structure", (2^28 / 2^30)
    uint32_t           maxPerStageDescriptorAccelerationStructures; // ???
    uint32_t           maxPerStageDescriptorUpdateAfterBindAccelerationStructures; // ???
    uint32_t           maxDescriptorSetAccelerationStructures; // ???
    uint32_t           maxDescriptorSetUpdateAfterBindAccelerationStructures; // ???
    uint32_t           minAccelerationStructureScratchOffsetAlignment; // ???
} VkPhysicalDeviceAccelerationStructurePropertiesKHR;
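
As a rough sketch, the three count limits annotated above could be reported like this. `AccelLimits` and `accel_limits` are hypothetical stand-ins so the example compiles without the Vulkan headers; real code would write into VkPhysicalDeviceAccelerationStructurePropertiesKHR directly. The values are the standard / extended limits quoted in this thread, and whether extended limits apply is assumed to be a choice made by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three count fields of
 * VkPhysicalDeviceAccelerationStructurePropertiesKHR. */
typedef struct AccelLimits {
    uint64_t maxGeometryCount;
    uint64_t maxInstanceCount;
    uint64_t maxPrimitiveCount;
} AccelLimits;

/* Limits as quoted in this thread (standard / extended):
 * geometries 2^24 / 2^30, instances 2^24 / 2^30, primitives 2^28 / 2^30. */
AccelLimits accel_limits(bool extendedLimits) {
    AccelLimits limits;
    if (extendedLimits) {
        limits.maxGeometryCount  = UINT64_C(1) << 30;
        limits.maxInstanceCount  = UINT64_C(1) << 30;
        limits.maxPrimitiveCount = UINT64_C(1) << 30;
    } else {
        limits.maxGeometryCount  = UINT64_C(1) << 24;
        limits.maxInstanceCount  = UINT64_C(1) << 24;
        limits.maxPrimitiveCount = UINT64_C(1) << 28;
    }
    return limits;
}
```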

Here there is a mention of an alignment derived from "the platform's buffer offset alignment". What I don't entirely know is how Metal handles the idea of "scratch" memory for acceleration structure builds.

rcaridade145 commented 1 year ago

@natevm https://developer.apple.com/documentation/metal/mtlaccelerationstructuresizes/3553967-accelerationstructuresize and https://developer.apple.com/videos/play/wwdc2023/10128/?time=564 are of interest to you?

AntarticCoder commented 1 year ago

@rcaridade145 Thanks, I think MTLAccelerationStructureSizes.accelerationStructureSize could be used for the vkGetAccelerationStructureBuildSizesKHR function which provides the expected acceleration structure size.
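
A sketch of what that translation might look like. Both structs are stand-ins so the example compiles on its own: `SketchMtlSizes` mirrors the fields of MTLAccelerationStructureSizes and `SketchVkBuildSizes` mirrors the size fields of VkAccelerationStructureBuildSizesInfoKHR. Mapping Metal's refit scratch size to Vulkan's update scratch size is my assumption, not something settled in this thread.

```c
#include <stdint.h>

/* Stand-in mirroring the fields of MTLAccelerationStructureSizes. */
typedef struct SketchMtlSizes {
    uint64_t accelerationStructureSize;
    uint64_t buildScratchBufferSize;
    uint64_t refitScratchBufferSize;
} SketchMtlSizes;

/* Stand-in mirroring the size fields of
 * VkAccelerationStructureBuildSizesInfoKHR. */
typedef struct SketchVkBuildSizes {
    uint64_t accelerationStructureSize;
    uint64_t updateScratchSize;
    uint64_t buildScratchSize;
} SketchVkBuildSizes;

/* Hypothetical helper: copy Metal's reported sizes into the Vulkan layout. */
SketchVkBuildSizes to_vk_build_sizes(SketchMtlSizes mtl) {
    SketchVkBuildSizes vk;
    vk.accelerationStructureSize = mtl.accelerationStructureSize;
    vk.buildScratchSize          = mtl.buildScratchBufferSize;
    /* Assumption: a Metal refit corresponds to a Vulkan update. */
    vk.updateScratchSize         = mtl.refitScratchBufferSize;
    return vk;
}
```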

natevm commented 1 year ago

> @natevm https://developer.apple.com/documentation/metal/mtlaccelerationstructuresizes/3553967-accelerationstructuresize and https://developer.apple.com/videos/play/wwdc2023/10128/?time=564 are of interest to you?

ah yeah, the "buildScratchBufferSize" in that first link was one of the things I was wondering about. Still not sure what the "minAccelerationStructureScratchOffsetAlignment" should be for that buffer, @rcaridade145 do you know what minimum offset alignment rules there might be?

Try commented 1 year ago

Just a small note about Metal BLAS: Documentation about MTL::AccelerationStructureTriangleGeometryDescriptor::setIndexBufferOffset says:

Specify an offset that is a multiple of the index data type size and a multiple of the platform’s buffer offset alignment.

In the feature table https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf, the buffer offset alignment ranges from 4 bytes to 32 bytes (Mac2). In Vulkan, primitiveOffset must be a multiple of the component size.
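
To make the mismatch concrete, here is a small sketch that rounds an offset up to the next value satisfying both Metal conditions, i.e. a multiple of the least common multiple of the index size and the platform alignment. `align_index_offset` is a hypothetical helper; what a layer should actually do when a Vulkan-valid offset is not Metal-valid (copy the data, or reject the build) is left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor, used to build the least common multiple. */
uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: smallest value >= offset that is a multiple of both
 * the index data type size and the platform's buffer offset alignment. */
uint64_t align_index_offset(uint64_t offset, uint64_t indexSize, uint64_t bufferAlignment) {
    assert(indexSize > 0 && bufferAlignment > 0);
    uint64_t step = indexSize / gcd_u64(indexSize, bufferAlignment) * bufferAlignment;
    return (offset + step - 1) / step * step;
}
```

For example, a uint16 index offset of 6 bytes is fine for Vulkan, but with a 32-byte platform alignment the nearest offset Metal would accept is 32.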

natevm commented 1 year ago

> Just a small note about Metal BLAS: Documentation about MTL::AccelerationStructureTriangleGeometryDescriptor::setIndexBufferOffset says:
>
> Specify an offset that is a multiple of the index data type size and a multiple of the platform’s buffer offset alignment.
>
> In the feature table https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf, the buffer offset alignment ranges from 4 bytes to 32 bytes (Mac2). In Vulkan, primitiveOffset must be a multiple of the component size.

do you know if there is a pragmatic way to query this buffer offset alignment?

Try commented 1 year ago

> do you know if there is a pragmatic way to query this buffer offset alignment?

I wish I knew, but there doesn't seem to be any.

rcaridade145 commented 1 year ago

> > @natevm https://developer.apple.com/documentation/metal/mtlaccelerationstructuresizes/3553967-accelerationstructuresize and https://developer.apple.com/videos/play/wwdc2023/10128/?time=564 are of interest to you?
>
> ah yeah, the "buildScratchBufferSize" in that first link was one of the things I was wondering about. Still not sure what the "minAccelerationStructureScratchOffsetAlignment" should be for that buffer, @rcaridade145 do you know what minimum offset alignment rules there might be?

Not really. All the info I could find was

https://github.com/MetalKit/metal/blob/master/raytracing/Renderer.swift

It seems to use alignedUniformsSize.

https://gist.github.com/ctreffs/1cf72cd0d5e23d77fe55a011ea01a153

AntarticCoder commented 1 year ago

Is it possible to get a scratch buffer from its device address? Looking at the Metal API documentation, there's basically nothing on device addresses, except for a single property on MTLBuffer. I know NVIDIA used to pass in a VkBuffer directly, but now we have to use device addresses.

rcaridade145 commented 1 year ago

Will this help @AntarticCoder https://developer.apple.com/documentation/metal/mtlbuffer/1515716-contents ?

AntarticCoder commented 1 year ago

I believe I saw this during my research, but I probably didn't read the docs properly. I'll try it out later. Thanks @rcaridade145

rcaridade145 commented 1 year ago

The problem here is that, afaik, the scratch buffer is handled by Metal itself, so perhaps you can use the contents function only with a custom buffer?

K0bin commented 1 year ago

@AntarticCoder @rcaridade145 The contents function will just give you a CPU pointer to the data of a shared buffer. That's not useful here unless you want to copy all the data around on the CPU every time. (which would also involve a GPU sync)

What you have to do is basically maintain a map from buffer device address (BDA) virtual addresses (VAs) back to their original buffer objects. Keep in mind that this VA map has to be extremely fast and should minimize locking as much as possible. An example for that can be found in vkd3d-Proton: https://github.com/HansKristian-Work/vkd3d-proton/blob/master/libs/vkd3d/va_map.c
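
A deliberately simple sketch of the lookup side of such a map: a sorted array of non-overlapping address ranges searched with binary search. This is not how vkd3d-Proton implements it, and it ignores insertion, removal and the locking concerns mentioned above. `VaRange` and `va_map_find` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per buffer that has had its device address queried. */
typedef struct VaRange {
    uint64_t base;   /* device address of the buffer's first byte */
    uint64_t size;   /* buffer length in bytes */
    void    *buffer; /* opaque handle of the owning buffer object */
} VaRange;

/* Find the range containing `address`. `ranges` must be sorted by base
 * and must not overlap. Returns NULL if no range contains the address.
 * The offset inside the buffer is then address - range->base. */
const VaRange *va_map_find(const VaRange *ranges, size_t count, uint64_t address) {
    size_t lo = 0, hi = count;
    /* Binary search for the first range whose base is above the address. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ranges[mid].base <= address) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == 0) {
        return NULL; /* address lies below every known range */
    }
    const VaRange *candidate = &ranges[lo - 1];
    return (address - candidate->base < candidate->size) ? candidate : NULL;
}
```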

AntarticCoder commented 1 year ago

@K0bin This looks quite interesting, I'll see if I can get an efficient map working later.

billhollings commented 1 year ago

> do you know if there is a pragmatic way to query this buffer offset alignment?

Check MVKPhysicalDeviceMetalFeatures::mtlBufferAlignment.

natevm commented 1 year ago

With iPhone 15 now having native hardware ray tracing support, I am guessing M3 is soon to follow suit. @AntarticCoder what's the status on this PR? Any blocking issues we should know about?

AntarticCoder commented 1 year ago

@natevm The only real blocking issue is how acceleration structures are handled in GPU memory, because we have both copy commands and non-command copies. The solution seems to be MTLHeaps, according to a commenter on the PR. As for the status, I've been a bit busy with personal matters, but I've definitely wanted to get back into this. I could probably continue working next week. Thanks

natevm commented 1 year ago

@AntarticCoder totally understand. I’ll check out the MTLHeaps proposal on the PR.

I don’t suppose you have a discord where we could stay in touch, do you? Over there my username’s @natemorrical. We have a little Vulkan raytracing research group there that acts a bit like a slack space. If not, no worries, but figured I’d ask just in case :)

AntarticCoder commented 1 year ago

@natevm I just sent a friend request. My username is Noble 6 the Penguin. 😀