The VkAccelerationStructureInfo structure is serialised, followed by the unified geometry readback buffer. On replay, the upload buffer is populated from the capture and then copied to GPU-local memory, at which point the upload memory is freed, since replay cannot modify the AS input data.
When the initial state Apply() is called, the ASes are built from the input data one at a time so that a single scratch buffer can be allocated (and enlarged when needed) and re-used. Although this is slower, it is necessary on Mali, which has poor space efficiency for the scratch buffer and so can easily OOM a device if all the ASes are built in a single command buffer submission.
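To illustrate the scratch buffer trade-off described above, here is a minimal sketch of the grow-on-demand, build-one-at-a-time approach. It is not the code in this PR: `ScratchBuffer`, `EnsureSize`, `BuildOneAtATime` and `ScratchForSingleSubmission` are hypothetical names, no Vulkan calls are made, and only sizes are tracked. The comparison function also assumes that every build in a batched submission gets its own scratch region, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU scratch allocation; only the size is tracked.
struct ScratchBuffer
{
  uint64_t size = 0;
  uint32_t allocations = 0;

  // Grow only when the next build needs more scratch than is currently held.
  void EnsureSize(uint64_t required)
  {
    assert(required > 0);
    if(required <= size)
      return;
    // A real implementation would release the old buffer and allocate a larger one here.
    size = required;
    allocations++;
  }
};

struct BuildStats
{
  uint64_t peakScratch;
  uint32_t allocations;
  uint32_t submissions;
};

// Build each AS in its own submission, re-using one scratch buffer throughout.
BuildStats BuildOneAtATime(const std::vector<uint64_t> &scratchSizes)
{
  ScratchBuffer scratch;
  uint32_t submissions = 0;
  for(uint64_t required : scratchSizes)
  {
    scratch.EnsureSize(required);
    // Record the build, submit, and wait for completion here, so that the
    // scratch memory is safe to re-use for the next AS.
    submissions++;
  }
  return {scratch.size, scratch.allocations, submissions};
}

// For comparison: scratch needed if every build in one submission has a
// separate scratch region (assumption of this sketch).
uint64_t ScratchForSingleSubmission(const std::vector<uint64_t> &scratchSizes)
{
  uint64_t total = 0;
  for(uint64_t s : scratchSizes)
    total += s;
  return total;
}
```

With this model the peak scratch footprint is the largest single requirement rather than the sum of all requirements, at the cost of one submission (and one wait) per AS.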
Tested on:
Android Mali G715 r49p1 with various Arm tech demos (some based on UE5) and Vulkan Samples ray_queries
Ubuntu 24.04 NV 3090FE 535.183.06, same as above but ported to x86_64
I've rebased to pick up the mutable VkInitialContents change you made, and added the single input data memory block and build-on-apply-once changes as separate commits.