Open ghost opened 5 years ago
@procedural are you pointing to #890 because it addresses your initial question, or for another reason?
Yes, because it partially addresses my initial question.
Partially, because the question was also about bindless image handles and why they're not in Vulkan yet, given that OpenGL has had them for 6 years now.
Official response from the Vulkan Working Group follows.
The VK_EXT_descriptor_indexing and VK_EXT_buffer_device_address extensions cover many of the capabilities of comparable OpenGL functionality. Khronos is actively evaluating both extensions for promotion to KHR status and/or inclusion in a future core version of Vulkan, although we cannot confirm a timeline for this.
We recognize that VK_EXT_descriptor_indexing does not fully address all GL_ARB_bindless_texture capabilities, such as querying a texture handle from the driver and putting it anywhere in memory. Currently it is not a high priority to support this specific style of bindless image functionality for Vulkan.
If you have specific additional feature requests and/or use cases which could feed into our evaluation of these EXTs, that would be helpful feedback.
@oddhack That's disappointing news ...
VK_EXT_descriptor_indexing is not an ideal binding model. We don't want to deal with binding everything to a descriptor set since it adds additional overhead. We want GPU-addressable resources backed by memory. Here is the additional feature request for the working group: make everything in Vulkan bindless!
Bindless UBOs, bindless vertex buffers, and most importantly bindless textures as well. Also, if and when the working group creates the extension, it should add an optional capability to do divergent dynamic indexing because the ARB bindless texturing variant had that annoying limitation ...
Currently it is not a high priority to support this specific style of bindless image functionality for Vulkan.
Vulkan's motto is "industry forged"; we'll see what the industry has to say about this. :)
We don't want to deal with binding everything to a descriptor set since it adds additional overhead.
What is the perceived overhead? You only ever need to bind it once. Is the perceived overhead the cost of binding it once, or of accessing the resource through a descriptor set?
AFAIK all implementations of ARB/NV_bindless_texture used a table of descriptors behind the scenes, and were ultimately not much different from going through a descriptor set. So I don't see what the perceived benefit of the ARB extension is. On the other hand, one of the benefits of the descriptor_indexing model is that the app has control over placement of related textures in the descriptor set, so that a single index can be used to select multiple textures. This was a big feature request for portability from D3D12, where applications have similar control.
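To make the "single index selects multiple textures" point concrete, here is a minimal CPU-side sketch in C++. It does not call Vulkan at all: `DescriptorTable`, `add_material`, and `slot_for` are hypothetical names, and the three-textures-per-material layout is an assumption chosen purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a large descriptor array; each string plays the
// role of one sampled-image descriptor. This is a model, not Vulkan API usage.
struct DescriptorTable {
    std::vector<std::string> slots;
};

// Assumed layout: every material owns three consecutive slots.
enum MaterialTexture : std::uint32_t { Albedo = 0, Normal = 1, Roughness = 2 };
constexpr std::uint32_t kTexturesPerMaterial = 3;

// Appends one material's textures contiguously and returns its material index.
std::uint32_t add_material(DescriptorTable& table, const std::string& albedo,
                           const std::string& normal, const std::string& roughness) {
    const auto material =
        static_cast<std::uint32_t>(table.slots.size() / kTexturesPerMaterial);
    table.slots.push_back(albedo);
    table.slots.push_back(normal);
    table.slots.push_back(roughness);
    return material;
}

// One material index is enough to locate every related texture.
std::uint32_t slot_for(std::uint32_t material, MaterialTexture which) {
    return material * kTexturesPerMaterial + static_cast<std::uint32_t>(which);
}
```

Under this assumed layout, a draw would only need to carry one integer (the material index), and the shader would repeat the same arithmetic to reach each related texture.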
an optional capability to do divergent dynamic indexing because the ARB bindless texturing variant had that annoying limitation
That's already supported in VK_EXT_descriptor_indexing (see also https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GL_EXT_nonuniform_qualifier.txt).
@Degerz
Bindless UBOs, bindless vertex buffers, and most importantly bindless textures as well.
I find this list partially perplexing, given your "perceived overhead" issue. I at least theoretically understand the bindless texture concerns, though jeffbolznv made it clear that most implementations of bindless are just using tables behind the scenes, so there's no overhead lost by you maintaining the table yourself.
My issue is understanding why bindless vertex/uniform buffers matter. I understand why they matter in OpenGL; it's because of aspects of the OpenGL API.
When you attach a buffer to a VAO or bind it to the context in a uniform buffer slot, and then try to render, the implementation has to do a number of things, only some of which are required by the hardware. It has to convert the buffer object name into an actual buffer object pointer. It has to do validation work; for example, it has to make sure that the buffer + offset does not exceed the size of the buffer. Any validation that fails results in a GL error. Also, the implementation needs to check to see if the buffer object's storage has been or needs to be moved around. Lastly, it converts the buffer to a GPU address, which is then sent to the GPU to do its work.
But in Vulkan, almost none of that happens. A VkBuffer is a pointer directly to the implementation's data structure, so no conversion from an integer to a pointer needs to happen. Vulkan doesn't do validation, so none of that happens. Buffer storage will never be moved around, ever. And so forth. The only thing that needs to happen is getting the GPU address out of the VkBuffer, which should be nothing more than a memory read. Yes, this read may well be an uncached read, so in a hot loop, an extra uncached read can be a problem. But in the grand scheme of things, I think that's going to be a pretty minor problem.
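The contrast between the two paths can be modelled on the CPU. The following C++ sketch is an illustration only: `GlLikeContext`, `VkLikeBuffer`, `BufferObject`, and both resolve functions are hypothetical names standing in for driver internals that this thread does not show.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical driver-side record for a buffer.
struct BufferObject {
    std::uint64_t gpu_address;
    std::uint64_t size;
};

// GL-like path: integer name -> table lookup -> range validation -> address.
struct GlLikeContext {
    std::unordered_map<std::uint32_t, BufferObject> buffers;

    std::optional<std::uint64_t> resolve(std::uint32_t name, std::uint64_t offset,
                                         std::uint64_t length) const {
        const auto it = buffers.find(name);
        if (it == buffers.end()) return std::nullopt;  // unknown name: error
        const BufferObject& buf = it->second;
        // Reject ranges that run past the end of the buffer.
        if (offset > buf.size || length > buf.size - offset) return std::nullopt;
        return buf.gpu_address + offset;
    }
};

// Vulkan-like path: the handle already points at the record, and the caller
// is trusted, so resolving is a single read plus an add.
using VkLikeBuffer = const BufferObject*;

std::uint64_t vk_like_resolve(VkLikeBuffer buffer, std::uint64_t offset) {
    return buffer->gpu_address + offset;
}
```

The remaining cost in the second path is the one read through the handle, which is the read discussed above.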
Now, I could understand your desires if your goal was to send a GPU address to the shader, allow it to do some arbitrary pointer arithmetic, then cast it into an object and read from that memory. But neither NV_vertex_buffer_unified_memory nor NV_uniform_buffer_unified_memory allow anything even remotely like that. They are solely about how you provide vertex and uniform buffers to the implementation; the shader interface is not affected by them at all.
So, could you clarify why you are interested in these two features in particular?
So, could you clarify why you are interested in these two features in particular?
It's exactly as you stated before ...
Yes, this read may well be an uncached read, so in a hot loop, an extra uncached read can be a problem.
Letting a cache miss happen in the driver is undesirable, and if we want true AZDO then we must embrace bindless uniform/vertex buffers as part of that goal as well, hence my fervent desire to bindless all the things!
Even though GL bindless uniform/vertex buffers don't allow for arbitrary pointer arithmetic, they are still useful, as has been said. It would be great if a potential Vulkan version of those extensions could allow us to do arbitrary pointer arithmetic on bindless uniform/vertex buffers.
The only thing that needs to happen is getting the GPU address out of the VkBuffer, which should be nothing more than a memory read. Yes, this read may well be an uncached read, so in a hot loop, an extra uncached read can be a problem. But in the grand scheme of things, I think that's going to be a pretty minor problem.
This issue is not about performance, it's about usability: all GPU resources should be accessed by pointers. Period. Programmers should be able to fill some device memory with pointers to various types of resources and access them in GPU programs in arbitrary, programmable ways. Just like on the CPU. Anything other than that ("descriptor" "tables", "root" "signatures", "magic" "dust", etc.) is an arbitrarily named piece of state in some memory that GPU vendors hide from ISVs, either to avoid talking about the fact that they can't access memory directly for various secret reasons (performance, security, hardware architecture, etc.), or because they can but other IHVs can't.
It's probably silly to talk about hardware in an issue for an API that tries very hard to be as generic as possible and play nice with all hardware it can possibly target, including ovens, probably, but:
If some IHVs want to end this endless binding stupidity and expose direct access to resources as Vulkan and SPIR-V extensions, they are very welcome to do so. If they are free to do so without asking permission from The Khronos Group, of course.
If some IHVs want to end this endless binding stupidity and expose direct access to resources as Vulkan and SPIR-V extensions, they are very welcome to do so. If they are free to do so without asking permission from The Khronos Group, of course.
No vendor needs permission from Khronos to release a vendor extension. Full stop.
No vendor needs permission from Khronos to release a vendor extension. Full stop.
I hope so!
As this request has come back up again via #1474, here is a small status update: we are currently looking at improving functionality in this area, although we can't comment on the timeframe or on details of how that will look at this time. We'd again welcome any specific requests or use cases for this type of functionality that could help us shape these extensions.
IMO the best thing Vulkan can offer in this regard is "dynamic resource binding" coming to SM6.6 soon (see https://devblogs.microsoft.com/directx/in-the-works-hlsl-shader-model-6-6/). As a graphics engineer, time spent managing descriptor tables, binding sets to slots, and updating entries in general is time lost that could be spent being productive. Especially with more work being pushed into compute, on top of raytracing, the need for dynamic resource access is basically a must. This feature coming in for DX12 really hit the nail on the head. Bind the heap to the resources, then access them. You're on your own with regard to read and write hazards, but that was always the case.
There's nothing stopping you from doing SM 6.6 style resource access today; you can alias descriptor arrays just fine.
layout(set = 0, binding = 0) uniform texture2D Tex2D[];
layout(set = 0, binding = 0) uniform texture2DArray Tex2DArray[];
layout(set = 0, binding = 0) uniform texture3D Tex3D[];
etc. HLSL never allowed that directly, but you could hack around it using multiple register spaces and a large root signature that aliased multiple spaces over the same range.
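As a rough mental model of the aliasing above, the C++ sketch below (CPU-only) treats the binding as one heap whose slots each hold a descriptor of some concrete image type, with one typed view per aliased array. All names are hypothetical, and returning `nullptr` on a type mismatch is a choice made for this model only; in a real shader, avoiding a read through the wrong array type is the application's responsibility.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical descriptor kinds, one per aliased array in the GLSL snippet.
struct Tex2D      { std::string name; };
struct Tex2DArray { std::string name; };
struct Tex3D      { std::string name; };

// One shared index space: every slot is empty or holds exactly one kind.
using Descriptor = std::variant<std::monostate, Tex2D, Tex2DArray, Tex3D>;
using Heap = std::vector<Descriptor>;

// Typed view over the shared heap, analogous to indexing one of the aliased
// arrays. Returns nullptr if the slot is out of range or holds another kind.
template <typename T>
const T* view(const Heap& heap, std::size_t index) {
    if (index >= heap.size()) return nullptr;
    return std::get_if<T>(&heap[index]);
}
```

The point of the model is that the index space is shared: slot 2 is the same slot whichever typed array is used to reach it, and only one of those typed reads is meaningful.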
Huh, I had no idea this was possible with GLSL. My immediate thought is, is there any way to go from HLSL with SM6.6 style heaps to the snippet you posted in DXC?
For HLSL 6.6 and DirectX 12...
<resource variable> = ResourceDescriptorHeap[uint index];
<sampler variable> = SamplerDescriptorHeap[uint index];
Texture2D<float4> myTexture = ResourceDescriptorHeap[texIdx];
float4 result = myTexture.Sample(SamplerDescriptorHeap[sampIdx], coord);
RWByteAddressBuffer buf = ResourceDescriptorHeap[NonUniformResourceIndex(index)];
SamplerState samp = SamplerDescriptorHeap[NonUniformResourceIndex(index)];
But for GLSL it would have to be something like…
// but this needs a lot of workarounds with an SSBO or UBO (inline uniform blocks are limited to 256 bytes)
layout (binding = 0, set = 0, scalar) buffer ResourceDescriptorHeap{ uint64_t heap[]; } resources;
layout (binding = 1, set = 0, scalar) uniform sampler SamplerDescriptorHeap[];
layout (buffer_reference, scalar, buffer_reference_align = 1) buffer byteBuffer { uint8_t data[]; };
void main(){
    sampler2D texture = sampler2D(resources.heap[texIdx]); // needs
    byteBuffer buf = byteBuffer(resources.heap[bufferIdx]);
}
The main issue that I can see, honestly, is that we still need pipeline layouts and a "root signature" mapping for bindings, even with the uniform aliasing as written by @HansKristian-Work. The primary thing SM6.6 is giving me at the moment is the ability to just index into heaps without any need to manage root signatures or swap them.
I'm not talking about "bind everything", I'm talking about the pointers for buffers and handles for images that we have had in OpenGL for the past 10 years?
Same question: https://twitter.com/Icare3D/status/1060207523041550336