KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

Realtime Raytracing extensions #686

Closed ghost closed 4 years ago

ghost commented 6 years ago

Microsoft announced real-time raytracing for DirectX 12 at GDC. Vulkan needs raytracing capabilities soon too if it is to stay relevant among developers.

AlonzoTG commented 6 years ago

Not quite. Vulkan is a lower-level API; the issue is whether Vulkan supports the primitives required to implement raytracing. Indeed, it is important that this open platform remain competitive and maintain feature parity with the proprietary platform.

krOoze commented 6 years ago

primitives required to implement raytracing

Well, that's not hard. Even OpenGL has had them forever.

I would say the decisive factor is whether any actual HW has accelerators specific only to raytracing. To my knowledge, that is not so.

The DXR whitepaper seems to admit it (https://blogs.msdn.microsoft.com/directx/2018/03/19/announcing-microsoft-directx-raytracing/):

You may have noticed that DXR does not introduce a new GPU engine to go alongside DX12’s existing Graphics and Compute engines. This is intentional – DXR workloads can be run on either of DX12’s existing engines. The primary reason for this is that, fundamentally, DXR is a compute-like workload.

That being said, Vulkan already has programmable graphics and compute engines. No extensions needed.

ghost commented 6 years ago

Nvidia has dedicated tensor cores in its next-generation Volta cards, specific to AI use, and uses them to accelerate real-time raytracing.

krOoze commented 6 years ago

https://en.wikipedia.org/wiki/Volta_(microarchitecture) :

A tensor core is a unit that multiplies two 4×4 FP16 matrices, and then adds a third FP16 or FP32 matrix to the result by using fused multiply–add operations

Coincidentally, GLSL already supports mat4 as well as matrix multiplication, and even fma. You can accelerate whatever you want with those.

Kaisha commented 6 years ago

While you can do raytracing with a normal graphics shader or compute shader, it's awkward in places, and driver/hardware-specific extensions can greatly improve performance.

I think it would make sense to add a VkRaytracePipeline with raytracing-specific shaders (hit shader, miss shader, etc.) and queue support, coupled with a vkCmdRaytrace() function (as we have vkCmdDraw() for the graphics pipeline and vkCmdDispatch() for the compute pipeline). Even if it's not initially supported by any cards, having the spec/interface ready to go would be nice.
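For concreteness, the API shape being suggested might look something like the following C sketch. Every identifier here is hypothetical (none of these exist in Vulkan); it just mirrors the vkCmdDraw/vkCmdDispatch analogy:

```c
#include <stdint.h>

/* Hypothetical sketch only -- none of these identifiers are real Vulkan.
 * A dedicated pipeline with raytracing-specific shader stages, plus a
 * vkCmdRaytrace()-style launch over a grid of primary rays. */
typedef enum HypRaytraceStage {
    HYP_STAGE_RAYGEN,
    HYP_STAGE_CLOSEST_HIT,
    HYP_STAGE_MISS,
} HypRaytraceStage;

typedef struct HypRaytracePipelineCreateInfo {
    uint32_t                stageCount;
    const HypRaytraceStage *pStages;   /* one entry per shader in the pipeline */
} HypRaytracePipelineCreateInfo;

/* Analogue of vkCmdDispatch: launch one primary ray per pixel of a
 * width x height grid; returns the number of rays launched. */
static uint64_t hypCmdRaytrace(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height;
}
```

(The extensions that eventually shipped use similar stage names, with a shader binding table added on top.)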

krOoze commented 6 years ago

Even if it's not initially supported by any cards, having the spec/interface ready to go would be nice.

Google "design by committee", and look for pros and cons of that approach.

spoiler: Adding something by democratic process is a suboptimal thing to do. Features should be added based on real technical and empirical considerations and constraints. The usual way to do that is to have actual HW (preferably from many vendors) and let it mature a bit before trying to abstract it into a normative API.

driver/hardware specific extensions can greatly improve performance.

You need to support that assertion with evidence. Assuming GPUs so far have no raytracing-only HW accelerators, where would those great improvements in performance come from?

oscarbg commented 6 years ago

Well, you have 4 posts with 16 slides on NVIDIA's proposed VK_NV_raytracing shown at GTC here. They propose a new API: new functions, structs, and GLSL additions. https://twitter.com/oscarbg81/status/982081304345882624 https://twitter.com/oscarbg81/status/982081963669491718 https://twitter.com/oscarbg81/status/982082108918321153 https://twitter.com/oscarbg81/status/982082318386040834

krOoze commented 6 years ago

@oscarbg If I understand it correctly, AMD seems to answer with a library-style solution: https://gpuopen.com/gdc-2018-presentation-real-time-ray-tracing-techniques-integration-existing-renderers/

Which, if my assumptions are correct, seems to be the superior alternative. I.e. assuming there is no special HW for that case (and that case only), there is no reason to believe one or the other solution would be more efficient. BUT a library solution would work on all GPUs, and it also would not creep features into a normative API. PS: a library also has a bit more flexibility, e.g. it can experiment with different acceleration structures.

oscarbg commented 6 years ago

@krOoze Similar to the other question: well, it depends. DXR and Vulkan raytracing seem to be the lowest-level cross-vendor APIs the IHVs want to agree to support (Vulkan's is in the works), which is nice for mixed raster+raytrace on each API. Radeon Rays seems good, but being built over Vulkan/OpenCL compute it might miss some optimizations once HW gets dedicated raytracing support (aka transistors) for traversal/intersection, acceleration structure build, etc. I might add that NV doesn't say much about RTX tech, and we still don't have public DXR (396) drivers or the Radeon Rays improvements shown at GDC on the GitHub site. Once all of this is public, it would be nice to benchmark the performance of every solution!

ratchetfreak commented 6 years ago

If hardware is going to get dedicated support for raytracing which needs custom code to be leveraged, then the IHV will create an extension for it which other IHVs can adopt or adapt, until there is a canonical EXT/KHR that may or may not be promoted down the road.

Doing clean-room speculation about what that extension should look like is foolish, and is what leads to most design-by-committee warts.

Cazadorro commented 6 years ago

@krOoze You won't be able to use mat4s: those tensor-core operands are 16-bit floats (and, as you know, Vulkan just doesn't support those right now), and in addition the result is 32 bits (not 16), with the 4x4 matrix addition also performed at 32 bits.

krOoze commented 6 years ago

@Cazadorro I sit corrected. Nevertheless, that is a case for adding half to Vulkan (which SPIR-V already seems to support), not for adding new pipeline types.

ArturoBlas commented 6 years ago

Here is the full GTC 2018 presentation on RTX integration:

http://on-demand.gputechconf.com/gtc/2018/presentation/s8521-advanced-graphics-extensions-for-vulkan.pdf

ahcox commented 6 years ago

There is also a video of the presentation.

leafi commented 6 years ago

@krOoze From the presentation above, it says NV has already proposed the extension to Khronos as a potential multivendor API for Vulkan. Did you refuse it because of the added pipeline types, or is the Vulkan WG planning to adopt it?

...Actually, I guess overall, there's not too much use speculating. Either @TomOlson (assigned to issue) & @Tobski et al are already aware and working on it, or Khronos will be late to the party.

It seems NV will be aggressively pushing ray tracing to graphics developers for the next several years. Many partners are lined up, Microsoft woke up and announced DXR very quickly as a competitive response to a potential NV VK/GL extension, that flashy Epic demo went viral, etc.

It must be the case that this extension with HW support will have a noticeable performance advantage over Nvidia OptiX or Radeon Rays. Otherwise they wouldn't do it.

krOoze commented 6 years ago

@leafi It is not my place to refuse anything. I am just a general public contributor, not a Khronos member. I think Khronos itself does not refuse any extensions unless they violate some formal rules or break something.

My issue with this is that it is not a low-level thing, and it has been done for years by other means. For me, the presentation lacks motivation for doing this as layer 0. It also lacks an explanation of why this specific acceleration structure should be used until the end of time. Also, I am not aware of HW units that would actually do this specifically; it will probably just run on your usual general-purpose shader units. If there is something missing in the Compute pipeline, that is the place that should be developed, rather than adding an artificially restricted new type of pipeline.

Cazadorro commented 6 years ago

@krOoze I tend to agree with you.

Nvidia is achieving this by first using normal rasterization, then using raytracing from there on. In normal raytracing you have to oversample to avoid noise artifacts caused by rays not hitting the same spot, which means between 2 and 8 rays per pixel are used. Nvidia solved this using a real-time recurrent neural network on the Pascal/Maxwell architectures, and also handles antialiasing in raytracing with neural networks.

Nvidia only managed to do this at 30 fps at 720p, however, and with very high-end cards at the time (and it requires a lot of training).

Nvidia today has been showcasing real-time raytracing with Volta, which has "Tensor Cores": 4x4x4 16-bit matrix-multiply, 32-bit-add units, one for every 2 CUDA cores. These were implemented to help with neural networks.

Nvidia is using a neural-network solution with special hardware to speed up raytracing. In addition, they could be speeding up triangle intersection using the tensor cores: given the spatial segmentation structures that ensure you don't test every triangle, 16-bit float precision would likely be enough to avoid position artifacts.

All of this is highly "Nvidia", and Nvidia is investing in neural networks on its own cards because they help in both graphics and scientific/AI computing, where it is trying to diversify economically.

I wouldn't be surprised if, in 1 to 2 years, their setup is not the most efficient API for better methods of raytracing on future hardware/algorithms.

toomuchvoltage commented 6 years ago

So ummm... Turing was announced and RT-cores were unveiled.

Where are we with this? Any plans on adoption soon?

SuperSodaSea commented 6 years ago

@krOoze I don't agree with you. The graphics pipeline is also a "high-level" thing. Why don't we throw it away and just use compute shaders for everything? Because we need performance. Specialized hardware can do far better than general-purpose hardware on specialized tasks. The raytracing pipeline is the same: if we use the existing graphics pipeline and/or compute pipeline to "simulate" raytracing, performance can be very low. Now that the GeForce RTX 20xx supports hardware raytracing, there should be more graphics cards supporting it in the future, so it is necessary for Vulkan to add a new raytracing pipeline.

pdaniell-nv commented 6 years ago

NVIDIA is working on a vendor extension to expose RTX to Vulkan, which will be released soon.

Here is an overview of what it will look like: http://on-demand.gputechconf.com/gtc/2018/presentation/s8521-advanced-graphics-extensions-for-vulkan.pdf

xGnoSiSx commented 6 years ago

Apologies for asking this here, but it is relevant: are there any plans/efforts to add raytracing extensions to OpenGL for HW RT support? Or is OpenGL in essence abandoned as far as new features are concerned?

pdaniell-nv commented 6 years ago

All the focus for raytracing extensions is on Vulkan at the moment. No comment on plans for OpenGL at this time.

ivalylo commented 6 years ago

I would probably not get an answer but...

xGnoSiSx commented 6 years ago

All the focus for raytracing extensions is on Vulkan at the moment. No comment on plans for OpenGL at this time.

I understand your answer.

cheako commented 6 years ago

My issues with VK_NV_raytracing are the binary black-box nature of VkGeometryNV, and the handling of code and data in the same Shader Binding Table, which also currently looks like a binary black box. I also don't like how they just ignored SPIR-V.

I think it would be wise to wait for a few other vendors to implement hardware raytracing to see how the inputs for those areas might differ.

Also, in five to six years it may be that the special parts of these RTX cores will just become part of the regular compute cores.

Since VkGeometryNV is passed to vkCmdBuildAccelerationStructureNV, I feel it would be better to make that a vkCmdBuildAccelerationStructureBeginNV/End pair, with the drawing of triangles and the initial creation of rays as the contents. As I've pointed out, there is no way of knowing what that would look like cross-vendor.

So add RTX to OpenGL, where the damage would be less noticeable, and keep Vulkan pristine PLEASE.

dgkoch commented 5 years ago

Thanks for expressing your interest in realtime raytracing extensions in Vulkan.

We have formed a Vulkan Ray Tracing subcommittee to explore how to expose ray tracing in Vulkan, and the topic is under active discussion. This is a high priority for some members, and we have received design contributions from several hardware vendors and are reviewing feature requirements from software developers as inputs to the standardization process. As always, our priorities for new technology in Vulkan are driven by input from the Vulkan developer community and the broader Vulkan ecosystem.

-Daniel (Vulkan Ray Tracing Chair)

revillo commented 5 years ago

I believe much of the headache of integrating raytracing into Vulkan would be mitigated by adding support for recursion to GLSL (which the hardware already supports), since raytracing is a fundamentally recursive process.
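For readers unfamiliar with the usual workaround: on targets without recursion, a recursive trace() gets flattened into a loop over an explicit stack. A minimal C sketch with invented names and a toy "shading" rule (each hit spawns one half-weight secondary ray):

```c
#include <stddef.h>

#define MAX_DEPTH 8   /* hypothetical recursion limit */

typedef struct Ray { float weight; int depth; } Ray;

/* Iterative replacement for a recursive trace(): pop a ray, "shade" it,
 * and push any secondary rays it spawns until the stack drains. */
static float trace_iterative(Ray primary)
{
    Ray stack[MAX_DEPTH];
    size_t top = 0;
    float radiance = 0.0f;

    stack[top++] = primary;
    while (top > 0) {
        Ray r = stack[--top];              /* pop */
        radiance += r.weight;              /* toy shading: accumulate weight */
        if (r.depth + 1 < MAX_DEPTH) {     /* spawn one secondary ray */
            Ray next = { r.weight * 0.5f, r.depth + 1 };
            stack[top++] = next;
        }
    }
    return radiance;
}
```

This is roughly the transformation a compiler or driver has to perform behind the scenes when the shading language itself cannot recurse.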

From working with the current NVidia extension for Vulkan, I believe it is largely inelegant for this reason, and that things should shift to a more "compute" oriented pipeline that simply exposes built in hardware accelerated raytracing functionality into the shaders.

The acceleration structure building part generally makes sense as one needs some means of adding triangle meshes into a scene (although instance data should shift to column major transforms to be more consistent with GLM/GLSL)

However, the process of linking multitudes of shader groups with obscure stage names (miss shaders, hit shaders, intersection shaders, and "general" shaders) is clearly not the direction we want to be heading in.

Zingam commented 5 years ago

Thanks for expressing your interest in realtime raytracing extensions in Vulkan. [...]

What is your (the committee's) stand on DirectML included in DirectX? Will (Are) Machine learning extensions also find (needed) their way in Vulkan?

TomOlson commented 5 years ago

@Zingam, in the interest of making your question easier for people to find, I've opened #955 to discuss ML on Vulkan. Looking forward to hearing what the community has to say.

jaynus commented 5 years ago

Has the committee come to any definitive decision on this? It's been almost a year since it was formed to discuss this. Where can I find any discussions or notes concerning progress?

dgkoch commented 5 years ago

The last public update from the TSG was given at SIGGRAPH 2019. See https://www.khronos.org/assets/uploads/developers/library/2019-siggraph/Vulkan-01-Update-SIGGRAPH-Jul19.pdf, slides 27-30.

krOoze commented 5 years ago

^ video: https://youtu.be/1fU4w2ZGxH4?t=1465

devshgraphicsprogramming commented 4 years ago

I believe much of the headache of integrating raytracing into vulkan would be mitigated by adding support for recursion into glsl (which is supported already by the hardware), since raytracing is a fundamentally recursive process.

From working with the current NVidia extension for Vulkan, I believe it is largely inelegant for this reason, and that things should shift to a more "compute" oriented pipeline that simply exposes built in hardware accelerated raytracing functionality into the shaders.

However the process of linking multitudes of shader groups with obscure stage names.. miss shaders, hit shaders, intersection shaders and "general" shaders - this is clearly not the direction we want to be heading in.

Pretty sure one could actually emulate NV_raytracing via plain old SM 6.0 compute to an extent: not the exact same API, but functionally identical, using descriptor indexing/bindless and a little bit of shuffling of external resource usage.

I mean DXR Fallback is basically a bunch of HLSL 6.0 compute thrown together.
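The core of such a compute fallback is ordinary intersection code. As a self-contained illustration (my own sketch, not taken from the DXR Fallback Layer), here is Möller-Trumbore ray/triangle intersection in C; on the GPU the same arithmetic would sit inside a compute-shader loop over BVH leaves:

```c
#include <math.h>

typedef struct Vec3 { float x, y, z; } Vec3;

static Vec3  sub(Vec3 a, Vec3 b)   { return (Vec3){a.x-b.x, a.y-b.y, a.z-b.z}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return (Vec3){a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}

/* Möller-Trumbore ray/triangle intersection: the kind of kernel a
 * compute fallback runs in place of fixed-function traversal.
 * Returns 1 and writes the hit distance *t on intersection. */
static int ray_triangle(Vec3 orig, Vec3 dir,
                        Vec3 v0, Vec3 v1, Vec3 v2, float *t)
{
    const float EPS = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (fabsf(det) < EPS) return 0;          /* ray parallel to triangle */
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;               /* first barycentric coordinate */
    if (u < 0.0f || u > 1.0f) return 0;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;             /* second barycentric coordinate */
    if (v < 0.0f || u + v > 1.0f) return 0;
    *t = dot(e2, q) * inv;                   /* distance along the ray */
    return *t > EPS;
}
```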

Let me share my musings about improving Radeon Rays 3.0

I wonder how much sense it would make to set up the kernel as "persistent threads" like the Book-Of-Clay: basically dedicate 2x 4 bytes as stack/ring-buffer counters, plus a buffer to keep the rays to trace in a queue/stack, using workgroup prefix sums (with subgroup intrinsics) and global atomics to push and pop from this queue/stack.
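A single-threaded C analogue of that counter-plus-buffer scheme (hypothetical names, and deliberately not a production-grade concurrent structure; a real GPU version would batch the atomics per wave with the prefix sums described above):

```c
#include <stdatomic.h>

#define QUEUE_CAP 1024

/* Ring buffer with two atomic counters: pop from head, push at tail.
 * Illustrative only -- pop can race with concurrent pushes. */
typedef struct RayQueue {
    int rays[QUEUE_CAP];          /* stand-in for a real ray payload */
    atomic_uint head, tail;
} RayQueue;

static void rq_push(RayQueue *q, int ray)
{
    unsigned slot = atomic_fetch_add(&q->tail, 1u);   /* claim a slot */
    q->rays[slot % QUEUE_CAP] = ray;
}

static int rq_pop(RayQueue *q, int *ray)   /* returns 0 when empty */
{
    unsigned slot = atomic_fetch_add(&q->head, 1u);
    if (slot >= atomic_load(&q->tail)) {   /* ran past the end: undo */
        atomic_fetch_sub(&q->head, 1u);
        return 0;
    }
    *ray = q->rays[slot % QUEUE_CAP];
    return 1;
}
```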

Also, shuffle the rays periodically in LDS within a workgroup to mitigate divergence (especially when branching to the miss/closest-hit shader, or custom intersection vs. triangle intersection).

But at this point you could just write an NV_raytracing-to-plain-old-compute-shader compiler.

As it stands, after every hit you basically need to make a second kernel launch (RR 2.0) or an indirect dispatch (RR 3.0; thankfully you don't need to query the number of rays).

So for a max depth of 35 in even a Whitted ray tracer, you'll end up having to make a full 35 round trips with ray payloads to VRAM.

Whereas you could offload some of the ray stack/queue to LDS, or at least count on it staying in L2.

But synchronization/serialization is the biggest cost here, because all the rays in the first trace (the one that spawns the new rays) need to finish before the second can run (a barrier between the shader output SSBO and the INDIRECT dispatch buffer in RR 3.0, or worse, a full OpenCL event in RR 2.0). This makes Russian roulette very costly: at the end of a dispatch you have starving threads, plus latency/idling between the end of the previous dispatch and the next, so very small bundles of rays are expensive to trace in a single iteration, because they have to run synchronously.

The acceleration structure building part generally makes sense as one needs some means of adding triangle meshes into a scene (although instance data should shift to column major transforms to be more consistent with GLM/GLSL)

GLM/GLSL ordering of matrix data is a mistake, especially for SSE/SIMD.

dgkoch commented 4 years ago

Today we released provisional Vulkan ray tracing extensions. Please see #1205 and give any feedback you may have via #1206.