OptiX Applications

Advanced Samples for the NVIDIA OptiX 7 Ray Tracing SDK

The goal of the three initial introduction examples is to show how to port an existing OptiX application based on the previous OptiX 5 or 6 API to OptiX 7.

For that, two of the existing OptiX Introduction Samples have been ported to the OptiX 7 SDK.

intro_runtime and intro_driver are ports of optixIntro_07, and intro_denoiser is a port of the optixIntro_10 example showing the built-in AI denoiser. Along the way, they already demonstrate some advanced methods for architecting renderers using OptiX 7.

If you need a basic introduction to OptiX 7 programming, please refer to the OptiX 7 SIGGRAPH course material first, and consider reading through the OptiX developer forum as well, which covers many OptiX 7 topics.

The landing page for online NVIDIA ray tracing programming guides and API reference documentation can be found here: NVIDIA ray tracing documentation. This contains more up-to-date information compared to documents shipping with the SDKs and is easy to search including cross-reference links.

Please always read the OptiX SDK Release Notes before setting up a development environment.

Overview

OptiX 7 applications are written using the CUDA programming APIs. There are two to choose from: The CUDA Runtime API and the CUDA Driver API.

The CUDA Runtime API is a little more high-level and usually requires a library to be shipped with the application if not linked statically, while the CUDA Driver API is more explicit and always ships with the NVIDIA display drivers. The documentation inside the CUDA API headers cross-references the respective function names of the other API.

Introductory Examples

To demonstrate the CUDA host API differences, intro_runtime and intro_driver are both ports of OptiX Introduction sample #7, using the CUDA Runtime API and the CUDA Driver API respectively, for easy comparison.

intro_runtime with constant environment light

intro_driver with a null environment and parallelogram area light

intro_denoiser is a port of OptiX Introduction sample #10 to OptiX 7. That example is the same as intro_driver with additional code demonstrating the built-in denoiser functionality with HDR denoising on beauty and optional albedo and normal buffers, all in either float4 or half4 format (compile-time options in config.h).

intro_denoiser with spherical environment light

intro_motion_blur demonstrates how to implement motion blur with linear matrix transforms, scale-rotate-translate (SRT) motion transforms, and optional camera motion blur in an animation timeline where frame number, frames per second, object velocity and angular velocity of the rotating object can be changed interactively. It is also based on intro_driver, which makes it easy to see the code differences introduced by adding the transform and camera motion blur. intro_motion_blur will only be built when the OptiX SDK 7.2.0 or newer is found, because that version removed the OptixBuildInputInstanceArray aabbs and numAabbs fields, which makes adding motion blur a lot simpler.
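
The linear matrix motion blur mentioned above can be pictured as a per-element blend between matrix motion keys. The following is a minimal host-side sketch, not code from intro_motion_blur: the `Matrix3x4` alias and the `interpolateMatrixKeys` function are hypothetical names, the keys are assumed to be equally spaced over the motion interval, and times outside the interval are assumed to clamp to the first or last key.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One motion key: a row-major 3x4 object-to-world matrix (12 floats).
using Matrix3x4 = std::array<float, 12>;

// Linearly interpolates between equally spaced matrix motion keys.
// Assumption: times outside [timeBegin, timeEnd] clamp to the first/last key.
Matrix3x4 interpolateMatrixKeys(const std::vector<Matrix3x4>& keys,
                                float timeBegin, float timeEnd, float time)
{
  assert(keys.size() >= 2 && timeBegin < timeEnd);

  // Normalize the time to [0, 1] over the motion interval.
  float t = (time - timeBegin) / (timeEnd - timeBegin);
  t = std::min(std::max(t, 0.0f), 1.0f);

  // Find the key pair surrounding t and the blend weight between them.
  const float scaled = t * static_cast<float>(keys.size() - 1);
  const std::size_t i = std::min(static_cast<std::size_t>(scaled), keys.size() - 2);
  const float w = scaled - static_cast<float>(i);

  Matrix3x4 result;
  for (std::size_t j = 0; j < result.size(); ++j)
  {
    result[j] = (1.0f - w) * keys[i][j] + w * keys[i + 1][j];
  }
  return result;
}
```

With two keys that differ only in their x translation (element 3), sampling the middle of the interval yields the midpoint translation. Blending matrix elements like this does not preserve rotations well, which is one reason SRT motion transforms exist as a separate option.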

intro_motion_blur

All four intro examples implement the exact same rendering with their scene data generated at runtime and make use of a single device (ordinal 0) only. (If you have multiple NVIDIA devices installed, you can switch between them by using the CUDA_VISIBLE_DEVICES environment variable.)

Advanced Examples

Multi-GPU Rendering

rtigo3 is meant as a testbed for multi-GPU rendering distribution and OpenGL interoperability. There are different multi-GPU strategies implemented (single GPU, dual GPU peer-to-peer, multi-GPU pinned memory, multi-GPU local distribution and compositing). Then there are three different OpenGL interop modes (none, render to pixel buffer object, copy to mapped texture array).

The implementation uses the CUDA Driver API on purpose because that allows more fine-grained control over CUDA contexts and devices and removes the need to ship a CUDA runtime library when not using the static version.

This example contains the same runtime-generated geometry as the introduction examples, but also implements a simple file loader using ASSIMP for triangle mesh data. The application operation and scene setup are controlled by two simple text files, which also allows generating scene setups of any complexity for tests. Unlike the introduction examples, it does not render indefinitely but uses a selectable number of camera samples, as well as render resolutions independent of the window's client area.

rtigo3 with all built-in geometries

rtigo3 with some Cornell Box scene

rtigo3 with instanced OBJ model

rtigo3 with Buggy.gltf model

Multi-GPU Data Sharing

nvlink_shared demonstrates peer-to-peer sharing of texture data and/or geometry acceleration structures among GPU devices in an NVLINK island. Peer-to-peer device resource sharing can effectively double the scene size loaded onto a dual-GPU NVLINK setup. Texture sharing comes at a moderate performance cost while geometry acceleration structure and vertex attribute sharing can be considerably slower and depends on the use case, but it's reasonably fast given the bandwidth difference between NVLINK and VRAM transfers. That is still a lot better than not being able to load a scene at all on a single board.

To determine the system's NVLINK topology it uses the NVIDIA Management Library NVML which is loaded dynamically. Headers for that library are included inside the CUDA Toolkits and the library ships with the display drivers. The implementation is prepared to fetch all NVML entry points, but currently only needs six functions for the required NVLINK queries and GPU device searches. Note that peer-to-peer access under Windows requires Windows 10 64-bit and SLI enabled inside the NVIDIA Display Control Panel. Under Linux it should work out of the box.

This example is derived from rtigo3 but uses only one rendering strategy ("local-copy") and while it also runs on single GPU systems, the CUDA peer-to-peer sharing functionality will obviously only run on multi-GPU NVLINK systems. The Raytracer class got more smarts over the Device class because the resource distribution decisions need to happen above the devices. The scene description format has been slightly changed to allow different albedo and/or cutout opacity textures per material reference. Still, it's a slightly newer application architecture compared to rtigo3 when you're planning to derive your own applications from these examples.

nvlink_shared with 5x5x5 spheres, each over 1M triangles

Material Systems and Lights

rtigo9 is similar to nvlink_shared but optimized for single-GPU as well to not do the compositing step unless multiple GPUs are used. The main difference is that it shows how to implement more light types. It's supporting the following light types:

To be able to define scenes with these different light types, this example's scene description file format has been enhanced. The camera settings as well as the tone mapper settings defined inside the system description file now can be overridden inside the scene description. The previous hardcoded light definitions inside the system description file have been removed and the scene description has been changed to allow light material definitions and creation of specific light types with these emissive materials, resp. assigning them to arbitrary triangle meshes. Please read the system_rtigo9_demo.txt and scene_rtigo9_demo.txt files which explain the creation of all supported light types inside a single scene.

Also, the previous compile time switch inside the config.h file to enable or disable direct lighting ("next event estimation") has been converted to a runtime switch which can be toggled inside the GUI. Note that all singular light types do not work without direct lighting enabled because they do not exist as geometry inside the scene and cannot be hit implicitly. (The probability for that is zero. Such lights do not exist in the physical world.)

In addition to CUDA peer-to-peer data sharing via NVLINK, the rtigo9 example also allows it via PCI-E, but this is strongly discouraged for geometry for performance reasons. Please read the explanation of the peerToPeer option inside the system description.

rtigo9 light types demo

Light types shown in the image above: The grey background is from a constant environment light. Then from left to right: point light, point light with projection texture, spot light with cone angle and falloff, spot light with projection texture, IES light, IES light with projection texture, rectangle area light, rectangle area light with importance sampled emission texture, arbitrary mesh light (cow), arbitrary mesh light with emission texture.

Opacity Micro-Maps

rtigo9_omm is exactly the same as rtigo9, just using the new Opacity Micromap (OMM) feature added in OptiX SDK 7.6.0. It uses the OptiX Toolkit CUDA based OMM Baking Tool to generate OMMs from the RGBA cutout textures. The OptiX Toolkit also requires OptiX SDK 7.6.0 at this time (2023-03-30).

With OMMs, the sharing of geometry acceleration structures (GAS) among different materials is restricted for materials with cutout opacity because the OMM is part of the GAS. The cutout opacity value calculation has been changed from using the RGB intensity to the alpha channel because that is what the OMM Baking tool defaults to when using RGBA textures. Another difference is that the shadow/visibility ray implementation can use a faster algorithm with OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT because fully transparent and fully opaque microtriangles of geometry with cutout opacity textures do not call into the anyhit program anymore. That also means there is no anyhit shadow program for geometry without cutout opacity required anymore.
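
To illustrate why only some microtriangles still reach the anyhit program, here is a simplified sketch of the classification idea. It is not the OMM Baking Tool's algorithm: the `MicroTriangleState` enum, the `classifyMicroTriangle` function, the 0.5 cutoff, and the idea of passing a list of alpha samples per microtriangle are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Simplified opacity states of one microtriangle (names are hypothetical).
enum class MicroTriangleState { Transparent, Opaque, Unknown };

// Classifies a microtriangle from the alpha values sampled inside it.
// Assumption: alpha >= cutoff counts as opaque, everything else as transparent.
MicroTriangleState classifyMicroTriangle(const std::vector<float>& alphaSamples,
                                         float cutoff = 0.5f)
{
  if (alphaSamples.empty())
  {
    return MicroTriangleState::Unknown; // Nothing known, stay conservative.
  }

  bool anyOpaque = false;
  bool anyTransparent = false;
  for (float alpha : alphaSamples)
  {
    if (alpha >= cutoff) { anyOpaque = true; }
    else                 { anyTransparent = true; }
  }

  if (anyOpaque && anyTransparent)
  {
    return MicroTriangleState::Unknown; // Mixed coverage needs a texture lookup.
  }
  return anyOpaque ? MicroTriangleState::Opaque : MicroTriangleState::Transparent;
}

// Only microtriangles with mixed coverage still need the anyhit program.
bool invokesAnyHit(MicroTriangleState state)
{
  return state == MicroTriangleState::Unknown;
}
```

In this model, a leaf texture with large fully opaque and fully transparent regions leaves only the microtriangles along the leaf outline in the unknown state, which is where the anyhit program would still run.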

rtigo9_omm opacity micromap demo

Performance and Shader Binding Tables

rtigo10 is meant to show how to architect a renderer for maximum performance with the fastest possible shadow/visibility ray type implementation and the smallest possible shader binding table layout.

It's based on rtigo9 and supports the same system and scene description file format but removed support for cutout opacity and surface materials on emissive area light geometry (arbitrary mesh lights). The renderer architecture implements all materials as individual closesthit programs instead of a single closesthit program and direct callable programs per material as in all previous examples above. Lens shaders and the explicit light sampling are still done with direct callable programs per light type for optimal code size.

To reduce the shader binding table size, where the previous examples used a hit record entry per instance with additional data for the geometry vertex attribute data and index data defining the mesh topology plus material and light IDs, the shader binding table in rtigo10 holds only one hit record per material shader which is selected via the instance sbtOffset field. All other data is indexed via the user-defined instance ID field.

On top of that, by not supporting cutout opacity there is no need for anyhit programs in the whole pipeline. The shadow/visibility test ray type is implemented with just a miss shader, which also means there is no need to store hit records for the shadow ray type inside the shader binding table at all.
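
The effect of the two layouts on the number of hit records can be sketched with a few lines of host code. This is only a counting model under stated assumptions: the struct and function names are hypothetical, and the per-instance layout is assumed to need one record per instance and ray type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance description used by this sketch.
struct Instance
{
  unsigned int materialShader; // Index of the material's closesthit program.
  unsigned int id;             // User-defined instance ID.
};

// Hypothetical per-instance data which no longer lives inside hit records.
struct GeometryData
{
  int materialId;
  int lightId;
};

// Assumed layout of the earlier examples: one hit record per instance and ray type.
std::size_t hitRecordsPerInstance(std::size_t numInstances, std::size_t numRayTypes)
{
  return numInstances * numRayTypes;
}

// Layout described for rtigo10: one hit record per material shader,
// and none for the shadow ray type because it only uses a miss program.
std::size_t hitRecordsPerMaterial(std::size_t numMaterialShaders)
{
  return numMaterialShaders;
}

// The instance sbtOffset selects the material shader's hit record.
unsigned int sbtOffsetFor(const Instance& instance)
{
  return instance.materialShader;
}

// Everything else is looked up through the user-defined instance ID.
const GeometryData& lookupGeometryData(const std::vector<GeometryData>& table,
                                       const Instance& instance)
{
  assert(instance.id < table.size());
  return table[instance.id];
}
```

For example, 1000 instances with two ray types would need 2000 hit records in the per-instance model, while five material shaders need five records regardless of the instance count.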

rtigo12 is based on rtigo10 but changed the integrator to handle the throughput, pdf, and lights like the MDL_renderer.

The GGX-Smith BXDF implementation has been replaced with excerpts from the MDL SDK libbsdf to support direct lighting of glossy transparent materials as well. That means singular light types will now show proper reflections on glossy transparent objects and even caustics (when the roughness is not too smooth) because hitting backfaces will be directly lit from lights on the transmission side which adds radiance. While changing that, support for Specular and GGX-Smith BTDF materials has been added.

Also homogeneous volume scattering is implemented in this example via a random walk through volumes with scattering coefficients the same way as inside the MDL_renderer. (See scene_rtigo12_*.txt files inside the data folder for example scenes.)

Note that mesh and rect lights are now defined with radiant exitance instead of radiant intensity, so with the diffuse EDF these are 1/PI darker than in rtigo10 but match the MDL_renderer.
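
The 1/PI factor follows from the relation between radiant exitance M and radiance L of a diffuse (Lambertian) emitter, M = PI * L. The sketch below only works through that arithmetic. The function name is hypothetical, and it assumes the older examples use the emission value from the scene file directly as emitted radiance.

```cpp
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// For a diffuse emitter the radiance is constant over the hemisphere,
// and integrating L * cos(theta) over the hemisphere gives M = pi * L.
float radianceFromExitance(float exitance)
{
  return exitance / kPi;
}
```

An emission value of 1.0 interpreted as radiant exitance therefore results in a radiance of about 0.318, which is why the same scene values render darker in rtigo12.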

rtigo12 BXDF demo

rtigo12 volume scattering bias

Photo-Realistic Rendering

MDL_renderer is based on rtigo9 but replaces the few simple hardcoded BSDFs used previously with NVIDIA Material Definition Language (MDL) shaders.

If you're not familiar with the NVIDIA Material Definition Language, please find the MDL Introduction, Handbook, and Language Specifications on the NVIDIA Ray Tracing Documentation site.

The example is based on functionality shown inside the MDL SDK examples optix7 and df_cuda. The renderer architecture stayed similar, just that all material-related direct callable programs are now generated by the MDL SDK at runtime, which means this renderer requires the MDL SDK to compile. You can use either the open-source MDL SDK, which was used while developing this example, or the binary MDL SDK release.

The device code details changed quite a bit though because all shading data structures needed to match to what the MDL-generated code expects and it is actually more elegant in some areas than the previous examples, especially for the path throughput and pdf handling.

The scene description syntax has been adjusted to allow material references selecting an exported MDL material from a given MDL file. The definition of the hardcoded lights has been changed from taking a material reference to using the scene description emission parameters directly. Arbitrary mesh lights are generated automatically for all geometry instances which have an emissive MDL material assigned.

The system description options added a searchPath option which allows adding arbitrarily many paths where MDL files and their resources should be searched for. The system and user paths for the MDL vMaterials, set by the MDL vMaterials installer via the environment variables MDL_SYSTEM_PATH and MDL_USER_PATH, are automatically added by the application.

Peer-to-peer sharing of MDL texture array resources, measured BSDFs and their CDFs, IES light profiles and their CDFs is supported. The system description option peerToPeer has two new bits (4 and 5) controlling sharing of the MBSDF resp. light profile data among GPU devices in a CUDA peer-to-peer island. If the peerToPeer value is not set, the default is to only share textures because that comes at almost no cost via NVLINK.
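
Decoding the two new bits could look like the following sketch. The constant and function names are hypothetical, the bit numbering is assumed to be zero-based (so bit 4 has the value 16 and bit 5 the value 32), and the meaning of the lower bits is left out on purpose because it is documented in the system description file.

```cpp
#include <cassert>

// Assumption: zero-based bit numbering inside the peerToPeer bitfield.
constexpr unsigned int P2P_SHARE_MBSDF         = 1u << 4; // Measured BSDFs and their CDFs.
constexpr unsigned int P2P_SHARE_LIGHT_PROFILE = 1u << 5; // IES light profiles and their CDFs.

bool sharesMbsdf(unsigned int peerToPeer)
{
  return (peerToPeer & P2P_SHARE_MBSDF) != 0;
}

bool sharesLightProfiles(unsigned int peerToPeer)
{
  return (peerToPeer & P2P_SHARE_LIGHT_PROFILE) != 0;
}
```

Under these assumptions, a peerToPeer value with only bit 4 set shares the MBSDF data but not the light profiles, and setting both bits adds 48 to whatever the lower bits select.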

Please read the system_mdl_vMaterials.txt and scene_mdl_vMaterials.txt inside the data folder for more information on additional system and scene options.

The renderer implementation has the following limitations at this time:

Everything else inside the MDL specifications should just work!

MDL_renderer with MDL materials

MDL_renderer with vMaterials

The MDL_renderer has now been updated to also support cubic B-spline curve primitives and the MDL Hair BSDF.

Because that requires two texture coordinates to be fully featured, the NUM_TEXTURE_SPACES define has been added to the config.h to allow switching between one and two texture coordinates. If you do not need the hair BSDF, you can set NUM_TEXTURE_SPACES to 1 for a little more performance.

The MDL hair BSDF supports a fully parameterized fiber surface accessible via the state::texture_coordinate(0) providing (uFiber, vFiber, thickness) values, which allows implementing parameter changes along the whole fiber and even around it. The provided mdl/bsdf_hair_uv.mdl material shows this by placing tiny arrows on the fibers pointing from root to tip.

Additionally the second texture coordinate state::texture_coordinate(1) defines a fixed texture coordinate per fiber, which allows coloring of individual fibers depending on some texture value. The image below used a Perlin noise function to produce highlights in the hair, resp. a 2D texture to color the fibers of the fur.hair model (included).
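
How the two texture spaces could be filled for a fiber hit is sketched below. This is not the renderer's shading state: `Float3`, `FiberShadingState` and `makeFiberState` are hypothetical names, and only the NUM_TEXTURE_SPACES define and the meaning of the two coordinates are taken from the description above.

```cpp
#include <cassert>

// Stand-in for the compile-time option inside config.h.
#ifndef NUM_TEXTURE_SPACES
#define NUM_TEXTURE_SPACES 2
#endif

struct Float3
{
  float x, y, z;
};

// Hypothetical, reduced shading state holding only the texture coordinates.
struct FiberShadingState
{
  Float3 textureCoordinate[NUM_TEXTURE_SPACES];
};

FiberShadingState makeFiberState(float uFiber, float vFiber, float thickness,
                                 float fiberU, float fiberV)
{
  FiberShadingState state{};

  // Texture space 0: parameterization along and around the fiber.
  state.textureCoordinate[0] = Float3{uFiber, vFiber, thickness};

#if NUM_TEXTURE_SPACES >= 2
  // Texture space 1: one fixed coordinate per fiber, e.g. for per-fiber coloring.
  state.textureCoordinate[1] = Float3{fiberU, fiberV, 0.0f};
#else
  (void) fiberU; // Unused with a single texture space.
  (void) fiberV;
#endif

  return state;
}
```

With NUM_TEXTURE_SPACES set to 1, the second coordinate is simply not stored, which matches the trade-off described above: less data per hit, but no per-fiber texture lookup.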

The renderer currently loads only *.hair models which do not have texture coordinates. The example auto-generates a 2D coordinate with a cubemap projection from the root points' center coordinate. There are better ways to do this when actually growing hair from surfaces, not done in this example. Transparency and color values of *.hair files are ignored. The assigned MDL hair material defines these properties.
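
A cubemap projection of this kind can be sketched as follows. This is not the example's implementation: `CubeCoord` and `cubemapProject` are hypothetical names, the input is assumed to be the direction from the center of all root points to one root point, and the face order and orientation follow the common OpenGL cubemap convention.

```cpp
#include <cassert>
#include <cmath>

// Result of the projection: a cube face index and a coordinate in [0, 1]^2.
struct CubeCoord
{
  int   face; // 0: +X, 1: -X, 2: +Y, 3: -Y, 4: +Z, 5: -Z
  float u;
  float v;
};

// Projects a non-zero direction (x, y, z) onto the cube face along its major axis.
CubeCoord cubemapProject(float x, float y, float z)
{
  const float ax = std::fabs(x);
  const float ay = std::fabs(y);
  const float az = std::fabs(z);

  int   face;
  float ma; // Magnitude of the major axis component.
  float sc; // Horizontal face coordinate before normalization.
  float tc; // Vertical face coordinate before normalization.

  if (ax >= ay && ax >= az)
  {
    face = (x >= 0.0f) ? 0 : 1;
    ma = ax; sc = (x >= 0.0f) ? -z : z; tc = -y;
  }
  else if (ay >= az)
  {
    face = (y >= 0.0f) ? 2 : 3;
    ma = ay; sc = x; tc = (y >= 0.0f) ? z : -z;
  }
  else
  {
    face = (z >= 0.0f) ? 4 : 5;
    ma = az; sc = (z >= 0.0f) ? x : -x; tc = -y;
  }

  assert(ma > 0.0f); // The zero vector has no direction.

  // Map from [-1, 1] to [0, 1] on the selected face.
  return CubeCoord{face, 0.5f * (sc / ma + 1.0f), 0.5f * (tc / ma + 1.0f)};
}
```

Turning the face index and the per-face coordinate into a single 2D texture coordinate (for example by placing the six faces in an atlas) is a further design choice that this sketch leaves open.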

MDL_renderer with hair rendering

MDL_renderer with fur rendering

Simple and Fast Physically Based Rendering

GLTF_renderer shows how to implement a simple and fast Physically Based Rendering (PBR) material model inside a progressive Monte Carlo global illumination renderer. It implements much of the glTF 2.0 core specification plus quite a number of glTF 2.0 extensions which make the material model a lot more interesting.

NEW: It's now a standalone solution because it shows how to use the native CMake LANGUAGES CUDA feature to build an application which uses native CUDA kernels compiled to binary code and called with the CUDA runtime chevron <<<>>> operator, as well as OptiX device code translated to OptiX-IR or PTX modules, which is done via a CMake Object Library.

To build it, a separate solution needs to be built using the GLTF_renderer/CMakeLists.txt directly! It requires OptiX SDK 8.0.0 or OptiX SDK 7.7.0 to build. See more details in the Building chapter.

Please refer to the specific README.md inside the GLTF_renderer/doc folder for an explanation of its features and limitations.

GLTF_renderer

User Interaction inside the examples

Additionally in all non-intro examples:

Building

In the following paragraphs, the * in all OptiX* expressions stands for the major and minor OptiX version as 70, 71, 72, 73, 74, 75, 76, 77, 80.

The application framework for all these examples uses GLFW for the window management, GLEW 2.1.0 for the OpenGL functions, DevIL 1.8.0 (optionally 1.7.8) for all image loading and saving, local ImGUI code for the simple GUI, and all non-intro examples use ASSIMP to load triangle mesh geometry. rtigo9_omm uses the OptiX Toolkit CUDA-based Opacity Micromap (OMM) Baking tool to generate OMMs from cutout opacity textures.

GLEW 2.1.0 is required for all examples not named with prefix intro for the UUID matching of devices between OpenGL and CUDA which requires a specific OpenGL extension not supported by GLEW 2.0.0. The intro examples compile with GLEW 2.0.0 though.

The top-level CMakeLists.txt file will try to find all currently released OptiX SDK versions via the FindOptiX*.cmake scripts inside the 3rdparty/CMake folder. These search OptiX SDK 7.0.0 to 8.0.0 locations by looking at the resp. OPTIX*_PATH environment variables a developer can set to override the default SDK locations. If those OPTIX*_PATH environment variables are not set, the scripts try the default SDK installation folders. Since OptiX 7 and 8 are header-only APIs, only the include directory is required.

The FindOptiX*.cmake scripts set the resp. OptiX*_FOUND CMake variables which are later used to select which examples are built at all and with which OptiX SDK. (intro_motion_blur requires OptiX SDK 7.2.0 or higher, rtigo9_omm requires 7.6.0 or higher.)

The individual applications' CMakeLists.txt files are set up to use the newest OptiX SDK found and automatically handle API differences via the OPTIX_VERSION define.
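
As a rough sketch of how code can react to the OPTIX_VERSION define: the OptiX headers encode the version as major * 10000 + minor * 100 + micro, so 7.6.0 is 70600. The helper functions below are hypothetical and not part of the examples, and the fallback define only exists so that the snippet compiles without optix.h.

```cpp
#include <cassert>

// Stand-in for the value provided by optix.h, only for this sketch.
#ifndef OPTIX_VERSION
#define OPTIX_VERSION 70600
#endif

constexpr int optixMajor(int version) { return version / 10000; }
constexpr int optixMinor(int version) { return (version % 10000) / 100; }
constexpr int optixMicro(int version) { return version % 100; }

// Version thresholds taken from the requirements listed in this README.
constexpr bool supportsMotionBlurExample(int version)             { return version >= 70200; }
constexpr bool supportsOpacityMicromaps(int version)              { return version >= 70600; }
constexpr bool supportsShaderExecutionReordering(int version)     { return version >= 80000; }

#if (OPTIX_VERSION >= 70600)
// Code paths using Opacity Micromaps would be compiled here.
constexpr bool kBuildWithOmm = true;
#else
constexpr bool kBuildWithOmm = false;
#endif
```

The same comparison works inside preprocessor conditions and in regular code, so API differences between SDK versions can be isolated in small blocks instead of separate source files.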

When using OptiX SDK 7.5.0 or newer and CUDA Toolkit 11.7 or newer, the OptiX device code will automatically be compiled to the new binary OptiX Intermediate Representation (OptiX IR) instead of PTX code. This can be changed inside the CMakeLists.txt files of the individual examples by commenting out the three lines enabling USE_OPTIX_IR and setting nvcc target option --optixir and the *.optixir filename extension.

When using the OptiX SDK 8.0.0, the MDL_renderer example will use the Shader Execution Reordering (SER) API added in OptiX 8 which will improve the rendering performance on RTX boards with Ada GPUs.

The GLTF_renderer must be built as a standalone solution directly from the GLTF_renderer/CMakeLists.txt because it uses the native CMake LANGUAGES CUDA feature to build an application which uses native CUDA kernels compiled to binary code and called with the CUDA runtime chevron <<<>>> operator, as well as OptiX device code translated to OptiX-IR or PTX modules, which is done via a CMake Object Library.

Also note that the GLTF_renderer OptiX device code modules are only copied into the GLTF_renderer_core folder next to the current build target executable when the INSTALL target is built. That can be done automatically when enabling it inside the MSVS Build -> Configuration Manager dialog. It will then always copy only the changed modules on each build. (I have not found a better automatic method under the multi-target build system, where target file names are only provided as generator expressions.) Unfortunately that INSTALL build option needs to be re-enabled every time the CMakeLists.txt is changed.

Windows

Pre-requisites:

3rdparty library setup:

DevIL:

OptiX Toolkit:

MDL SDK:

Generate the solution:

Building the examples:

Adding the libraries and data (Yes, this could be done automatically but this is required only once.):

Linux

Pre-requisites:

Build the Examples:

Instead of setting the temporary OPTIX80_PATH environment variable, you can also adjust the line set(OPTIX80_PATH "~/NVIDIA-OptiX-SDK-8.0.0-linux64") inside the 3rdparty/CMake/FindOptiX80.cmake script to your local OptiX SDK 8.0.0 installation. Similar for the OptiX 7.x.0 versions.

Running

IMPORTANT: When running the examples from inside the debugger, make sure the working directory points to the folder with the executable because resources are searched relative to that. In Visual Studio that is the same as $(TargetDir). The default is $(ProjectDir) which will not work!

Open a command prompt and change directory to the folder with the executables (same under Linux, just without the .exe suffix.)

For the intro_runtime example, issue the commands:

Use the same command line options for intro_driver, intro_denoiser and intro_motion_blur.

For rtigo3, issue the commands (similar for the other scene description files):

The following scene description uses the Buggy.gltf model from Khronos which is not contained inside this source code repository. The link is also listed inside the scene_rtigo3_models.txt file.

If you run a multi-GPU system, read the system_rtigo3_dual_gpu_local.txt for the modes of operation and interop settings.

The nvlink_shared example is meant for multi-GPU systems with NVLINK bridge. It is working on single-GPU setups as well though. I have prepared a geometry-heavy scene with 125 spheres of more than 1 million triangles each. That scene requires about 10 GB of VRAM on a single board.

The rtigo9 and rtigo10 examples use an enhanced scene description where camera and tone mapper values can be overridden and materials for surfaces and lights and all light types themselves can be defined per scene now. For that the material definition has changed slightly to support surface and emission distribution functions and some more parameters. Read the provided scene_rtigo9_demo.txt file for how to define all supported light types.

That scene_rtigo9_demo.txt is not using cutout opacity or surface materials on arbitrary mesh lights, which means using it with rtigo10 will result in the same image, it will just run considerably faster.

The rtigo9_omm example uses Opacity Micromaps (OMM) which are built using the OptiX Toolkit CUDA OMM Baking tool. The following command loads a generated OBJ file with 15,000 unit quads randomly placed and oriented inside a sphere with radius 20 units. (Generator code is in createQuads()). The material assigned to the quads is texture mapped with a leaf texture for albedo and cutout opacity. The same command line can be used with rtigo9 to see the performance difference esp. on Ada generation GPUs which accelerate OMMs in hardware. (Try higher rendering resolutions than the default 1024x1024.)

The rtigo12 example uses a slightly enhanced scene description format compared to rtigo9 and rtigo10 in that it adds material parameters for the volume scattering color, scale and bias. The above command lines for rtigo10 work as well, though mesh and rectangle lights will be 1/PI darker due to a change from radiant intensity to radiant exitance definition with diffuse EDFs. The following scene files demonstrate all BXDF implementations and the volume scattering parameters, and show that volumetric shadows just work when placing lights and objects into surrounding objects with volume scattering.

The MDL_renderer example uses the NVIDIA Material Definition Language for the shader generation. The following scene only uses the *.mdl files and resources from the data/mdl folder you copied next to the executable after building the examples. These show most of the fundamental MDL BSDFs, EDFs, VDFs, layers, mixers, modifiers, thin-walled geometry, textures, cutout opacity, base helper functions, etc.

For a lot more complex materials (this scene requires about 5.4 GB of VRAM), the following command line will work if you have all(!) the NVIDIA MDL vMaterials 1.7, 2.0, 2.1, 2.2, 2.2.1 and 2.3 installed on the system. The application should then automatically find the referenced materials via the two environment variables MDL_SYSTEM_PATH and MDL_USER_PATH set by the vMaterials installation script. If a material exists as reference but couldn't be compiled because it isn't found or had errors, the renderer will not put the geometry with that invalid shader into the render graph. That means that without the vMaterials installed, only the area light should work in that scene because it uses one of the *.mdl files from the data folder. The result should look similar to the MDL_renderer example screenshot above, where some parameter values had been tweaked.

For version 2.4 download the materials and run

For the curves rendering with MDL hair BSDF materials, issue the command line. That will display a sphere with cubic B-spline curves using a red hair material lit by an area light from above. Please read the scene_mdl_hair.txt for other possible material and model configurations.

For the GLTF_renderer please read the specific README.md for all command line options.

Pull Requests

NVIDIA is happy to review and consider pull requests for merging into the main tree of the optix_apps for bug fixes and features. Before providing a pull request to NVIDIA, please note the following:

Support

Technical support is available on NVIDIA's Developer Forum, or you can create a git issue.