OptiX testrender overhaul

tgrant-nv commented 3 months ago

Description

This PR adds support for full path tracing in the OptiX mode of testrender, including full BSDF sampling and evaluation. The benefit of this change is that we're able to render most of the scenes from the testsuite in OptiX mode with results that closely match the host output. This comes at the cost of increased coupling between the host and OptiX renderers, and therefore an increased maintenance burden.

ID-based dispatch

The main difference between the host and OptiX paths is how the individual BSDFs are evaluated in the CompositeBSDF class. Virtual function calls aren't well supported in OptiX, so rather than using regular C++ polymorphism to invoke the sample(), eval(), and get_albedo() functions for each of the BSDF sub-types, we manually invoke the correct function based on the closure ID (which we have added as a member of the BSDF class).

// from shading_cuda.cpp

#define BSDF_CAST(BSDF_TYPE, bsdf) reinterpret_cast<const BSDF_TYPE*>(bsdf)

OSL_HOSTDEVICE Color3
CompositeBSDF::get_bsdf_albedo(const BSDF* bsdf, const Vec3& wo) const
{
    ...
    switch (bsdf->id) {
    case DIFFUSE_ID:
        albedo = BSDF_CAST(Diffuse<0>, bsdf)->get_albedo(wo);
        break;
    case TRANSPARENT_ID:
    case MX_TRANSPARENT_ID:
        albedo = BSDF_CAST(Transparent, bsdf)->get_albedo(wo);
        break;
    ...

Iterative closure evaluation

Another key difference from the host path is the non-recursive closure evaluation. We retain the same style of iterative tree traversal used in the previous OptiX version of process_closure(). This PR also adds evaluate_layer_opacity(), process_medium_closure(), process_background_closure(), which follow the same evaluation pattern.

`subpixel_radiance()`

The raytracing pipeline mirrors the host code very closely, including camera ray generation and the spawning of secondary rays. This allows a close visual match between the host and OptiX modes.

We've implemented a CUDA version of subpixel_radiance() (in optix_raytracer.cu) that closely mirrors the host version, with the main difference being in how rays are traced and how the shaders are executed. It might be possible to unify the implementations if it would ease the maintenance burden, but for now it seemed cleaner to leave them separate.

Background sampling

We've included support for background closures. This includes a CUDA implementation of the Background::prepare() function. We've broken that function into three phases, where phases 1 and 3 are parallelized across a warp and phase 2 is executed on a single thread. This offers a decent speedup over a single-threaded implementation without the complexity of a more sophisticated implementation.

    // from background.h

    template<typename F>
    OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
    {
        prepare_cuda_01(stride, idx, cb);
        if (idx == 0)
            prepare_cuda_02();
        prepare_cuda_03(stride, idx);
    }

Tests

Checklist:

[x] I have read the contribution guidelines.
[x] I have updated the documentation, if applicable.
[x] I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
[x] My code follows the prevailing code style of this project. If I haven't already run clang-format v17 before submitting, I definitely will look at the CI test that runs clang-format and fix anything that it highlights as being nonconforming.

fpsunflower commented 3 months ago

Thank Tim - this is looking great overall, but I'll need a while to go through it carefully.

My initial reaction is that there still seems to be a fair bit of duplicated code. It would be nice if we could find a way to share the rendering logic even more.

I'll also need to figure out how to best handle the upcoming work to make testrender triangle oriented instead of spheres/quads.

fpsunflower commented 3 months ago

My other initial comment would be -- can we use the same "switch based" dispatch mechanism the same on cpu as well instead of having virtual dispatch on cpu and switch based on gpu.

tgrant-nv commented 3 months ago

Thanks for the feedback, Chris. The code duplication is definitely one of my main concerns. Using the switch-based dispatch on the CPU should be a straightforward change. I wasn't sure if that would be an acceptable approach, but based on your suggestion and from talking to the Imageworks folks that sounds like the way to go.

Unifying the subpixel_radiance() implementations will take a bit more effort but shouldn't be too difficult. A small CUDA version of the Scene class should get us most of the way there. I'll prototype that and see how close it gets us.

aconty commented 3 months ago

Other than doing the switch case on CPU like Chris said this this LGTM

I sent you a couple of recipes to reduce code duplication, hope it helps.

tgrant-nv commented 2 months ago

I've converted this PR to a draft, pending Chris's mesh support.

@fpsunflower, I should point out that testrender does already run in OptiX mode, but it only supports simple diffuse shading. This PR is to enable full path tracing. I can definitely help out with the OptiX side of mesh support, so don't hesitate to reach out.

fpsunflower commented 2 months ago

Will do! I've pushed my branch so far to https://github.com/fpsunflower/OpenShadingLanguage/tree/testrender-triangles in case you want to take a look. I was working on this on my mac, so the existing Optix path is most likely broken.

We should definitely circle back on this after Siggraph. There's a few more loose ends to fix up around derivatives (which are technically not set up right in the current implementation either).

fpsunflower commented 2 months ago

Also, that branch adds a few scene to the testsuite, but github complained about the files being too large -- we'll probably want to remove those from the history before merging and find smaller models to use instead.

tgrant-nv commented 2 months ago

Sounds good. I'll check out your branch and see what needs to be done to get OptiX up and running again. This PR includes a refactor of the OptixRaytracer class that we can fold into the mesh effort.

lgritz commented 2 months ago

Also, that branch adds a few scene to the testsuite, but github complained about the files being too large -- we'll probably want to remove those from the history before merging and find smaller models to use instead.

Much like how OIIO has oiio-images and OpenEXR has openexr-images, OSL could also have a second repo just for "big things only used for tests" so it doesn't clutter the main repo or make a big download for people who just want to inspect or build the code. The testing scripts can enable certain tests only if that other repo is found at build time in a sibling directory,

tgrant-nv commented 1 month ago

@fpsunflower, please check out the testrender-triangles-optix branch from my fork to see what I've done to get the OptiX path working. It includes most of the host-side refactoring from this PR. Having only one primitive type is a nice simplification.

fpsunflower commented 1 month ago

I haven't go through your branch in great detail, but it sounds like you got everything working already which is great :) What would you say is the best path forward in terms of sequencing the changes into the main repo?

I was thinking I should probably put forward a first PR to move testrender over to triangles. It would temporarily break the optix side, but it looks we could probably follow up quickly with your changes.

@lgritz would that be ok? Or would you prefer if Tim and I coordinate to make a single PR that submits both things in one go?

lgritz commented 1 month ago

I'm ok doing it in stages. It's ok to temporarily break as long as we are confident that the part 2 is coming right on its heels.

fpsunflower commented 1 month ago

Sounds good. When I get some cycles I will polish off my end of the branch and submit a PR.

Do you want me to prepare splitting out the shaderball test into its own repo (in preparation of maybe having more big obj files and textures)?

With maps/geo, the current render-shaderball folder is 61Mb. For comparison, the rest of the testsuite folder is ~42Mb, so its more than doubling the size for just one test.

At the same time, I'm not sure its really worth the hassle to create a second repo + extra cmake hoops to jump through (same amount of data to download in the end, and a slightly clunkier workflow). We'll just have to be somewhat disciplined in what we add to the testsuite.

I can also punt and skip adding that particular test in the initial PR.

lgritz commented 1 month ago

Do you want me to prepare splitting out the shaderball test into its own repo (in preparation of maybe having more big obj files and textures)?

With maps/geo, the current render-shaderball folder is 61Mb. For comparison, the rest of the testsuite folder is ~42Mb, so its more than doubling the size for just one test.

At the same time, I'm not sure its really worth the hassle to create a second repo + extra cmake hoops to jump through (same amount of data to download in the end, and a slightly clunkier workflow). We'll just have to be somewhat disciplined in what we add to the testsuite.

I feel like with the addition of meshes and surely other features to come, it's just a matter of time before we're going to want to add a number of bigger tests. So I'm inclined to bite the bullet and just set it up now. Most tests will continue to live in the main repo, but large data files referenced by tests -- say, more than a couple MB -- should live in the test data repo.

This is analogous to the OpenEXR-images or OpenImageIO-images repos, which are used to store large things used only for tests, to keep them from cluttering the main repo or forcing people to download and store them merely to build the main project.

Maybe ask John to create a new repo for this purpose. OpenShadingLanguage-tests? OpenShadingLanguage-data? -testdata? -models? -scenes? I'm open to suggestions.

AcademySoftwareFoundation / OpenShadingLanguage