OptiX testrender overhaul (take two)

tgrant-nv commented 3 weeks ago

Description

This PR is a continuation of #1829, updated to include the recently added triangle mesh support. It enables full path tracing support for the OptiX backend in testrender. We have tried to share code between the CPU and OptiX backends where practical. There is more sharing in this PR than there was in #1829, which should reduce the maintenance burden a bit.

ID-based dispatch

Virtual function calls aren't well supported in OptiX, so rather than using regular C++ polymorphism to invoke the sample(), eval(), and get_albedo() functions for each of the BSDF sub-types, we manually invoke the correct function based on the closure ID (which we have added as a member of the BSDF class).

#define BSDF_CAST(BSDF_TYPE, bsdf) reinterpret_cast<const BSDF_TYPE*>(bsdf)

OSL_HOSTDEVICE Color3
CompositeBSDF::get_albedo(const BSDF* bsdf, const Vec3& wo) const
{
    Color3 albedo(0);
    switch (bsdf->id) {
    case DIFFUSE_ID:
        albedo = BSDF_CAST(Diffuse<0>, bsdf)->get_albedo(wo);
        break;
    case TRANSPARENT_ID:
    case MX_TRANSPARENT_ID:
        albedo = BSDF_CAST(Transparent, bsdf)->get_albedo(wo);
        break;
    // ... cases for the remaining closure IDs are elided in this excerpt
    }
    return albedo;
}
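
For readers who want to try the pattern in isolation, here is a minimal host-only sketch of ID-based dispatch. It is not the testrender code: the types `DiffuseLike` and `MirrorLike`, the enum values, and the function `dispatch_albedo()` are hypothetical names made up for illustration, and albedo is reduced to a single float. It also uses `static_cast` rather than the `reinterpret_cast` in the excerpt above, which is sufficient here because the sketch uses plain single inheritance.

```cpp
#include <cassert>

// Hypothetical closure IDs, for illustration only.
enum ClosureID { DIFFUSE_ID, MIRROR_ID };

// Base type carries the ID that replaces the vtable lookup.
struct BSDF {
    int id;
    explicit BSDF(int id_) : id(id_) {}
};

// Hypothetical concrete types; note there are no virtual functions.
struct DiffuseLike : BSDF {
    float reflectance;
    explicit DiffuseLike(float r) : BSDF(DIFFUSE_ID), reflectance(r) {}
    float get_albedo() const { return reflectance; }
};

struct MirrorLike : BSDF {
    MirrorLike() : BSDF(MIRROR_ID) {}
    float get_albedo() const { return 1.0f; }
};

// Select the concrete implementation from the stored ID, then downcast.
inline float dispatch_albedo(const BSDF* bsdf)
{
    assert(bsdf != nullptr);
    switch (bsdf->id) {
    case DIFFUSE_ID:
        return static_cast<const DiffuseLike*>(bsdf)->get_albedo();
    case MIRROR_ID:
        return static_cast<const MirrorLike*>(bsdf)->get_albedo();
    default:
        return 0.0f;  // unknown IDs contribute nothing in this sketch
    }
}
```

The trade-off is that every new sub-type needs a new `case` in each dispatch function, which the compiler will not enforce the way it would enforce a pure virtual override.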

Iterative closure evaluation

Another key change is the non-recursive closure evaluation. We apply the same style of iterative tree traversal used in the previous OptiX version of process_closure() to the shared implementations of process_closure(), evaluate_layer_opacity(), process_medium_closure(), and process_background_closure().
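
As a rough illustration of what "iterative tree traversal" means here, the sketch below walks a small weighted tree with an explicit fixed-size stack instead of recursion. It is a simplified stand-in, not the actual process_closure() code: the `Node` layout, the `LEAF`/`MUL`/`ADD` node kinds, the stack depth `kMaxStack`, and the function `sum_weighted_leaves()` are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical node kinds: a leaf contribution, a scaled child, or a sum.
enum NodeType { LEAF, MUL, ADD };

struct Node {
    NodeType type;
    float value;    // LEAF: contribution; MUL: scale factor; ADD: unused
    const Node* a;  // MUL: the child; ADD: first child
    const Node* b;  // ADD: second child
};

// Accumulate the weighted sum of all leaves without recursion.
inline float sum_weighted_leaves(const Node* root)
{
    constexpr int kMaxStack = 16;  // assumed depth limit for the sketch
    const Node* node_stack[kMaxStack];
    float weight_stack[kMaxStack];
    int top = 0;

    float total     = 0.0f;
    float weight    = 1.0f;
    const Node* cur = root;

    while (true) {
        if (cur == nullptr) {
            // Current branch is finished: resume a deferred branch, if any.
            if (top == 0)
                break;
            --top;
            cur    = node_stack[top];
            weight = weight_stack[top];
            continue;
        }
        switch (cur->type) {
        case MUL:
            weight *= cur->value;  // fold the scale into the running weight
            cur = cur->a;
            break;
        case ADD:
            assert(top < kMaxStack);
            node_stack[top]   = cur->b;  // defer the second child
            weight_stack[top] = weight;  // with the weight at this point
            ++top;
            cur = cur->a;
            break;
        case LEAF:
            total += weight * cur->value;
            cur = nullptr;
            break;
        }
    }
    return total;
}
```

For example, the tree `ADD(MUL(0.5, LEAF 2), LEAF 3)` accumulates 0.5 * 2 from the first branch, then pops the deferred leaf with its original weight of 1 and adds 3.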

Background sampling

We've included support for background closures. This includes an OptiX implementation of the Background::prepare() function. We've broken that function into three phases, where phases 1 and 3 are parallelized across a warp and phase 2 is executed on a single thread. This offers a decent speedup over a single-threaded implementation without the complexity of a more sophisticated implementation.

    // from background.h

    template<typename F>
    OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
    {
        prepare_cuda_01(stride, idx, cb);
        if (idx == 0)
            prepare_cuda_02();
        prepare_cuda_03(stride, idx);
    }
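
To make the three-phase split concrete, here is a host-only sketch of the same shape applied to building a normalized row CDF. It is an assumption-laden illustration rather than the Background::prepare() implementation: the `Table` struct, the `phase1()`/`phase2()`/`phase3()` functions, and the `prepare_sequential()` driver are hypothetical, and the "workers" are simulated by a plain loop over `idx`. On a real device the phases would need some form of synchronization between them; how the PR handles that is not visible in the excerpt above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical table of per-pixel weights, stored row-major.
struct Table {
    int rows, cols;
    std::vector<float> values;   // rows * cols input weights
    std::vector<float> row_sum;  // per-row totals (phase 1 output)
    std::vector<float> row_cdf;  // cumulative row totals, then normalized
    float total;                 // grand total (phase 2 output)

    Table(int r, int c, const std::vector<float>& v)
        : rows(r), cols(c), values(v), row_sum(r, 0.0f), row_cdf(r, 0.0f),
          total(0.0f)
    {
    }
};

// Phase 1 (parallel): worker idx sums rows idx, idx + stride, ...
inline void phase1(Table& t, int stride, int idx)
{
    for (int r = idx; r < t.rows; r += stride) {
        float s = 0.0f;
        for (int c = 0; c < t.cols; ++c)
            s += t.values[r * t.cols + c];
        t.row_sum[r] = s;
    }
}

// Phase 2 (serial): running total across rows; inherently sequential.
inline void phase2(Table& t)
{
    float running = 0.0f;
    for (int r = 0; r < t.rows; ++r) {
        running += t.row_sum[r];
        t.row_cdf[r] = running;
    }
    t.total = running;
}

// Phase 3 (parallel): each worker normalizes its own rows.
inline void phase3(Table& t, int stride, int idx)
{
    assert(t.total > 0.0f);
    for (int r = idx; r < t.rows; r += stride)
        t.row_cdf[r] /= t.total;
}

// Sequential stand-in for a warp: run each phase for every worker in turn.
inline void prepare_sequential(Table& t, int stride)
{
    for (int idx = 0; idx < stride; ++idx)
        phase1(t, stride, idx);
    phase2(t);
    for (int idx = 0; idx < stride; ++idx)
        phase3(t, stride, idx);
}
```

The grand total is stored separately in phase 2 so that phase 3 workers never read a CDF entry that another worker may already have normalized.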

Tests

I have enabled the render-* tests for OptiX mode. I've added alternative reference images, since the GPU output exceeds the difference threshold on many of the tests, although in most cases the difference between the CPU and GPU output is very small.

Checklist:

[x] I have read the contribution guidelines.
[x] I have updated the documentation, if applicable.
[x] I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
[x] My code follows the prevailing code style of this project. If I haven't already run clang-format v17 before submitting, I definitely will look at the CI test that runs clang-format and fix anything that it highlights as being nonconforming.

lgritz commented 1 week ago

Does this fully replace #1829? Should we close that other one to avoid confusion?

lgritz commented 1 week ago

@chellmuth and @aconty, does this look reasonable to you? Both on an absolute scale and in the sense of using a set of idioms that makes it a decent proxy for what we care about in a real renderer?

lgritz commented 1 week ago

@tgrant-nv This LGTM. I ran the tests on my machine and came up with all sorts of failures (not your fault). The vector2/color2 tests are unrelated; I will look into those separately. But lots of OptiX tests failed because of a relatively small number of differences in the sampling noise. I see you added reference images, but even those didn't quite match for me -- maybe a different version of OptiX, or of the driver? Anyway, loosening up the thresholds did the trick. (I also changed the names of your ref images to the usual convention, a very nit-picky thing.)

So I went to push these updates on top of your branch, and it wouldn't let me, despite this very page saying "Maintainers are allowed to edit this pull request" -- I get an error "Authentication required: You must have push access to verify locks". I can do this to PRs on OIIO, but not on OSL, for reasons I don't understand.

So, could I trouble you to please take the optix-testrender-overhaul-take2 branch from my "lgritz" account (it's public) and then push that to yours, to amend this PR?
