match code with paper weight_i computation

Originally brought up in issue #426 created earlier

Here is the copied info:

This may be long and a little involved. I will try my best to explain the possible issue.

It's an issue of implementation detail of harmonic weight used by warped-area sampling (WAS) differentiable path tracer. Here is the paper

I am talking about this function: float computeHarmonicWeight() in Source\RenderPasses\WARDiffPathTracer\WarpedAreaReparam.slang This is the current implementation:

float computeHarmonicWeight(
    no_diff IntersectionAD isect,
    no_diff float3 origin,
    no_diff float3 auxDirection,
    no_diff float auxSampleY,
    no_diff float kappa,
    float3 direction
)
{
    float boundaryTerm = no_diff computeBoundaryTerm(isect.normalW, auxDirection);

    float sy = max(1.f - auxSampleY, 1e-6f);
    float invVMFDensity = 1.f / ((1.f - sy) * exp(-2.f * kappa) + sy);

    float wDenom = invVMFDensity - 1.f + boundaryTerm;
    float wDenomRcp = select(wDenom > 1e-4f, 1.f / wDenom, 0.f);
    float w = wDenomRcp * wDenomRcp * wDenomRcp * invVMFDensity;
    return w;
}

This function basically implements the computation of weight_i, a single line in Algorithm 2 in the paper. It seems like the code doesn't match the formula given in the paper. Here is some derivation & explanation. (I will use 'w' for omega, unit direction and 'weight' for w in the paper for convenience)

dot(w, w'). We take advantage of a fact that dot product doesn't change after we transform both vectors with same transformation. Thus, it equals the dot product in local space, where w_local is simply (0,0,1) and the sampled w'_local has z component cos(theta) = 1.f + log(exp(-2.f kappa) (1.f - sy) + sy) / kappa; See sampleVonMisesFisher() in Source\Falcor\Utils\Math\MathHelpers.slang for details.
Following that, after some simple derivation, we have the exponential term in the denominator becomes (1.f - sy) exp(-2.f kappa) + sy, which should be "VMF density of w' ".
Thus, weight_i = 1 / (VMF_density - 1 + BoundaryTerm) / VMFDensity
Further, I couldn't understand why there is a "1 / wDenom ^ 3". I didn't find anything in the calculation related to a cubic term.

And here is my modified implementation:

float computeHarmonicWeight(
    no_diff IntersectionAD isect,
    no_diff float3 origin,
    no_diff float3 auxDirection,
    no_diff float auxSampleY,
    no_diff float kappa,
    float3 direction // not used in computation; allows autodiff to find grad w.r.t. this direction
)
{
    float boundaryTerm = no_diff computeBoundaryTerm(isect.normalW, auxDirection);

    float sy = max(1.f - auxSampleY, 1e-6f);
    // see WAS paper Algorithm 2, computation of weight_i
    /* float invVMFDensity = 1.f / ((1.f - sy) * exp(-2.f * kappa) + sy);

    float wDenom = invVMFDensity - 1.f + boundaryTerm;
    float wDenomRcp = select(wDenom > 1e-4f, 1.f / wDenom, 0.f);
    float w = wDenomRcp * wDenomRcp * wDenomRcp * invVMFDensity; */

    float VMFDensity = (1.f - sy) * exp(-2.f * kappa) + sy;
    float w = 1.f / (VMFDensity - 1.f + boundaryTerm) / VMFDensity;
    return w;
}

After the change, here is the bunny_ref test scene. (First is before change, second is after change.) I did see some very subtle difference. But admittedly, I am not an expert on Falcor, differentiable rendering, or WAS, so it's very possible I am wrong. But I do want to point things out in case there is something wrong.

Thank you!

BeforeChange

AfterChange

Finally, here is the launch.json entry for this test scene for your convenience.

        {
            // Launch configuration for Mogwai (Warped Area Sampling Differentiable Rendering)
            "name": "Mogwai WAS",
            "type": "cppvsdbg",
            "request": "launch",
            "program": "${command:cmake.launchTargetPath}",
            "windows": {
                "program": "${command:cmake.launchTargetDirectory}/Mogwai.exe"
            },
            "args": [
                "--script=${workspaceFolder}/scripts/WARDiffPathTracer.py",
                "--scene=${workspaceFolder}/media/inv_rendering_scenes/bunny_ref.pyscene"
            ],
            "stopAtEntry": false,
            "cwd": "${command:cmake.launchTargetDirectory}",
            "environment": [
                {
                    "name": "FALCOR_DEVMODE",
                    "value": "1"
                }
            ],
            "visualizerFile": "${workspaceFolder}/Source/Falcor/Falcor.natvis"
        },

Sorry, I am still exploring and learning the entire Falcor project (magnificent work btw). I made mistakes in 2 details in my previous comment: 1) Just realized there are other tests in addition to unit tests, which are executed by FalcorTest.exe. In particular, those unit tests don't have direct checking on warped-area sampling differentiable PT. Image tests do have. And here are the test result. I commented out line 51 "[ForwardDerivative(fwd_computeHarmonicWeight)]" which links computeHarmonicWeight() to its custom fwd_diff function and let Falcor to auto-diff.

PS my_folder> .\tests\run_image_tests.bat --gen-refs -f .*test_WARDiffPathTracer.* --parallel 1
Reference directory: my_folder\tests\data\refs\<branch name>\windows-ninja-msvc-Release
Generating references for 4 tests on 1 processes
  renderpasses/test_WARDiffPathTracerMaterialFwd_d3d12         : STARTED
  renderpasses/test_WARDiffPathTracerTranslationBwd_d3d12      : STARTED
  renderpasses/test_WARDiffPathTracerMaterialFwd_d3d12         : PASSED (13.9 s)
  renderpasses/test_WARDiffPathTracerTranslationFwd_d3d12      : STARTED
  renderpasses/test_WARDiffPathTracerTranslationBwd_d3d12      : PASSED (25.0 s)
  renderscripts/test_WARDiffPathTracer_d3d12                   : STARTED
  renderpasses/test_WARDiffPathTracerTranslationFwd_d3d12      : PASSED (9.2 s)
  renderscripts/test_WARDiffPathTracer_d3d12                   : PASSED (6.8 s)

Generating references PASSED (0.0 s).

All related tests passed.

2) In my previous runs and attached screenshot, I used the wrong bunny test scene. It should be "test_scenes/bunny_war_diff_pt.pyscene" from the python configuration file. Now here is the updated comparison: First is original code with custom fwd_diff function; Second is my modified code with auto diff; And I changed max bounces and sample per pixel to 4 and 8 respectively

Now we can see there is a big difference. It's very possible my reasoning is wrong. Still, I'd like some expert confirmation. Really appreciate that!

NVIDIAGameWorks / Falcor

match code with paper weight_i computation #427