Confusion in the number of samples per pixel (spp)

SJoJoK commented 1 year ago

Hello,

Thanks for your great work! I have been going through your code and something confused me. In the raygen function of the optix pipeline, n_sample_x * n_sample_x rays are generated: for (int i = 0; i < params.n_samples_x * params.n_samples_x; ++i)

This seems to imply that the samples per pixel (spp) should be equal to n_sample_x * n_sample_x. So, how can I obtain a result with spp=8 or any other setting where spp is not a square integer? (or, how should i change the stratified sampling strategy to match the experiments presented in the paper?)

Thanks in advance for any help or advice.

jmunkberg commented 1 year ago

Hello @SJoJoK ,

It is even worse than that, as we trace two rays in the inner loop (MIS with two strategies, bsdf and light importance sampling). So total number of samples= 2n_sample_x n_sample_x. So the sample rates we can easily support without modifying the code is total number of samples = 2, 8, 18, 32, ... corresponding to n_samples_x = 1,2,3,4, ...

Use the config flag n_samples to control the number of rays, given the formula above: num rays per pixel: 2*n_samples*n_samples

The config flag spp was always set to 1 in our experiments. That is used for supporting supersampling of primary visibility in the rasterizer.

Configs to reproduce the experiments in the paper are included here: https://github.com/NVlabs/nvdiffrecmc/tree/main/configs

SJoJoK commented 1 year ago

Thank you @jmunkberg , I realized that I mistook the number of iterations in the for loop as spp. I also found that the paper presented results for spp=2*n^2 (8, 18, 32, and 128), which clarified my confusion. By the way, I have another question unrelated to this issue. Have you tried moving the for-loop outside of Optix? (that is, take the for-loop in the python script) I believe this may incur additional OptixLaunch overhead, but I'm not sure if you have conducted similar experiments. If you have, it would be great to know, but if not or if it's not convenient to share, that's okay too. Thank you very much.

jmunkberg commented 1 year ago

Have you tried moving the for-loop outside of Optix?

Yes, we started with the loop in torch, but later moved it into the ray gen to increase runtime performance and to reduce the memory requirements.

SJoJoK commented 1 year ago

Thanks for replying, I have this requirement because I want to perform some post-processing based on intersection information. However, the current way of implementing the for-loop within the raygen forces me to use something like a very large buffer to temporarily store all intersection information in all the samples, so I may move the loop to the torch again 😄 Anyway, thanks again for the information. I'll close the issue.

SJoJoK commented 1 year ago

I apologize for bothering you again @jmunkberg . I tried moving the for-loop outside of the Optix and it works, but not perfectly. The topology is correct, and the $kd$ is roughly correct, but the $k{orm}$ and the light are not the same as one obtained from the original code. I removed the for-loop from the raygen program and placed it in the forward and backward functions of _optix_env_shade_func, returning the accumulated results (also, I modified the Optix launch parameters to pass the current sample number). Despite spending a long time reviewing the code, I cannot figure out the issue. The returned diff, spec, and the gradients all seem to be accumulated results, and all values in the Optix pipeline except sx and sy are independent of the current sample number. Besides, the random number generated inside Optix is controlled by _rnd_seed. Here is my code:

class _optix_env_shade_func(torch.autograd.Function):
    _random_perm = {}
    @staticmethod
    def forward(ctx, optix_ctx, mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols, BSDF, n_samples_x, rnd_seed, shadow_scale):
        _rnd_seed = np.random.randint(2**31) if rnd_seed is None else rnd_seed
        if n_samples_x not in _optix_env_shade_func._random_perm:
            # Generate (32k) tables with random permutations to decorrelate the BSDF and light stratified samples
            _optix_env_shade_func._random_perm[n_samples_x] = torch.argsort(
                torch.rand(32768, n_samples_x * n_samples_x, device="cuda"), dim=-1).int()
        opts = dict(dtype=torch.float32, device=torch.device('cuda'))
        diff_accum = torch.zeros((ro.size(0), ro.size(1), ro.size(2), 3),**opts)
        spec_accum = torch.zeros_like(diff_accum, **opts)
        for sample_idx in range(n_samples_x * n_samples_x):
            diff, spec = _plugin.env_shade_fwd(optix_ctx.cpp_wrapper,
                                            mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks,
                                            light, pdf, rows, cols,
                                            _optix_env_shade_func._random_perm[n_samples_x],
                                            sample_idx, BSDF, n_samples_x, _rnd_seed, shadow_scale)
            diff_accum+=diff
            spec_accum+=spec
        ctx.save_for_backward(mask, ro, gb_pos, gb_normal,
                              gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols)
        ctx.optix_ctx = optix_ctx
        ctx.BSDF = BSDF
        ctx.n_samples_x = n_samples_x
        ctx.rnd_seed = rnd_seed
        ctx.shadow_scale = shadow_scale
        return diff_accum, spec_accum

    @staticmethod
    def backward(ctx, diff_grad, spec_grad):
        optix_ctx = ctx.optix_ctx
        _rnd_seed = np.random.randint(
            2**31) if ctx.rnd_seed is None else ctx.rnd_seed
        mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols = ctx.saved_variables
        opts = dict(dtype=torch.float32, device=torch.device('cuda'))
        gb_pos_grad_accum = torch.zeros_like(gb_pos,**opts)
        gb_normal_grad_accum = torch.zeros_like(gb_normal, **opts)
        gb_kd_grad_accum = torch.zeros_like(gb_kd, **opts)
        gb_ks_grad_accum = torch.zeros_like(gb_ks, **opts)
        light_grad_accum = torch.zeros_like(light,**opts)
        for sample_idx in range(ctx.n_samples_x * ctx.n_samples_x):
            gb_pos_grad, gb_normal_grad, gb_kd_grad, gb_ks_grad, light_grad = _plugin.env_shade_bwd(
                optix_ctx.cpp_wrapper, mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols,
                _optix_env_shade_func._random_perm[ctx.n_samples_x],
                sample_idx, ctx.BSDF, ctx.n_samples_x, _rnd_seed, ctx.shadow_scale, diff_grad, spec_grad)
            gb_pos_grad_accum += gb_pos_grad
            gb_normal_grad_accum += gb_normal_grad
            gb_kd_grad_accum += gb_kd_grad
            gb_ks_grad_accum += gb_ks_grad
            light_grad_accum += light_grad
        return None, None, None, gb_pos_grad_accum, gb_normal_grad_accum, None, gb_kd_grad_accum, gb_ks_grad_accum, light_grad_accum, None, None, None, None, None, None, None

and the raygen

extern "C" __global__ void __raygen__rg_()
{
    // Lookup our location within the launch grid
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();

    // Read per-pixel constant input tensors, ray_origin, g-buffer entries etc.
    float  mask        = params.mask[idx.z][idx.y][idx.x];
    float3 ray_origin  = fetch3(params.ro, idx.z, idx.y, idx.x);
    float3 gb_pos      = fetch3(params.gb_pos, idx.z, idx.y, idx.x);
    float3 gb_normal   = fetch3(params.gb_normal, idx.z, idx.y, idx.x);
    float3 gb_view_pos = fetch3(params.gb_view_pos, idx.z, idx.y, idx.x);
    float3 gb_kd       = fetch3(params.gb_kd, idx.z, idx.y, idx.x);
    float3 gb_ks       = fetch3(params.gb_ks, idx.z, idx.y, idx.x);
    if (mask <= 0)
        return; // Early exit masked pixels

    float3 diff_grad, spec_grad;
    if (params.backward)
    {
        diff_grad = fetch3(params.diff_grad, idx.z, idx.y, idx.x);
        spec_grad = fetch3(params.spec_grad, idx.z, idx.y, idx.x);
    }

    float3 diffAccum = make_float3(0.0f, 0.0f, 0.0f);
    float3 specAccum = make_float3(0.0f, 0.0f, 0.0f);

    float strata_frac = 1.0f / params.n_samples_x;
    float sample_frac = 1.0f / (params.n_samples_x * params.n_samples_x);
    float alpha = gb_ks.y * gb_ks.y; // roughness squared
    float3 wo = safe_normalize(gb_view_pos - gb_pos); // view direction

    float metallic = gb_ks.z;
    float3 baseColor = gb_kd;
    float3 specColor = make_float3(0.04f, 0.04f, 0.04f) * (1.0f - metallic) + baseColor * metallic;
    float diffuseWeight = (1.f - metallic) * luminance(baseColor);
    float eta = 1.0f;
    float specularWeight = albedo(specColor, eta, wo, gb_normal);
    float pDiffuse = (diffuseWeight + specularWeight) > 0.f ? diffuseWeight / (diffuseWeight + specularWeight) : 1.f;
    float pSpecular = 1.0f - pDiffuse;

    unsigned int rng_state = hash_pcg(params.rnd_seed, (idx.z * dim.y + idx.y) * dim.x + idx.x);
    unsigned int lightIdx = rand_pcg(rng_state) % params.perms.size(0), bsdfIdx = rand_pcg(rng_state) % params.perms.size(0);

    float3 ray_dir, diff, spec;
    float sx, sy, sz = 0.f, pdf_light, pdf_bsdf;

    unsigned int sample_idx = params.sample_idx;

      // Light importance sampling
      sx = ((float)(params.perms[lightIdx][sample_idx] % params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
      sy = ((float)(params.perms[lightIdx][sample_idx] / params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
      ray_dir = lightSample(sx, sy, pdf_light);
      pdf_bsdf = bsdf_pdf(pDiffuse, pSpecular, gb_normal, wo, ray_dir, alpha);
      process_sample(idx, ray_origin, ray_dir, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, pdf_light + pdf_bsdf, sample_frac, diff, spec, diff_grad, spec_grad, SampleType::LIGHT);
      diffAccum = diffAccum + diff;
      specAccum = specAccum + spec;

      // BSDF sampling (sample either the diffuse or specular lobe)
      sx = ((float)(params.perms[bsdfIdx][sample_idx] % params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
      sy = ((float)(params.perms[bsdfIdx][sample_idx] / params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
      sz = uniform_pcg(rng_state);
      ray_dir = bsdf_sample(pDiffuse, pSpecular, gb_normal, wo, make_float3(sx, sy, sz), alpha, pdf_bsdf);
      pdf_light = lightPDF(ray_dir);
      process_sample(idx, ray_origin, ray_dir, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, pdf_light + pdf_bsdf, sample_frac, diff, spec, diff_grad, spec_grad, SampleType::BSDF);
      diffAccum = diffAccum + diff;
      specAccum = specAccum + spec;

    // Record results in our output raster
    if (!params.backward)
    {
        params.diff[idx.z][idx.y][idx.x][0] = diffAccum.x;
        params.diff[idx.z][idx.y][idx.x][1] = diffAccum.y;
        params.diff[idx.z][idx.y][idx.x][2] = diffAccum.z;
        params.spec[idx.z][idx.y][idx.x][0] = specAccum.x;
        params.spec[idx.z][idx.y][idx.x][1] = specAccum.y;
        params.spec[idx.z][idx.y][idx.x][2] = specAccum.z;
    }
}

Here the optimized light obtain from above code: val_000029_light_image and here the optimized light obtain from the origin code: I would greatly appreciate any guidance or suggestions to help me understand and fix this issue. Please let me know if any additional information is needed. Thank you for your time and assistance!

jmunkberg commented 1 year ago

Sorry, I cannot provide support for this, as this is a modification of our released code.

A quick look at the code above, perhaps in the torch.autograd backward function, diff_grad, spec_grad should be scaled with one over the number of loop iterations, but that is still just a magnitude difference, and does not explain the visual differences.

SJoJoK commented 1 year ago

Thanks for replying, I understand that providing support for modifications is not possible. I appreciate your suggestion regarding the backward function, and I will look into it to see if it helps address the issue. Thanks again for your time and assistance.

jmunkberg commented 1 year ago

Yes, sorry for the short reply. I'm happy to help, but it is hard to provide useful feedback by just getting the full code listing. Debugging backward passes is always tricky, and we usually try writing small unit tests for each component to verify the output, like here https://github.com/NVlabs/nvdiffrecmc/tree/main/render/renderutils/tests

So if possible, perhaps try extracting the two versions of the code (loop in OptiX vs. loop in torch) in a separate test Python script, and verify that all outputs are correct in both the forward and backward passes.

SJoJoK commented 1 year ago

Thank you for your suggestion. I am currently in the process of comparing the output of different versions of the code. I noticed that there are slight differences in the results of the forward function when the loop is in Torch (I used the original backward function). I am currently investigating the root cause of this issue.

In principal, only sx and sy are related to the current sample, and the radiance is simply the result of accumulation.Thus, I'm seeking to identify any other factors that are difference at different OptixLaunch.

Once again, thank you for your reply.

NVlabs / nvdiffrecmc

Confusion in the number of samples per pixel (spp) #24