Closed SJoJoK closed 1 year ago
Hello @SJoJoK ,
It is even worse than that, as we trace two rays in the inner loop (MIS with two strategies, bsdf and light importance sampling). So total number of samples= 2n_sample_x n_sample_x. So the sample rates we can easily support without modifying the code is total number of samples = 2, 8, 18, 32, ... corresponding to n_samples_x = 1,2,3,4, ...
Use the config flag n_samples
to control the number of rays, given the formula above: num rays per pixel: 2*n_samples*n_samples
The config flag spp
was always set to 1 in our experiments. That is used for supporting supersampling of primary visibility in the rasterizer.
Configs to reproduce the experiments in the paper are included here: https://github.com/NVlabs/nvdiffrecmc/tree/main/configs
Thank you @jmunkberg , I realized that I mistook the number of iterations in the for loop as spp. I also found that the paper presented results for spp=2*n^2 (8, 18, 32, and 128), which clarified my confusion. By the way, I have another question unrelated to this issue. Have you tried moving the for-loop outside of Optix? (that is, take the for-loop in the python script) I believe this may incur additional OptixLaunch overhead, but I'm not sure if you have conducted similar experiments. If you have, it would be great to know, but if not or if it's not convenient to share, that's okay too. Thank you very much.
Have you tried moving the for-loop outside of Optix?
Yes, we started with the loop in torch, but later moved it into the ray gen to increase runtime performance and to reduce the memory requirements.
Thanks for replying, I have this requirement because I want to perform some post-processing based on intersection information. However, the current way of implementing the for-loop within the raygen forces me to use something like a very large buffer to temporarily store all intersection information in all the samples, so I may move the loop to the torch again 😄 Anyway, thanks again for the information. I'll close the issue.
I apologize for bothering you again @jmunkberg . I tried moving the for-loop outside of the Optix and it works, but not perfectly. The topology is correct, and the $kd$ is roughly correct, but the $k{orm}$ and the light are not the same as one obtained from the original code.
I removed the for-loop from the raygen program and placed it in the forward
and backward
functions of _optix_env_shade_func
, returning the accumulated results (also, I modified the Optix launch parameters to pass the current sample number).
Despite spending a long time reviewing the code, I cannot figure out the issue. The returned diff
, spec
, and the gradients all seem to be accumulated results, and all values in the Optix pipeline except sx
and sy
are independent of the current sample number. Besides, the random number generated inside Optix is controlled by _rnd_seed
.
Here is my code:
class _optix_env_shade_func(torch.autograd.Function):
_random_perm = {}
@staticmethod
def forward(ctx, optix_ctx, mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols, BSDF, n_samples_x, rnd_seed, shadow_scale):
_rnd_seed = np.random.randint(2**31) if rnd_seed is None else rnd_seed
if n_samples_x not in _optix_env_shade_func._random_perm:
# Generate (32k) tables with random permutations to decorrelate the BSDF and light stratified samples
_optix_env_shade_func._random_perm[n_samples_x] = torch.argsort(
torch.rand(32768, n_samples_x * n_samples_x, device="cuda"), dim=-1).int()
opts = dict(dtype=torch.float32, device=torch.device('cuda'))
diff_accum = torch.zeros((ro.size(0), ro.size(1), ro.size(2), 3),**opts)
spec_accum = torch.zeros_like(diff_accum, **opts)
for sample_idx in range(n_samples_x * n_samples_x):
diff, spec = _plugin.env_shade_fwd(optix_ctx.cpp_wrapper,
mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks,
light, pdf, rows, cols,
_optix_env_shade_func._random_perm[n_samples_x],
sample_idx, BSDF, n_samples_x, _rnd_seed, shadow_scale)
diff_accum+=diff
spec_accum+=spec
ctx.save_for_backward(mask, ro, gb_pos, gb_normal,
gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols)
ctx.optix_ctx = optix_ctx
ctx.BSDF = BSDF
ctx.n_samples_x = n_samples_x
ctx.rnd_seed = rnd_seed
ctx.shadow_scale = shadow_scale
return diff_accum, spec_accum
@staticmethod
def backward(ctx, diff_grad, spec_grad):
optix_ctx = ctx.optix_ctx
_rnd_seed = np.random.randint(
2**31) if ctx.rnd_seed is None else ctx.rnd_seed
mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols = ctx.saved_variables
opts = dict(dtype=torch.float32, device=torch.device('cuda'))
gb_pos_grad_accum = torch.zeros_like(gb_pos,**opts)
gb_normal_grad_accum = torch.zeros_like(gb_normal, **opts)
gb_kd_grad_accum = torch.zeros_like(gb_kd, **opts)
gb_ks_grad_accum = torch.zeros_like(gb_ks, **opts)
light_grad_accum = torch.zeros_like(light,**opts)
for sample_idx in range(ctx.n_samples_x * ctx.n_samples_x):
gb_pos_grad, gb_normal_grad, gb_kd_grad, gb_ks_grad, light_grad = _plugin.env_shade_bwd(
optix_ctx.cpp_wrapper, mask, ro, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, light, pdf, rows, cols,
_optix_env_shade_func._random_perm[ctx.n_samples_x],
sample_idx, ctx.BSDF, ctx.n_samples_x, _rnd_seed, ctx.shadow_scale, diff_grad, spec_grad)
gb_pos_grad_accum += gb_pos_grad
gb_normal_grad_accum += gb_normal_grad
gb_kd_grad_accum += gb_kd_grad
gb_ks_grad_accum += gb_ks_grad
light_grad_accum += light_grad
return None, None, None, gb_pos_grad_accum, gb_normal_grad_accum, None, gb_kd_grad_accum, gb_ks_grad_accum, light_grad_accum, None, None, None, None, None, None, None
and the raygen
extern "C" __global__ void __raygen__rg_()
{
// Lookup our location within the launch grid
const uint3 idx = optixGetLaunchIndex();
const uint3 dim = optixGetLaunchDimensions();
// Read per-pixel constant input tensors, ray_origin, g-buffer entries etc.
float mask = params.mask[idx.z][idx.y][idx.x];
float3 ray_origin = fetch3(params.ro, idx.z, idx.y, idx.x);
float3 gb_pos = fetch3(params.gb_pos, idx.z, idx.y, idx.x);
float3 gb_normal = fetch3(params.gb_normal, idx.z, idx.y, idx.x);
float3 gb_view_pos = fetch3(params.gb_view_pos, idx.z, idx.y, idx.x);
float3 gb_kd = fetch3(params.gb_kd, idx.z, idx.y, idx.x);
float3 gb_ks = fetch3(params.gb_ks, idx.z, idx.y, idx.x);
if (mask <= 0)
return; // Early exit masked pixels
float3 diff_grad, spec_grad;
if (params.backward)
{
diff_grad = fetch3(params.diff_grad, idx.z, idx.y, idx.x);
spec_grad = fetch3(params.spec_grad, idx.z, idx.y, idx.x);
}
float3 diffAccum = make_float3(0.0f, 0.0f, 0.0f);
float3 specAccum = make_float3(0.0f, 0.0f, 0.0f);
float strata_frac = 1.0f / params.n_samples_x;
float sample_frac = 1.0f / (params.n_samples_x * params.n_samples_x);
float alpha = gb_ks.y * gb_ks.y; // roughness squared
float3 wo = safe_normalize(gb_view_pos - gb_pos); // view direction
float metallic = gb_ks.z;
float3 baseColor = gb_kd;
float3 specColor = make_float3(0.04f, 0.04f, 0.04f) * (1.0f - metallic) + baseColor * metallic;
float diffuseWeight = (1.f - metallic) * luminance(baseColor);
float eta = 1.0f;
float specularWeight = albedo(specColor, eta, wo, gb_normal);
float pDiffuse = (diffuseWeight + specularWeight) > 0.f ? diffuseWeight / (diffuseWeight + specularWeight) : 1.f;
float pSpecular = 1.0f - pDiffuse;
unsigned int rng_state = hash_pcg(params.rnd_seed, (idx.z * dim.y + idx.y) * dim.x + idx.x);
unsigned int lightIdx = rand_pcg(rng_state) % params.perms.size(0), bsdfIdx = rand_pcg(rng_state) % params.perms.size(0);
float3 ray_dir, diff, spec;
float sx, sy, sz = 0.f, pdf_light, pdf_bsdf;
unsigned int sample_idx = params.sample_idx;
// Light importance sampling
sx = ((float)(params.perms[lightIdx][sample_idx] % params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
sy = ((float)(params.perms[lightIdx][sample_idx] / params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
ray_dir = lightSample(sx, sy, pdf_light);
pdf_bsdf = bsdf_pdf(pDiffuse, pSpecular, gb_normal, wo, ray_dir, alpha);
process_sample(idx, ray_origin, ray_dir, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, pdf_light + pdf_bsdf, sample_frac, diff, spec, diff_grad, spec_grad, SampleType::LIGHT);
diffAccum = diffAccum + diff;
specAccum = specAccum + spec;
// BSDF sampling (sample either the diffuse or specular lobe)
sx = ((float)(params.perms[bsdfIdx][sample_idx] % params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
sy = ((float)(params.perms[bsdfIdx][sample_idx] / params.n_samples_x) + uniform_pcg(rng_state)) * strata_frac;
sz = uniform_pcg(rng_state);
ray_dir = bsdf_sample(pDiffuse, pSpecular, gb_normal, wo, make_float3(sx, sy, sz), alpha, pdf_bsdf);
pdf_light = lightPDF(ray_dir);
process_sample(idx, ray_origin, ray_dir, gb_pos, gb_normal, gb_view_pos, gb_kd, gb_ks, pdf_light + pdf_bsdf, sample_frac, diff, spec, diff_grad, spec_grad, SampleType::BSDF);
diffAccum = diffAccum + diff;
specAccum = specAccum + spec;
// Record results in our output raster
if (!params.backward)
{
params.diff[idx.z][idx.y][idx.x][0] = diffAccum.x;
params.diff[idx.z][idx.y][idx.x][1] = diffAccum.y;
params.diff[idx.z][idx.y][idx.x][2] = diffAccum.z;
params.spec[idx.z][idx.y][idx.x][0] = specAccum.x;
params.spec[idx.z][idx.y][idx.x][1] = specAccum.y;
params.spec[idx.z][idx.y][idx.x][2] = specAccum.z;
}
}
Here the optimized light obtain from above code: and here the optimized light obtain from the origin code: I would greatly appreciate any guidance or suggestions to help me understand and fix this issue. Please let me know if any additional information is needed. Thank you for your time and assistance!
Sorry, I cannot provide support for this, as this is a modification of our released code.
A quick look at the code above, perhaps in the torch.autograd backward function, diff_grad, spec_grad
should be scaled with one over the number of loop iterations, but that is still just a magnitude difference, and does not explain the visual differences.
Thanks for replying, I understand that providing support for modifications is not possible. I appreciate your suggestion regarding the backward function, and I will look into it to see if it helps address the issue. Thanks again for your time and assistance.
Yes, sorry for the short reply. I'm happy to help, but it is hard to provide useful feedback by just getting the full code listing. Debugging backward passes is always tricky, and we usually try writing small unit tests for each component to verify the output, like here https://github.com/NVlabs/nvdiffrecmc/tree/main/render/renderutils/tests
So if possible, perhaps try extracting the two versions of the code (loop in OptiX vs. loop in torch) in a separate test Python script, and verify that all outputs are correct in both the forward and backward passes.
Thank you for your suggestion. I am currently in the process of comparing the output of different versions of the code. I noticed that there are slight differences in the results of the forward function when the loop is in Torch (I used the original backward function). I am currently investigating the root cause of this issue.
In principal, only sx
and sy
are related to the current sample, and the radiance is simply the result of accumulation.Thus, I'm seeking to identify any other factors that are difference at different OptixLaunch
.
Once again, thank you for your reply.
Hello,
Thanks for your great work! I have been going through your code and something confused me. In the raygen function of the optix pipeline,
n_sample_x * n_sample_x
rays are generated:for (int i = 0; i < params.n_samples_x * params.n_samples_x; ++i)
This seems to imply that the samples per pixel (spp) should be equal to
n_sample_x * n_sample_x
. So, how can I obtain a result with spp=8 or any other setting where spp is not a square integer? (or, how should i change the stratified sampling strategy to match the experiments presented in the paper?)Thanks in advance for any help or advice.