SVGF Denoiser implementation

0beqz commented 1 year ago

About

Hey, I'm working on implementing SVGF for a GI effect I'm working on. So I think it would be very suitable for this project as a denoiser. Since it implements temporal reprojection as well, it would also solve issue https://github.com/gkjohnson/three-gpu-pathtracer/issues/60 through optionally leaving out the denoising pass at the end. I would like to describe how it works, what is needed to implement it and then how exactly we could implement it in three-gpu-pathtracer. Related issues: https://github.com/gkjohnson/three-gpu-pathtracer/issues/85 and https://github.com/gkjohnson/three-gpu-pathtracer/issues/60

How SVGF works

I'm using images/videos from a GI effect for demo purposes here.

Raw input frame

Suppose you have rendered the diffuse lighting in half resolution for the current frame: raw

Temporal accumulation

The first step is to temporally accumulate like raytracers usually do except that it is also using reprojection so that it doesn't have to discard the accumulated render when there is camera movement. This gives us this result:

It's comparing both, the normals and the depth of the current and the reprojected pixel. If the difference of either exceeds a set threshold then we have a disocclusion meaning that our current pixel wasn't visible in the last frame. In the alpha channel of the render target of the TR pass I'm saving the 'age' of a pixel, namely for how many samples a pixel was visible. This age is used to blend pixels individually as pixels that were recently disoccluded need to be blended more aggressively while pixels that have been visible for a long time barely need to be blended in.

For reprojection it's using both, the velocity of a pixel and the ray length of the reflected ray in the current frame (for glossy surfaces only). Reprojection of the 'hit point' using the ray length is needed as reflections have a different parallax than diffuse lighting so we can't correctly reproject them using just a pixel's velocity. However, hit point reprojection only works well if the surface is flat (which can be determined by a pixel's curvature using screen-space derivatives) and if the roughness of the surface is also rather low.

Denoising

The temporally accumulated texture will be denoised using a smart À-trous blur filter. It will run that blur filter over multiple iterations. Since blurring using a great kernel size can be very expensive, an À-trous blur filter takes every i-th pixel into account in its i-th iteration allowing it to cover greater kernels while maintaining a decent performance. Here's an illustration showing how the blur filter selects its neighbors in iteration i:

That explains why it's called an 'À-trous' (with holes) blur filter as it always skips i neighboring pixels during iteration i. You can use varying kernel sizes for it. The kernel size in the illustration would be 9 for example (for every iteration). The blur filter is edge-stopping so that it doesn't overblur. It's using all sorts of information such as depth similarity, normal similarity, luminance similarity and roughness similarity to weigh neighbors when blurring to preserve details but still get rid of noise. You can control how much the denoiser weighs by certain similarites to trade less noise for less details for example. One of the most important functionalities is weighing a pixel through its variance, i.e. weigh it on how 'noisy' it is over multiple frames (which is determined in the accumulation pass). The blur filter weighs neighbors based on the center pixel's variance and the neighbors variance. This makes the blur filter deal more aggressively with noisy areas and evaluate changed variances from previous iterations in the current iteration by recalculating variances each iteration.

After 3 blur iterations using a kernel size of 7, we get this result: denoised This gets rid of the remaining noise and helps cover up noisy dissocclusions tremendously by denoising them more aggressively.

Video

Here's how everything looks in motion, showing the denoised, temporally accumulated output first, then just the temporally accumulated output and then the raw input frame in the end:

svgf_2.webm

Requirements

SVGF needs the following inputs:

rendered lighting only (without direct diffuse textures applied, indirect diffuse lighting is still allowed)
depth
normals

SVGF will then have the following output:

denoised lighting

You can then combine that denoised lighting with the direct diffuse textures of the materials to get the final raytraced output. Reasons why direct diffuse lighting isn't included:

for TR, we should only use info that can't be computed in the current frame, as direct diffuse textures can be rendered each frame, we shouldn't reproject them at all
the denoising pass will overblur the direct diffuse textures resulting in unsharpness and loss of details

Including direct diffuse textures when temporally accumulating was also the main reason for the smearing and temporal lag in my recent PR (https://github.com/gkjohnson/three-gpu-pathtracer/pull/241). When you want to reproject entire frames and not just lighting, you would need to use more constraining methods than just depth/normal comparison such as 'neighborhood clamping' which is the usual method to eliminate smearing for TRAA. We can't use neighborhood clamping here as it only works properly when we have the full scene information computed each frame (which is the case in TRAA). For noisy inputs like we have here it'll result in false positives, discarding correct accumulated pixels due to neighborhood clamping when the neighboring pixels have too much variance.

So I'd like to open a PR soon but have a few question regarding the implementation first:

How exactly would you combine the direct diffuse textures with the denoised lighting?
Would it be possible to store the reflected ray length in the alpha channel of your raytraced buffer? If not, then we could disable hit point reprojection and just use reprojection through a pixel's velocity.

References

Adventures in Hybrid Rendering: great explanation of SVGF and temporal reprojection
Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination
Edge-Avoiding À-Trous Wavelet Transform for fast Global Illumination Filtering

gkjohnson commented 1 year ago

Sorry for the delayed response - I've been a little slow on pathtracer work recently but I will be putting more time into it agains soon. But this is awesome to see and thanks for the detailed write up! I likely won't have the bandwidth to become enough of an expert into SVGF but I trust your research and detail here!

A few questions and thoughts:

One of the most important functionalities is weighing a pixel through its variance, i.e. weigh it on how 'noisy' it is over multiple frames (which is determined in the accumulation pass).

This is interesting and maybe you can speak to how this might enable something like #198. Basically allow us to determine when the image has hit some predetermined noise threshold so we can stop rendering automatically at a good image or even drive which pixels we should focus our ray samples on. Ie a perfectly reflective surface that immediately terminates at the skybox would really only need 1 sample (or a few for AA) while diffuse surfaces need a lot. Knowing which pixels have effectively "settled" could enable a big performance gain.

rendered lighting only (without direct diffuse textures applied, indirect diffuse lighting is still allowed) ... as direct diffuse textures can be rendered each frame, we shouldn't reproject them at all

What do you mean by this? Are you saying you only want lighting contributions after the first ray bounce? Ie don't attenuate after the first hit? Presumably we'd include any metalness, normal, or roughness maps in the contribution as well, right? And what about specular lighting? Do we separate that out, too? I feel like there are some limitations here but maybe once you elaborate it'll become more clear to me.

Only rendering environment lighting, though, is something that's been discussed for generating light maps (#5) so maybe there's some overlap here.

depth normals

How are you imagining generating these buffers? With basic rasterization? Or outputting from the path tracing shader? I think a separate rasterization pass would be simplest first.

One other case that comes to mind is depth of field denoising. I can only imagine any kind of reprojection would break in this case but maybe a-trous denoising can help this resolve more quickly?

Are the reprojection and a-trous denoising passes separable? Ie would it be possible to architect these so users can just choose a-trous denoising without reprojection and vice-versa?

Maybe a little nitpicky but I see some edge darkening on the left side of the mesh - is there an off by one pixel issue? Or is this a natural artifact of the approach?

Thanks again for all your effort on this!

gkjohnson commented 1 year ago

Just throwing a few more notes / examples on these topics in here:

KihwanChoi12 commented 7 months ago

Hi @gkjohnson I would like to contribute an SVGF implementation.

I have implemented the temporal filter and spatial filter parts, but I would like to ask for advice on how to get the pathtracing frame buffer with the direct diffuse texture removed, and how to combine the direct diffuse texture + denoised frame buffer.

If you have any advice, I'll try to organize and post a PR. Thanks for sharing your great library!

gkjohnson commented 7 months ago

@KihwanChoi12 Awesome! I've made #511 so we can start a more dedicated discussion.

gkjohnson / three-gpu-pathtracer