Looooong / Unity-SRP-VXGI

Voxel-based Global Illumination using Unity Scriptable Render Pipeline
MIT License

Implement nearest-depth upsampling #15

Looooong closed this issue 5 years ago

Looooong commented 5 years ago

The VXGI script has a property, diffuseResolutionScale, used to scale down the render resolution of the indirect diffuse lighting pass, which reduces pixel shading cost. Bilinear filtering was used to upsample the result, but this method produces a blurry effect at the edges.

This commit replaces the old filtering method with nearest-depth upsampling.
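
For context, here is a minimal sketch of the nearest-depth idea; the resource names and helper function below are placeholders for illustration, not the actual shader in this commit:

    Texture2D<float>  LowResDepth;
    Texture2D<float4> LowResLighting;
    SamplerState PointClampSampler;

    float3 NearestDepthUpsample(float2 uv, float fullResDepth)
    {
        // Gather the 2x2 low-res depths that bilinear filtering would blend.
        float4 lowDepths = LowResDepth.Gather(PointClampSampler, uv);
        float4 deltas = abs(lowDepths - fullResDepth);

        // Pick the gather component whose depth is closest to the full-res depth.
        // Component order is (-,+), (+,+), (+,-), (-,-) relative to the footprint.
        int2 offsets[4] = { int2(0, 1), int2(1, 1), int2(1, 0), int2(0, 0) };
        int best = 0;
        [unroll]
        for (int i = 1; i < 4; i++)
            if (deltas[i] < deltas[best]) best = i;

        // Load the lighting value of the chosen texel: the (-,-) corner of the
        // gather footprint plus that component's offset.
        uint width, height;
        LowResLighting.GetDimensions(width, height);
        int2 corner = int2(floor(uv * float2(width, height) - 0.5));
        return LowResLighting.Load(int3(corner + offsets[best], 0)).rgb;
    }

Instead of the bilinear blend, the full-resolution pixel copies the lighting value of whichever low-resolution neighbor is closest in depth, which keeps edges crisp at the cost of some blockiness inside smooth regions.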

Looooong commented 5 years ago

@jeske Here, I present to you the upsampling results when the indirect diffuse lighting pass is rendered at different resolution scales. What do you think?

1/4 camera resolution

1/8 camera resolution

1/100 camera resolution

jeske commented 5 years ago

(I only have time to read the code right now.. I can try to run it in the next day or two.)

One possible problem is that I think the GatherOffsets table might be wrong here...

static int3 GatherOffsets[4] = {
        int3(0, 1, 0),                                                          
        int3(1, 1, 0),                                                          
        int3(1, 0, 0),                                                          
        int3(0, 0, 0) 
      };

If this is the right source...

https://docs.microsoft.com/en-us/windows/desktop/direct3dhlsl/gather4--sm4-1---asm-

[gather] is the same as point sampling with (u,v) texture coordinate deltas at the following locations: (-,+),(+,+),(+,-),(-,-), where the magnitude of the deltas are always half a texel.

Then the gather offsets are:

(-0.5, 0.5) (0.5,0.5) (0.5,-0.5) (-0.5,-0.5)

So I think you want this:

static int3 GatherOffsets[4] = {
        int3(-1,  1, 0),     
        int3( 1,  1, 0),     
        int3( 1, -1, 0),    
        int3(-1, -1, 0)    
      };

And just checking that this behaves the same on OpenGL, this ARB_texture_gather reference describes the order as...

T_i0_j1 T_i1_j1 T_i1_j0 T_i0_j0 which is.... (-,+) (+,+) (+,-) (-,-)

...so it's the same

jeske commented 5 years ago

Looking at the pictures some more...

I don't think nearest-depth is going to work very well for this situation, because the depths on both sides of these corners are going to be very similar.

I think nearest-normal, or a weighted combination of depth and normal, would work better. And if the error distance exceeds some threshold, fall back to a blended value (or add the pixel to a work list that needs to be re-traced per pixel).

Is the surface normal available?

Looooong commented 5 years ago

@jeske Actually, I already have the 0.5 texel offset removed at the beginning of NearestNeighborTexel:

int3 texel = int3(mad(uv, sourceTexelSize.zw, -0.5), 0);

Then I cast the result to int3, which truncates the floats to integers. For these non-negative coordinates this is the same as floor, so texel holds the sample position at (-,-). This is why GatherOffsets looks like that.

Looooong commented 5 years ago

Is the surface normal available?

Yes, it is.

And if the error-distance exceeds some threshold, fallback to a blended value.

And how do we determine this threshold?

jeske commented 5 years ago

Another important issue...

It's essential that we use absolute values of distance, so if the current distance function is signed, we need to use abs(distance(...)).

Even better would be to compute where the neighborDepth should be, based on the current texel normal, depth, the neighbor coordinate, and the diffuseResolutionScale (and possibly the subpixel location of the current pixel in the diffuse illumination texture), and then compute an error from where we expected the depth to be.
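
To make that concrete, here is a rough sketch of the plane-projection idea in view space; the variable names and the ViewRayDirection helper are assumptions for illustration, not existing code:

    // Predict the depth a neighbor texel "should" have if it lay on the same
    // surface as the current pixel: intersect the neighbor's view ray with the
    // plane defined by the current pixel's view-space position P and normal N.
    float ExpectedNeighborDepth(float3 P, float3 N, float3 neighborRayDir)
    {
        float denom = dot(N, neighborRayDir);
        if (abs(denom) < 1e-4)
            return P.z;                  // grazing angle: fall back to the center depth
        float t = dot(N, P) / denom;     // ray-plane intersection, ray origin at the eye
        return (t * neighborRayDir).z;   // view-space depth of the predicted point
    }

    // error = abs(neighborDepth - ExpectedNeighborDepth(P, N, ViewRayDirection(neighborUV)));

The error would then be measured against this predicted depth instead of against the center depth directly.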


And if the error-distance exceeds some threshold, fallback to a blended value.

And how do we determine this threshold?

Good question. It's important to note that the error deltas depend on the diffuseResolutionScale, because the smaller the indirect lighting texture, the more "off" the texel values will be.

I think a good place to start is just to compute this threshold based on the allowed amount of normal deviation per screen-space pixel. So something like:

if (bestNeighborNormalDelta > radians(10.0) * diffuseResolutionScale) { /* fall back to blending mode */ }

Thresholding based on the depth value is complicated... I think you would need to use the texel normal, diffuseResolutionScale, and neighbor coordinate to project what you expect the neighbor texel depth value to be, which sounds complicated.


The more I think about this, the more I think getting fancy about "upscaling" is the wrong approach.

I would instead look at using a pixel-accurate fallback... Using approximate lighting for some pixels, and pixel-accurate lighting for others.

This could be done by

(a) looking at a threshold and putting pixels into a fallback work-list for a "pixel-accurate lighting pass"

(b) doing a binning pre-pass, binning screen-space tiles based on whether the G-buffer normals and depth appear to be suitable for a lighting approximation over the whole tile (mostly uniform normal), and then handling the two screen-space tile types separately

(c) or a combination of both!


As for finding the best approximate texel... I'm thinking it looks something like this:

float normal_factor = 0.2;  // no idea what this should be
float bestNeighborDepthDelta = abs(distance(targetDepth, neighborDepths[i]));
float bestNeighborNormalDelta = angle_delta(targetNormal, neighborNormals[i]);  // always positive
float texel_delta = bestNeighborDepthDelta +
                    bestNeighborNormalDelta * normal_factor;

Where angle_delta() returns the angle between the normals in radians or normalized on a 0.0 to 1.0 scale...
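
A rough sketch of how that selection could look per pixel, using 1.0 - dot() as a stand-in for angle_delta(); the texture names (lowResDepth, lowResNormal, lowResLighting) are placeholders, not the actual shader:

    float normal_factor = 0.2;   // tuning constant, value not established here
    float bestDelta = 1e20;
    int3 bestTexel = texel;      // texel = (-,-) corner of the gather footprint

    [unroll]
    for (int i = 0; i < 4; i++)
    {
        int3 t = texel + GatherOffsets[i];
        float depthDelta  = abs(targetDepth - lowResDepth.Load(t));
        float normalDelta = 1.0 - saturate(dot(targetNormal, lowResNormal.Load(t).xyz));
        float delta = depthDelta + normalDelta * normal_factor;

        if (delta < bestDelta)
        {
            bestDelta = delta;
            bestTexel = t;
        }
    }

    float3 upsampled = lowResLighting.Load(bestTexel).rgb;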

Looooong commented 5 years ago

Screenshot from 2019-06-07 13-34-42

I don't think that this is the expected result. I don't know where I went wrong. This implementation provides a good example of how this should be done.

jeske commented 5 years ago

I got this branch running to look at the problem. The way the artifacts move when the camera moves gives some hints to the issue, because it suggests it's related to how the coordinate falls on the low-rez texture. One thing that might help in understanding what is happening is to draw the boundaries of the low-rez lighting pixels on top as gridlines.

I tried to experiment with the code, but I don't understand how to get Unity to reload the shader, or where the shader errors show up, so I got stuck. If you have any tips, it would help me work on it.

Let me explain what I'm thinking based on what I saw...

My current theory is that a single Gather won't work for this, for two reasons:

(a) The current coordinate actually lies inside some low-rez pixel. Because Gather only loads 4 adjacent pixels, this limits us to loading (up OR down) and (left OR right) from the current low-rez lighting pixel. This decision is being forced on us by the sub-pixel offset.

(b) Sometimes the coordinate will fall in a low-rez lighting pixel with a bad value, such as near a corner. In these cases we might want to ignore this badly sampled "current" lighting pixel with a wrong normal and look at the pixels around it, but a single 2x2 Gather won't let us.

If the code is working correctly, we should be getting gathers that look like this below, where "o" is the low-rez pixel our current pixel lies in. This is obviously going to be a problem when we need a lighting value on the other side of the current pixel.

         - - - -       - - - -     - - - -     - - - -  
         - x x -       - o x -     - x o -     - x x - 
         - x o -       - x x -     - x x -     - o x - 
         - - - -       - - - -     - - - -     - - - - 

What we want, is to sample the low-rez lighting pixels around the current low-rez lighting pixel. Ideally we would sample like this:

     - - - - - 
     - x x x - 
     - x o x -
     - x x x - 
     - - - - -

We could approximate this with two four-pixel gathers, like this, though we wastefully fetch "o" twice. We could even choose which orientation to sample based on which quadrant of the "o" pixel the coordinate lies in.

         - - - - - -                - - - - - -
         - x x - - -                - - x x - -
         - x o x - -                - x o x - -
         - - x x - -                - x x - - -
         - - - - - -                - - - - - -

Or we could use something like GLSL textureGatherOffsets to specify exactly which four offsets we want from each gather, so we could get 8 of the 9 pixels in the ideal set... except I don't see an HLSL version of this function.
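
A sketch of the two-gather idea, picking the second gather's offset from which half of the containing low-res texel the pixel falls in; lowResSize, LowResDepth, and PointClampSampler are placeholder names:

    float2 coord   = uv * lowResSize;   // coordinates in low-res texel space
    float2 inTexel = frac(coord);       // position inside the containing texel

    // The plain gather already extends toward the nearer side (down-left if we
    // are in the lower-left half), so the second gather is offset toward the
    // opposite corner, covering 7 of the 9 surrounding texels in total.
    int2 secondOffset = int2(inTexel.x < 0.5 ? 1 : -1,
                             inTexel.y < 0.5 ? 1 : -1);

    float4 depthsA = LowResDepth.Gather(PointClampSampler, uv);
    float4 depthsB = LowResDepth.Gather(PointClampSampler, uv, secondOffset);

The "o" texel is fetched by both gathers, so one of the eight values is redundant, as noted above.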

Looooong commented 5 years ago

Okay, so I know what I did wrong. I didn't fill the low resolution depth buffer with values 😅 Here are the new results, at 1/100 camera resolution.

Near

Far

Looooong commented 5 years ago

I tried to experiment with the code.. but I don't understand how to get Unity to reload the shader, or where the shader errors show up. So I got stuck.

Unity automatically reloads the shader whenever you edit it and then focus the Unity window.

If you have any tips it would help me work on it.

Well, Unity has the Frame Debugger; I suppose you know it. I just fiddle with the texture values, for example, compute the difference between two depth textures and output the result to a render target. Then I look at that render target in the Frame Debugger. That's how I find where the problem is.
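
In case it helps, here is a tiny sketch of that debugging trick; the texture names and the output scale are placeholders:

    Texture2D<float> DepthA;   // e.g. full-res depth
    Texture2D<float> DepthB;   // e.g. low-res depth sampled at the same UV
    SamplerState PointClampSampler;

    float4 DebugDepthDiff(float2 uv : TEXCOORD0) : SV_Target
    {
        float diff = abs(DepthA.Sample(PointClampSampler, uv) -
                         DepthB.Sample(PointClampSampler, uv));
        return float4(diff.xxx * 100.0, 1.0);  // scale up so small differences are visible
    }

Blit this into a temporary render target and step through the frame in the Frame Debugger to see where the two depths disagree.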

jeske commented 5 years ago

Those pictures look much better. The artifacts you see might be because of the quad-sample issue I explained above, since it will cause problems depending on where the edge crosses the low-rez pixel boundary. If you see the artifacts move when you move the camera, I suspect that issue.

It reloaded the shader once for me, but then I made a mistake, so the shader had errors, and I couldn't find them and couldn't get it to reload the shader. Then I tried build-and-run and it started showing the shader in the game window, but not in the scene window, and now I can't figure out how to get the shader running in the scene window again. I'll try some more. Sorry, I don't know Unity very well.

Looooong commented 5 years ago

As for finding the best approximate texel... I'm thinking it looks something like this:

float normal_factor = 0.2;  // no idea what this should be
float bestNeighborDepthDelta = abs(distance(targetDepth, neighborDepths[i]));
float bestNeighborNormalDelta = angle_delta(targetNormal, neighborNormals[i]);  // always positive
float texel_delta = bestNeighborDepthDelta +
                    bestNeighborNormalDelta * normal_factor;

Where angle_delta() returns the angle between the normals in radians or normalized on a 0.0 to 1.0 scale...

This is actually a good idea. Better yet, we can use the dot product between the two normals, and then factor the depth distance by the dot product, like so:

distance(targetDepth, neighborDepths[i]) / dot(targetNormal, neighborNormals[i]);

The downside to this is that we also need an additional render texture to store the low resolution normals. Alternatively, we can infer the low resolution normal from the high resolution normal and the low resolution texel position, but this is less robust and more error prone.

Looooong commented 5 years ago

After thinking and testing for a while, I came to the conclusion that both of our methods of using normals to apply additional weights to the distance function are quite wrong. There are instances where they won't work out.

Anyway, I feel satisfied with the current algorithm. It produces nice results at 1/4 and 1/8 camera resolutions, which are the practical use cases.

jeske commented 5 years ago

It's great you have a good result. Here are some additional thoughts should you ever revisit this. When I get a chance I'll experiment a little more.

distance(targetDepth, neighborDepths[i]) / dot(targetNormal, neighborNormals[i]);

I see where you are going with this, but this seems like it is over-weighting the normal angle deviation contribution relative to distance. This could make a far-off distance value with a more similar normal win over a close distance value. I was suggesting including the normal-angle deviation as a very small contribution to just try and nudge similar distance values to prefer the one with a similar normal. Something more like:

distance(targetDepth, neighborDepths[i])  + 
   (1.0 - dot(targetNormal, neighborNormals[i])) * normal_weight; 

The downside to this is that we also need an additional render texture to store the low resolution normals. Or we can infer the low resolution normal based on high resolution normal and the low resolution texel position, but this is less robust and more error prone.

Ahh yes, we would need the low-rez sampled normal for this to be accurate. How is the low-resolution lighting sampled? Does it use the G-buffer normal data? It seems like it would be sufficient to test the high-rez normal at the center of the low-rez texel. (And using the dot product, we'd need to make sure we don't divide by zero.)

After thinking and testing for awhile, I came to the conclusion that both our methods of using normals to apply additional weights to the distance function is quite wrong. There are instances where they won't work out.

Yes, there are instances where normals will not work and instances where distances will not work, and even instances where using both will not work. This is the challenge with low-rez approximations.

(a) Near a corner, looking at only the depth might pick the wrong side of the corner, because the depth on the opposite side of the corner might be closer. I suspect this is the cause of some artifacts in your "far" example.

(b) For geometry edges, several surrounding values might be far away in the background, in which case looking at only normals would be flawed, and depth must be used.

(c) For objects smaller than the low-rez texels, all methods will fail, because there are no valid low-rez texels to sample from.

(d) All of these are limited by Gather only sampling 4 texels, as I explained earlier.

(a) and (b) drove my idea to try a combined error, where we bump the distance value slightly by the normal angle deviation, as a way to prefer the similar distance value that also has a similar normal. (c) drove my idea to use a threshold to declare the low-rez sample invalid, so we would either blend values as a fudge or compute a new value in an additional selective per-pixel fixup pass. And (d) drove my idea to do two gather samples.

However, if you have good enough results now, these all may be unnecessary extra work.