KhronosGroup / glTF-Sample-Viewer

Physically-Based Rendering in glTF 2.0 using WebGL
Apache License 2.0

Transmission roughness is quite bad #409

Closed: Leadwerks closed this issue 2 years ago

Leadwerks commented 2 years ago

I'm not satisfied at all with the appearance of the refracted background with rough materials:

[screenshot: refracted background behind a rough transmissive material, showing heavy pixellation]

The obvious pixellation at higher roughness values looks awful. We want something nice and smooth like this: [reference image: 70_glass_after]

What do you think is the best solution for this? I can think of a few ideas:

  1. Build pre-blurred mipmap levels.
  2. Take multiple samples in the render pass to blur the background image.
  3. Use a cubic filter to smooth the background image in the render pass.

If it works, the third option would probably be the cleanest, but I can't test it right now because my laptop does not support this feature. Has this been discussed? Is there a recommended solution?

BeRo1985 commented 2 years ago

I've implemented the bicubic texture filtering option in my glTF-capable renderer, PasVulkan. Here is the result:

[screenshot: rough transmission rendered with bicubic filtering in PasVulkan]

My GLSL code for this => https://github.com/BeRo1985/pasvulkan/commit/ef0e982f56885db28f7caf2faf5b7ef2ae760989#diff-c4116bbb50478522438a947c81c088c4cd45ef01690d383a8946177fcc131cce

Maybe this could be adapted for WebGL for the GLTF Sample Viewer?

UX3D-nopper commented 2 years ago

We will have a look at it. Thank you very much for the contribution.

Leadwerks commented 2 years ago

@BeRo1985 That looks fantastic. Thank you.

Leadwerks commented 2 years ago

@BeRo1985 How does it look in motion? Are you seeing a lot of pixellation then?

I suspect this is the only solution that will really produce the desired result (sketched below):

  1. Blur the background image on the U axis.
  2. Blur the resulting image on the V axis.
  3. Instead of sampling different mip levels, mix the blurred and unblurred images using the roughness as the blend factor.
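
A minimal GLSL sketch of those three steps, assuming a standard separable 9-tap Gaussian; the uniform names (u_source, u_texelSize, u_direction, u_sceneSharp, u_sceneBlurred) are placeholders rather than the sample viewer's actual code:

```glsl
#version 300 es
precision highp float;

// Separable blur pass: render once with u_direction = (1, 0) to blur along U,
// then run the same shader on that result with u_direction = (0, 1) for V.
uniform sampler2D u_source;
uniform vec2 u_texelSize;   // 1.0 / source resolution
uniform vec2 u_direction;   // (1, 0) for the U pass, (0, 1) for the V pass

in vec2 v_uv;
out vec4 fragColor;

void main() {
    // Normalized 9-tap Gaussian: center weight plus four taps per side.
    float w[5] = float[](0.227027, 0.1945946, 0.1216216, 0.054054, 0.016216);
    vec4 sum = texture(u_source, v_uv) * w[0];
    for (int i = 1; i < 5; i++) {
        vec2 offset = u_direction * u_texelSize * float(i);
        sum += texture(u_source, v_uv + offset) * w[i];
        sum += texture(u_source, v_uv - offset) * w[i];
    }
    fragColor = sum;
}
```

Step 3 would then be a single blend in the transmission shader, along the lines of `mix(texture(u_sceneSharp, uv).rgb, texture(u_sceneBlurred, uv).rgb, roughness)`.
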
BeRo1985 commented 2 years ago

You can see it in motion here on YouTube; I don't see much, if any, pixellation, at least to my own eye.

BeRo1985 commented 2 years ago

But I'm also generating the mipmaps with special downsampling filters in a compute shader, using a bilinear 13-tap 2x downsampler (equivalent to a 36-tap non-bilinear 2x downsampler); see here. It's based on the next-gen post-processing slides from Call of Duty: Advanced Warfare.
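
For reference, a sketch of that 13-tap pattern as a GLSL function, using the weights published in the Call of Duty: Advanced Warfare slides; the function and parameter names are illustrative, not PasVulkan's actual code:

```glsl
// One 2x downsample step: 13 bilinear taps arranged as a 3x3 outer grid plus
// an inner 2x2 box, weighted to sum to 1.0. Each bilinear tap averages four
// texels, so the 13 taps cover a 36-texel footprint (hence "36-tap" above).
// uv is the destination texel center in the source's UV space; texelSize is
// 1.0 / source resolution.
vec3 downsample13(sampler2D src, vec2 uv, vec2 texelSize) {
    vec3 a = texture(src, uv + texelSize * vec2(-2.0, -2.0)).rgb;
    vec3 b = texture(src, uv + texelSize * vec2( 0.0, -2.0)).rgb;
    vec3 c = texture(src, uv + texelSize * vec2( 2.0, -2.0)).rgb;
    vec3 d = texture(src, uv + texelSize * vec2(-2.0,  0.0)).rgb;
    vec3 e = texture(src, uv).rgb;
    vec3 f = texture(src, uv + texelSize * vec2( 2.0,  0.0)).rgb;
    vec3 g = texture(src, uv + texelSize * vec2(-2.0,  2.0)).rgb;
    vec3 h = texture(src, uv + texelSize * vec2( 0.0,  2.0)).rgb;
    vec3 i = texture(src, uv + texelSize * vec2( 2.0,  2.0)).rgb;
    vec3 j = texture(src, uv + texelSize * vec2(-1.0, -1.0)).rgb;
    vec3 k = texture(src, uv + texelSize * vec2( 1.0, -1.0)).rgb;
    vec3 l = texture(src, uv + texelSize * vec2(-1.0,  1.0)).rgb;
    vec3 m = texture(src, uv + texelSize * vec2( 1.0,  1.0)).rgb;

    // The inner 2x2 box gets half the total weight; the four overlapping
    // outer boxes share the rest. This suppresses the flicker that a plain
    // bilinear 2x downsample produces.
    return e * 0.125
         + (a + c + g + i) * 0.03125
         + (b + d + f + h) * 0.0625
         + (j + k + l + m) * 0.125;
}
```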

Leadwerks commented 2 years ago

Ah, okay, so what we have here is cubic sampling combined with a special mipmap generation technique. I tried it with a cubic sampler (in software) and it looked awful, but I was also using vkCmdBlitImage with a bilinear filter to generate the mipmap levels.

BeRo1985 commented 2 years ago

How do you do the cubic filtering exactly? I handle both LOD levels individually: I bicubically interpolate the current and the next LOD level (each at its real mipmap resolution) and then linearly interpolate between them.

Leadwerks commented 2 years ago

I was just using a GLSL function I found on the web that looks similar to your code. It would be better to test with hardware cubic filtering.

I'm going to have to move on because this is unfolding into a bigger task than I was expecting, but I will definitely return to this issue later.

BeRo1985 commented 2 years ago

> I was just using a GLSL function I found on the web that looks similar to your code. It would be better to test with hardware cubic filtering.
>
> I'm going to have to move on because this is unfolding into a bigger task than I was expecting, but I will definitely return to this issue later.

This could also be the problem: all of the bicubic texture lookup functions from the web that I have found handle mipmapping incorrectly. They always compute the interpolation taps at the resolution of mipmap level 0, i.e. the full texture resolution, even for mipmap levels >= 1. That inevitably leads to unattractive results, because mipmapping is not handled correctly.

My customized bicubic texture lookup function, on the other hand, operates at the effective resolution of each mipmap level: it performs a bicubic interpolation on mipmap level int(floor(lod)), another on mipmap level int(floor(lod)) + 1, and then linearly interpolates between the two results via mix(l0, l1, fract(lod)).
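
A self-contained GLSL sketch of that approach, adapted from the widely circulated four-tap bicubic B-spline snippet rather than taken from the PasVulkan source; the function names are illustrative:

```glsl
// Cubic B-spline weights for a fractional sample position v.
vec4 cubicWeights(float v) {
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w) / 6.0;
}

// Bicubic lookup at one explicit mipmap level, doing all texel math at that
// level's real resolution (the crucial detail described above). The four
// textureLod fetches exploit hardware bilinear filtering, so a full bicubic
// result costs 4 taps instead of 16.
vec4 textureBicubicSingleLod(sampler2D tex, vec2 uv, float lod) {
    vec2 size = vec2(textureSize(tex, int(lod)));
    vec2 coord = uv * size - 0.5;
    vec2 f = fract(coord);
    coord -= f;
    vec4 xw = cubicWeights(f.x);
    vec4 yw = cubicWeights(f.y);
    vec4 c = coord.xxyy + vec2(-0.5, 1.5).xyxy;
    vec4 s = vec4(xw.xz + xw.yw, yw.xz + yw.yw);
    vec4 offset = (c + vec4(xw.yw, yw.yw) / s) / size.xxyy;
    vec4 s00 = textureLod(tex, offset.xz, lod);
    vec4 s10 = textureLod(tex, offset.yz, lod);
    vec4 s01 = textureLod(tex, offset.xw, lod);
    vec4 s11 = textureLod(tex, offset.yw, lod);
    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);
    return mix(mix(s11, s01, sx), mix(s10, s00, sx), sy);
}

// Trilinear-style combination: bicubic on floor(lod) and floor(lod) + 1,
// then a linear mix by fract(lod).
vec4 textureBicubicLod(sampler2D tex, vec2 uv, float lod) {
    float l0 = floor(lod);
    vec4 a = textureBicubicSingleLod(tex, uv, l0);
    vec4 b = textureBicubicSingleLod(tex, uv, l0 + 1.0);
    return mix(a, b, fract(lod));
}
```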

Was I able to enlighten you with that?

(Edit: I also suspect that hardware cubic filtering should work the same way my customized bicubic texture lookup function does, treating the two mipmap levels individually and then interpolating between them.)

Leadwerks commented 2 years ago

I think the biggest issue is the way the mipmaps are created. If you just do a bilinear downsample for each level, the result looks very blocky as you move the camera, since the pixels "jump" a large distance when the mip threshold is crossed. A routine like the one you describe, which blurs each level as it downsamples, is probably required to avoid this.

Leadwerks commented 2 years ago

@BeRo1985 What is the purpose of using a compute shader for this instead of a fragment shader? Is there any advantage?

BeRo1985 commented 2 years ago

> @BeRo1985 What is the purpose of using a compute shader for this instead of a fragment shader? Is there any advantage?

See the quote from https://forum.beyond3d.com/threads/hardware-gaussian-blur.59951/#post-1972505

> On modern GPUs, you should program blur kernels as compute shaders. Compute shader has access to groupshared memory, a fast on-chip memory per compute unit (64 KB per CU on AMD GPUs). With groupshared memory, you don't need to load/sample the blur neighborhood again and again for each pixel. Instead you first load the neighborhood to groupshared memory and then load data from groupshared memory for each pixel. Separate X/Y as usual. You should also do reductions directly in groupshared memory if you want multiple different radius gaussian filters. Doing multiple downsampling & combine pixel shader passes is slow, because the GPU stalls between each pass (as there's always a dependency to the last passes output). This is another important advantage of compute shader blur versus pixel shader blur.

I'm not using the groupshared memory part yet, but I've planned it for the future.
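
To illustrate the groupshared-memory point, here is a minimal GLSL compute shader for a horizontal blur pass; the workgroup size, bindings, and simple box weights are assumptions for the sketch, not PasVulkan's implementation:

```glsl
#version 450

layout(local_size_x = 64, local_size_y = 1) in;
layout(binding = 0) uniform sampler2D srcImage;
layout(binding = 1, rgba16f) uniform writeonly image2D dstImage;

const int RADIUS = 4;
// One row segment per workgroup, plus a 2 * RADIUS apron, cached on-chip.
shared vec4 tile[64 + 2 * RADIUS];

void main() {
    ivec2 size = textureSize(srcImage, 0);
    ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
    int lx = int(gl_LocalInvocationID.x);
    int base = int(gl_WorkGroupID.x) * 64 - RADIUS;
    int row = min(coord.y, size.y - 1);

    // Each invocation loads one texel into shared memory; the first
    // 2 * RADIUS invocations also load the apron. The neighborhood is
    // therefore fetched from texture memory once per tile, not per pixel.
    tile[lx] = texelFetch(srcImage,
                          ivec2(clamp(base + lx, 0, size.x - 1), row), 0);
    if (lx < 2 * RADIUS) {
        tile[64 + lx] = texelFetch(srcImage,
                                   ivec2(clamp(base + 64 + lx, 0, size.x - 1), row), 0);
    }
    barrier(); // in compute shaders this also synchronizes shared memory

    if (coord.x >= size.x || coord.y >= size.y) {
        return;
    }

    // Box weights for brevity; a real blur would use Gaussian weights here.
    vec4 sum = vec4(0.0);
    for (int i = -RADIUS; i <= RADIUS; i++) {
        sum += tile[lx + RADIUS + i];
    }
    imageStore(dstImage, coord, sum / float(2 * RADIUS + 1));
}
```

Dispatched with ceil(width / 64.0) x height workgroups; a vertical pass would be the same shader with the axes swapped.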

Leadwerks commented 2 years ago

Well, it's certainly easier than trying to set up rendering to a texture mipmap level.

Leadwerks commented 2 years ago

In my tests just doing a texture sample and image write, compute shaders are about five times slower than the equivalent vertex/fragment shader. That's after adjusting the number of work groups to get the maximum framerate: https://www.ultraengine.com/community/topic/61078-compute-vs-fragment-shaders/

Leadwerks commented 2 years ago

Using a conventional 2-pass blur with 8 samples in each shader, I am getting better results: [screenshot: two-pass blur result]

I think that cubic sample function you are using is going to have a pretty heavy performance cost. The image above is just using linear filtering, and there's no "flickering" as the camera moves about.