I've implemented the bicubic texture filtering option in my GLTF-capable renderer, PasVulkan. Here is the result:
My GLSL code for this => https://github.com/BeRo1985/pasvulkan/commit/ef0e982f56885db28f7caf2faf5b7ef2ae760989#diff-c4116bbb50478522438a947c81c088c4cd45ef01690d383a8946177fcc131cce
Maybe this could be adapted for WebGL for the GLTF Sample Viewer?
We will have a look at it. Thank you very much for the contribution.
@BeRo1985 That looks fantastic. Thank you.
@BeRo1985 How does it look in motion? Are you seeing a lot of pixellation then?
I suspect this is the only solution that will really produce the desired result:
You can see it in motion here on YouTube; I don't see much, if any, pixellation, at least to my own perception.
But I'm also generating the mipmaps with special downsampling filters in a compute shader, using a bilinear 13-tap 2x downsampler (equivalent to a 36-tap non-bilinear 2x downsampler), see here. It's based on the next-gen post-processing slides from Call of Duty: Advanced Warfare.
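For reference, here is a sketch of that 13-tap pattern as it is commonly reproduced from the Call of Duty: Advanced Warfare slides (offsets and weights follow the widely circulated version, not necessarily my exact PasVulkan code; `texelSize` is `1.0 / sourceResolution`):

```glsl
// 13 bilinear taps arranged in five overlapping 2x2 quads, as in the CoD: AW slides.
vec3 downsample13(sampler2D tex, vec2 uv, vec2 texelSize) {
  vec3 a = texture(tex, uv + texelSize * vec2(-2.0, -2.0)).rgb;
  vec3 b = texture(tex, uv + texelSize * vec2( 0.0, -2.0)).rgb;
  vec3 c = texture(tex, uv + texelSize * vec2( 2.0, -2.0)).rgb;
  vec3 d = texture(tex, uv + texelSize * vec2(-1.0, -1.0)).rgb;
  vec3 e = texture(tex, uv + texelSize * vec2( 1.0, -1.0)).rgb;
  vec3 f = texture(tex, uv + texelSize * vec2(-2.0,  0.0)).rgb;
  vec3 g = texture(tex, uv).rgb;
  vec3 h = texture(tex, uv + texelSize * vec2( 2.0,  0.0)).rgb;
  vec3 i = texture(tex, uv + texelSize * vec2(-1.0,  1.0)).rgb;
  vec3 j = texture(tex, uv + texelSize * vec2( 1.0,  1.0)).rgb;
  vec3 k = texture(tex, uv + texelSize * vec2(-2.0,  2.0)).rgb;
  vec3 l = texture(tex, uv + texelSize * vec2( 0.0,  2.0)).rgb;
  vec3 m = texture(tex, uv + texelSize * vec2( 2.0,  2.0)).rgb;
  // The inner 2x2 quad gets half the total weight; the four
  // outer quads share the other half (4 * 0.125). Total weight = 1.
  vec3 result  = (d + e + i + j) * 0.25 * 0.5;
  result      += (a + b + f + g) * 0.25 * 0.125;
  result      += (b + c + g + h) * 0.25 * 0.125;
  result      += (f + g + k + l) * 0.25 * 0.125;
  result      += (g + h + l + m) * 0.25 * 0.125;
  return result;
}
```

Because each tap is itself a bilinear fetch, the 13 fetches effectively cover 36 source texels, which is where the "36-tap" equivalence comes from.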
Ah, okay, so what we have here is cubic sampling combined with a special mipmap generation technique. I tried it with a cubic sampler (in software) and it looked awful, but I was also using vkCmdBlitImage with a bilinear filter to generate the mipmap levels.
How exactly do you do the cubic filtering? I handle both LOD levels individually: I bicubically interpolate the current and the next LOD level (each at its real mipmap resolution) and then linearly interpolate between the two results.
I was just using a GLSL function I found on the web, that looks similar to your code. It would be better to test with hardware cubic filtering.
I'm going to have to move on because this is unfolding into a bigger task than I was expecting, but will definitely be returning to this issue later.
This could also be the problem: all the bicubic texture lookup functions from the web that I have found handle mipmapping incorrectly. They always compute the interpolation taps using the resolution of mipmap level 0, i.e. the full texture resolution, even for mipmap levels >= 1. Of course this must lead to unattractive results, since mipmapping isn't handled correctly.
My customized bicubic texture lookup function, on the other hand, operates at each effective mipmap resolution: it first samples bicubically at mipmap level int(floor(lod)), then at mipmap level int(floor(lod)) + 1, and finally linearly interpolates between the two results via mix(l0, l1, fract(lod)).
Was I able to enlighten you with that?
(Edit: I also suspect the cubic hardware filtering should work the same way my customized bicubic texture lookup function does: by treating the two mipmap levels individually, one at a time, and then interpolating between them.)
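To make that concrete, here is a sketch of such a mip-correct bicubic lookup. The inner function is adapted from the common web bicubic (B-spline weights, four bilinear fetches), but with textureSize()/textureLod() evaluated at each level's real resolution as described above; this is illustrative, not my exact PasVulkan code:

```glsl
// B-spline cubic weights for the fractional texel position v in [0, 1).
vec4 cubic(float v) {
  vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
  vec4 s = n * n * n;
  float x = s.x;
  float y = s.y - 4.0 * s.x;
  float z = s.z - 4.0 * s.y + 6.0 * s.x;
  float w = 6.0 - x - y - z;
  return vec4(x, y, z, w) * (1.0 / 6.0);
}

// Bicubic lookup at ONE explicit mipmap level, using that level's resolution.
vec4 textureBicubicLOD(sampler2D tex, vec2 uv, float lod) {
  vec2 texSize = vec2(textureSize(tex, int(lod)));  // this level's real size
  vec2 invTexSize = 1.0 / texSize;
  vec2 coord = uv * texSize - 0.5;
  vec2 fxy = fract(coord);
  coord -= fxy;
  vec4 xcubic = cubic(fxy.x);
  vec4 ycubic = cubic(fxy.y);
  vec4 c = coord.xxyy + vec2(-0.5, 1.5).xyxy;
  vec4 s = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
  vec4 offset = (c + vec4(xcubic.yw, ycubic.yw) / s) * invTexSize.xxyy;
  // Four bilinear fetches cover the 4x4 cubic footprint.
  vec4 sample0 = textureLod(tex, offset.xz, lod);
  vec4 sample1 = textureLod(tex, offset.yz, lod);
  vec4 sample2 = textureLod(tex, offset.xw, lod);
  vec4 sample3 = textureLod(tex, offset.yw, lod);
  float sx = s.x / (s.x + s.y);
  float sy = s.z / (s.z + s.w);
  return mix(mix(sample3, sample2, sx), mix(sample1, sample0, sx), sy);
}

// Mip-correct bicubic: filter both adjacent levels, then blend by fract(lod).
vec4 textureBicubic(sampler2D tex, vec2 uv, float lod) {
  float lod0 = floor(lod);
  return mix(textureBicubicLOD(tex, uv, lod0),
             textureBicubicLOD(tex, uv, lod0 + 1.0),
             fract(lod));
}
```

The key difference from the usual web version is that textureSize() is queried per level inside textureBicubicLOD(), instead of once at level 0.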
I think the biggest issue is the way the mipmaps are created. If you just do a bilinear sample for each level, it results in very blocky results as you move the camera, since the pixels "jump" a large distance when the threshold is reached. A routine like you describe that blurs each level as it downsamples it is probably required to avoid this.
@BeRo1985 What is the purpose of using a compute shader for this instead of a fragment shader? Is there any advantage?
See the quote from https://forum.beyond3d.com/threads/hardware-gaussian-blur.59951/#post-1972505
> On modern GPUs, you should program blur kernels as compute shaders. Compute shader has access to groupshared memory, a fast on-chip memory per compute unit (64 KB per CU on AMD GPUs). With groupshared memory, you don't need to load/sample the blur neighborhood again and again for each pixel. Instead you first load the neighborhood to groupshared memory and then load data from groupshared memory for each pixel. Separate X/Y as usual. You should also do reductions directly in groupshared memory if you want multiple different radius gaussian filters.
>
> Doing multiple downsampling & combine pixel shader passes is slow, because the GPU stalls between each pass (as there's always a dependency to the last passes output). This is another important advantage of compute shader blur versus pixel shader blur.
I'm not using the groupshared memory part yet, but I have planned it for the future.
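As a rough illustration of what that groupshared-memory version could look like (a sketch with illustrative names, workgroup size, and a simple box kernel instead of a gaussian; not actual PasVulkan code):

```glsl
#version 450
// Horizontal blur pass: each workgroup caches its row segment plus an apron
// in shared memory, so every tap reads on-chip memory instead of the texture.
layout(local_size_x = 64) in;
layout(binding = 0) uniform sampler2D srcTex;
layout(binding = 1, rgba16f) writeonly uniform image2D dstImage;

const int RADIUS = 4;
shared vec4 tile[64 + 2 * RADIUS];  // 64 pixels + RADIUS apron on each side

void main() {
  ivec2 size = textureSize(srcTex, 0);
  ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
  int lid = int(gl_LocalInvocationID.x);

  // Each invocation loads its own texel; the first RADIUS invocations
  // additionally load the left and right apron texels.
  tile[lid + RADIUS] = texelFetch(srcTex, clamp(pixel, ivec2(0), size - 1), 0);
  if (lid < RADIUS) {
    tile[lid] =
        texelFetch(srcTex, clamp(pixel - ivec2(RADIUS, 0), ivec2(0), size - 1), 0);
    tile[lid + 64 + RADIUS] =
        texelFetch(srcTex, clamp(pixel + ivec2(64, 0), ivec2(0), size - 1), 0);
  }
  barrier();  // make the shared tile visible to the whole workgroup

  // Blur entirely from shared memory; no repeated texture sampling per tap.
  vec4 sum = vec4(0.0);
  for (int i = -RADIUS; i <= RADIUS; i++) {
    sum += tile[lid + RADIUS + i];  // box weights for brevity
  }
  imageStore(dstImage, pixel, sum / float(2 * RADIUS + 1));
}
```

The vertical pass would be the same shader transposed (local_size_y = 64, offsets along y), per the "separate X/Y as usual" advice in the quote.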
Well, it's certainly easier than trying to set up rendering to a texture mipmap level.
In my tests just doing a texture sample and image write, compute shaders are about five times slower than the equivalent vertex/fragment shader. That's after adjusting the number of work groups to get the maximum framerate: https://www.ultraengine.com/community/topic/61078-compute-vs-fragment-shaders/
Using a conventional 2-pass blur with 8 samples in each shader, I am getting better results:
I think that cubic sample function you are using is going to have a pretty heavy performance cost. The image above is just using linear filtering, and there's no "flickering" as the camera moves about.
I'm not satisfied at all with the appearance of the refracted background with rough materials:
The obvious pixellation with higher roughness looks awful. We want something nice and smooth like this:
What do you think is the best solution for this? I can think of a few ideas:
If it works, the third option would probably be the cleanest but I can't test it right now because my laptop does not support this feature. Has this been discussed? Is there a recommended solution?