jeske opened 5 years ago
Picking up from issue #5.
I added a wiki page for VXGI Research Resources.
The voxel performance optimizations in The Tomorrow Children look interesting, and they have some nice-looking soft reflections and AO-ish shadows... though it doesn't look as nice as NVidia VXGI 2.0. I don't yet understand the limitations and tradeoffs. The quality of these captures is low because they are snapshots of YouTube videos...
Their voxel cascades are very similar to the 3D clipmap used in NVidia VXGI. The difference is that each of their cascade levels is independent, whereas in the 3D clipmap the levels are joined together. Their technique gives them the flexibility to reposition and recompute "only one cascade per frame", whereas I don't think that's possible with a 3D clipmap. I believe the 3D clipmap has better cache locality, at least for the smaller levels.
As for toroidal addressing... the main constraint is that voxel volumes have to be axis-aligned. They don't say whether their voxel volumes are always axis-aligned, but I don't see an immediate reason this would be a challenge for their technique. I think toroidal addressing has the biggest payoff when dynamic lighting and voxel occlusion changes can be localized within the scene, so only changed regions and the toroidal border changes are re-voxelized.
I'm especially intrigued by their precomputed "far" cone tracing.
I decided to try calculating another set of texture cascades, one for each of our 16 directions, that we could periodically fill with pre-calculated results for cone tracing the back half of a cone from the center of each voxel... This data essentially represents the far-away lighting that hardly alters with changes in the local viewer position. It was then possible to combine this data, which we could just sample from our texture, with a full cone trace of the "near" part of the cone. Once we correctly tuned the distance for where we transitioned to the "far" cone data (1-2 meters proved to be sufficient for our purposes), we got a large speed up with very little impact on quality.
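From that description, the combine step might look something like this sketch (every name here is a guess, not theirs):

```hlsl
// Near/far cone split sketch. _FarCones holds the precomputed
// back-half cone results; _FarConeStart is the hand-tuned
// transition distance (1-2 meters in their case).
Texture3D<float4> _FarCones;        // assumed per-direction far-cone data
SamplerState _LinearClampSampler;
float _FarConeStart;

// Assumed helpers defined elsewhere in the renderer:
float4 TraceCone(float3 origin, float3 dir, float tMin, float tMax);
float3 FarConeUVW(float3 worldPos, uint dirIndex);

float4 TraceConeNearFar(float3 origin, float3 dir, uint dirIndex)
{
    // Trace only the expensive "near" segment of the cone.
    float4 nearPart = TraceCone(origin, dir, 0.0, _FarConeStart);

    // Sample the precomputed "far" result at the cone's origin.
    float4 farPart = _FarCones.SampleLevel(
        _LinearClampSampler, FarConeUVW(origin, dirIndex), 0.0);

    // Front-to-back composite: far light is attenuated by the
    // opacity accumulated over the near segment.
    return nearPart + (1.0 - nearPart.a) * farPart;
}
```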
Back to VXGI...
Another useful resource is Armory3d, which also has an open-source VXGI implementation. (github)
When I was talking about NVidia VXGI / VXAL soft shadows / occlusion, I was referring to how their emissive area lights have soft shadows (which I think come from voxel occlusion, not shadow maps)... such as under the car here...
... With SRP-VXGI, the light from this pink emissive cube is not occluded by the red slab.
Does the voxel data store opacity / occluders?
The voxel cascade from The Tomorrow Children does join together inside the same volume texture. I read that from their GDC presentation slides (the link has recently gone dead). The volume texture is 32 x (32 x 6) x (32 x 6), where 6 is both the number of faces of a voxel and the number of cascade levels. Fortunately, there is a GDC Vault video, which includes the presentation slides.
About the occlusion: either the object needs to be big enough or the voxel resolution high enough that light doesn't bleed through the object.
As you can see here, the plane on the ceiling is actually an emissive object. The cube occludes part of the light, which produces a soft shadow on the wall on the left. I will continue to improve the quality with anisotropic voxels.
This is with the voxel resolution on "Very High" and the occluder made huge. The light still bleeds to the backside of the object and the floor... Is there a quick way to visualize the voxels?
Here is a link to the old GDC presentation slides for The Tomorrow Children, from the Internet Wayback Machine.
To visualize the voxels, on the VXGI script attached to the main camera, select "Mipmap" from Pass.
Regarding texture cascades vs. the 3D clipmap: one important issue, regardless of the storage layout, is that the positioning of the volumes needs to put most of the effective volume out in front of the camera...
This picture from The Tomorrow Children describes the situation well.. (and it does look like they use axis-aligned voxel volumes)
On the topic of toroidal addressing, it is definitely compatible with The Tomorrow Children techniques (and they might even be doing it).
On slides 42-44 of the GDC talk, they describe their way of handling movement as "scrolling" the volumetric data, so they can re-use existing data and only re-render the edges (and any changing regions inside the volume). Toroidal addressing is just a means to do this scrolling without moving any data. Their picture on slide 43 suggests they are actually copying the data, but the result is mostly the same, and it might just be an illustrative picture.
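The bookkeeping for that scrolling is simple either way. Here is a sketch (variable names are mine, not theirs) of deciding which voxels fall inside the freshly exposed border slabs after the volume origin moves by an integer number of voxels:

```hlsl
// Sketch of the scrolling bookkeeping: after the volume origin
// moves by `delta` voxels, only the freshly exposed border slabs
// need re-voxelization. Names are illustrative, not from the talk.
bool NeedsRevoxelization(int3 voxel, int3 delta, int resolution)
{
    bool stale = false;
    [unroll]
    for (int axis = 0; axis < 3; axis++)
    {
        int d = delta[axis];
        int v = voxel[axis];
        if (d > 0) stale = stale || (v >= resolution - d); // exposed at the far edge
        if (d < 0) stale = stale || (v < -d);              // exposed at the near edge
    }
    return stale;
}
```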
Good to know. I will find a way to implement voxel cascade soon.
Does the voxel data store opacity / occluders?
Well, it stores both emittance (rgb) and opacity (a).
More about visualizing the voxels: you can change the Mipmap Sampler mode and the Level property to choose which mipmap level you want to visualize.
As far as I can understand, a clipmap is a mechanism to load part of the mipmap data into memory, where the other part gets "clipped" out. This is useful for LOD and terrain rendering.
I don't think that the clipmap is applicable to voxel cone tracing, though, because we need the entire lighting environment to calculate every possible light path to any voxel within the volume. In this case, I think the voxel cascade is a good candidate for this problem.
As for toroidal processing, yes, I did implement this as `CSShift` (at that time, I didn't know it was called a toroidal update 😅). Indeed, we do need this for translating the lighting environment whenever the camera moves. You can test this implementation by enabling mipmap visualization and checking Follow Camera on the VXGI script, by writing your own code to update `VXGI.center` (the center of the voxel volume) accordingly, or simply by changing `VXGI.center` in the inspector during Play Mode.
In your `CSShift` you manually COPY the data, which takes GPU cycles and memory bandwidth (though I don't know how much). Toroidal addressing allows you to shift the center using math on your address lookups, without copying or moving any data. It is described in the Clipmap Paper, where they explain how "The overlap areas are unchanged and require no update." The red areas they talk about "loading" are the areas that are re-voxelized when the camera moves; the grey areas are unchanged and stay right where they are.
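To make the address math concrete, here is a sketch (names are illustrative; in practice you keep one origin per cascade level):

```hlsl
// Toroidal lookup: the volume data never moves in memory; only the
// origin offset changes when the camera moves. Stale border voxels
// are re-voxelized in place at their wrapped addresses.
int3 _ToroidalOrigin;   // updated CPU-side as the volume scrolls
int  _Resolution;       // voxels per side of this cascade level

int3 ToroidalAddress(int3 logicalVoxel)
{
    // Positive modulo, so negative sums still wrap correctly.
    int3 wrapped = (logicalVoxel + _ToroidalOrigin) % _Resolution;
    return (wrapped + _Resolution) % _Resolution;
}
```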
I think you are a little confused about clipmaps. One use of clipmaps is loading part of a mipmap into memory. However, a clipmap is just a texture shape and addressing mode. A clipmap is a layered pyramid like a mipmap, but where the larger layers have a maximum size. For example, with mipmap layers you might have 8x8, 16x16, 32x32, 64x64, 128x128... but with a clipmap you can say the largest size is 32x32, and you get 8x8, 16x16, 32x32, 32x32, 32x32.
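In code, the only difference between the two shapes is a clamp on the level size (a toy sketch):

```hlsl
// Mipmap level size vs. clipmap level size, for the example above
// (level 0 is the finest).
int MipLevelSize(int finestSize, int level)
{
    return finestSize >> level;                    // 128, 64, 32, 16, 8
}

int ClipLevelSize(int finestSize, int level, int maxClipSize)
{
    return min(finestSize >> level, maxClipSize);  // 32, 32, 32, 16, 8
}
```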
NVidia VXGI uses 3D clipmaps, because if they were using a traditional mipmap-like pyramid, the voxel data size would explode. You can see them explaining this on page 12 of the NVidia VXGI slides. The coarse levels of the clipmap (LOD2-4) represent areas far from the camera, while the fine levels (LOD0-1) represent areas near the camera. They don't need fine detail far from the camera, so this naturally fits the clipmap representation.
By controlling the "Clip Center" of each clipped level, you can control where LOD0 and LOD1 are relative to LOD2. My understanding is that NVidia used clipmaps to benefit from the hardware addressing modes that exist for clipmaps. However, you can do the same thing with your own code and texture cascades.
This Clipmap Paper explains the addressing, the Clip Center, and toroidal addressing... but you have to ignore the wording that talks about virtual texturing and loading part of a larger texture, because that is not how NVidia uses clipmaps in VXGI.
For example, given 1 element inside the LOD 0 of the clipmap, if I have to calculate indirect lighting incoming from the region outside the clipped part of LOD 0, I would have to traverse up to LOD 1 or higher to take the emittance sample, is that right?
I kinda understand the toroidal addressing now, everything in the "same" part remains untouched, I only update everything else, then set texture offset to the new origin address for every LOD, right?
Quoted from NVidia:
Our version uses a clip-map, which is similar to a cascaded texture.
Which implies that these two things are different. In my opinion, I would prefer a cascaded texture over a clip-map, because with a clip-map we have to maintain both the pyramid mipmap and the clip-map caches, whereas with a cascaded texture everything can be contained inside a single 3D texture. Both methods have the ability to display coarse details at a far distance while maintaining fine details near the camera, don't they?
Using a voxel cascade with toroidal addressing has a downside: we have to implement custom texture filtering, because we cannot use the repeat UV sampler state to take texture samples at the edge of a cascade level, since the cascade levels are placed next to each other inside the same 3D texture.
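A sketch of how such a manual filter could look (the packing layout and names here are assumptions):

```hlsl
// Manual trilinear filtering within one cascade level, packed with
// the other levels along Z inside a single 3D texture. Hardware
// wrap can't be used here, because wrapping would bleed into the
// neighboring level, so the 8 texels of the trilinear footprint
// are fetched and blended by hand.
Texture3D<float4> _Cascades;   // all cascade levels stacked along Z
int _Resolution;               // voxels per side of one level

float4 SampleCascadeToroidal(float3 levelUVW, int level)
{
    float3 texel = levelUVW * _Resolution - 0.5;
    int3 base = (int3)floor(texel);
    float3 f = frac(texel);

    float4 result = 0.0;
    [unroll]
    for (int i = 0; i < 8; i++)
    {
        int3 corner = base + int3(i & 1, (i >> 1) & 1, (i >> 2) & 1);
        // Toroidal wrap within this level (positive modulo)...
        int3 wrapped = (corner % _Resolution + _Resolution) % _Resolution;
        // ...then offset into this level's slab along the packing axis.
        wrapped.z += level * _Resolution;
        float3 c = float3(i & 1, (i >> 1) & 1, (i >> 2) & 1);
        float3 w = lerp(1.0 - f, f, c);
        result += _Cascades.Load(int4(wrapped, 0)) * (w.x * w.y * w.z);
    }
    return result;
}
```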
Here are the processing times of each stage on my machine (in milliseconds), measured a few months ago.
| Voxel volume | 65³ | 129³ | 257³ | 513³ |
|---|---|---|---|---|
| Voxelization | 0.346 | 0.369 | 0.423 | 0.592 |
| Voxel Shading: Voxel Cone Tracing | 0.155 | 0.330 | 1.007 | 3.878 |
| Voxel Shading: Voxel Data Aggregation | 0.018 | 0.140 | 1.043 | 8.657 |
| Voxel Shading: Mipmap Filtering | 0.027 | 0.085 | 0.402 | 2.848 |
| Pixel Shading: G-buffers | 1.050 | 1.075 | 1.064 | 1.046 |
| Pixel Shading: Diffuse Tracing | 13.024 | 15.356 | 17.485 | 19.920 |
| Pixel Shading: Reflection Tracing | 2.855 | 2.556 | 3.301 | 4.514 |
| Total | 17.475 | 19.911 | 24.734 | 41.455 |
The voxel data aggregation is the stage where multiple voxel values that overlap the same voxel position get averaged. The current bottleneck is cone tracing, because of the many texture read calls. I implemented cone tracing with 32 directions distributed around a sphere (twice the number used in The Tomorrow Children).
I suppose the number of directions should be reduced to 16. The number of texture reads would drop drastically, by more than half: not only because the number of directions is halved, but also because the number of samples taken during a single cone trace shrinks as the cone size increases.
I would prefer cascaded texture over clip-map. Because, with clip-map, we have to maintain both the pyramid mipmap and the clip-map caches.
You have the same data in GPU memory whether it is a manually packed texture cascade or a clipmap as a 2D texture array.
I believe you want to use the clipmap/2D-texture-array method, so the hardware can do the toroidal multi-sampling for you.
This is why clipmaps use 2D texture arrays to store the clipmap stack... so that the hardware can automatically do multi-sampling over the toroidally wrapped edge by using the "wrap" texture addressing mode. You can see more details in this description of an NVidia Clipmap implementation.
In DX10, this looks something like the following (a minimal sketch with placeholder names):
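```hlsl
// Minimal sketch: one clipmap stack level per array slice. The
// sampler is created CPU-side with D3D10_TEXTURE_ADDRESS_WRAP on
// U and V, so out-of-range UVs wrap toroidally in hardware.
Texture2DArray ClipmapStack;
SamplerState WrapSampler;

float4 SampleClipmapLevel(float2 toroidalUV, float level)
{
    // The z component selects the array slice (the clipmap level).
    return ClipmapStack.Sample(WrapSampler, float3(toroidalUV, level));
}
```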
If you pack texture-cascade levels yourself into a single large texture, the hardware can't automatically "wrap" the coordinate for you, so you get to do the sampling yourself.
I don't know how much faster it is to let the hardware sample over the wrapped edge vs. taking your own samples, but since NVidia built the hardware, I suspect there is a performance advantage to doing it their way. The wrapping logic is probably built into the hardware multi-sampler, whereas if you code it yourself, it's going to stall all the GPU threads if any of them have to wrap.
Also, AFAIK, you don't compute mip-maps for each level of the clip-map stack. If you want to go up a level, you just move up the clip-map stack..
For example, given 1 element inside the LOD 0 of the clipmap, if I have to calculate indirect lighting incoming from the region outside the clipped part of LOD 0, I would have to traverse up to LOD 1 or higher to take the emittance sample, is that right?
Yes. Which is exactly how The Tomorrow Children texture cascades work as well..
I kinda understand the toroidal addressing now, everything in the "same" part remains untouched, I only update everything else, then set texture offset to the new origin address for every LOD, right?
Yes.
The clipmap paper does mention the "texture memory cache":
3.2 The Anatomy of a Clipmap
A clipmap is an updatable representation of a partial mipmap, in which each level has been clipped to a specified maximum size. This parameterization results in an obelisk shape for clipmaps as opposed to the pyramid of mipmaps. It also defines the size of the texture memory cache needed to fully represent the texture hierarchy.
which means that the clipmap acts as a partial cache for the mipmap, doesn't it? Doesn't that mean we need to implement the code that transfers data from the mipmap to the clipmap in Unity?
If you pack texture-cascade levels yourself into a single large texture, the hardware can't automatically "wrap" the coordinate for you, so you get to do the sampling yourself. I don't know how much faster it is to sample over the wrapped edge vs. taking your own samples, but since NVidia built the hardware, I presume there is a performance advantage to doing it this way.
I'm thinking the same.
"texture cascade" is just an imprecise term that could mean just about anything with multiple textures. Every clipmap is a texture cascade, but every texture cascade is not a clipmap.
Well, the texture cascade I'm discussing here is the one implemented in The Tomorrow Children, which is composed of multiple levels of voxel texture with the same resolution. Whereas a clipmap is a clipped representation of the mipmap, where a higher level has the same or lower resolution than a lower one. We still have to store the full representation of the mipmap somewhere.
You have the same data in GPU memory whether it is a manually packed texture cascade or a clipmap as a 2D texture array. I believe you want to use the clipmap method, so the hardware can do the multi-sampling for you.
What happens if I want to take a sample in 3D space, like trilinear filtering between 8 points?
which means that the clipmap acts as a partial cache for the mipmap, doesn't it? Doesn't that mean we need to implement the code that transfers data from the mipmap to the clipmap in Unity?
No. The clipmap is not a partial cache of the mipmap in VXGI. It is just a texture-addressing mode.
Let me try to explain this without using the term "clipmap" at all....
The NVidia method is to pack "multiple levels of voxel texture with the same resolution" using a texture-type called a "2D texture array", and then to use the "WRAP" addressing mode to get the hardware to automatically and efficiently perform toroidal multi-sampling over the wrapped edge of a single level. Look at my previous post again.. I edited it to add code samples of 2D texture arrays.
Here is the documentation for Unity 2D texture arrays. You then use TextureWrapMode.Repeat to automatically get toroidal wrapping around the edge of each layer of the 2D texture array.
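On the shader side, you can also request the wrap mode with Unity's inline sampler states, where the sampler name encodes the filter and wrap modes (a sketch; `_VoxelCascades` is a made-up name):

```hlsl
// Inline sampler state: Unity parses "linear" and "repeat" from
// the name, so sampling past an edge wraps toroidally in hardware.
Texture2DArray _VoxelCascades;
SamplerState my_linear_repeat_sampler;

float4 SampleCascadeLayer(float2 toroidalUV, float layer)
{
    return _VoxelCascades.Sample(my_linear_repeat_sampler,
                                 float3(toroidalUV, layer));
}
```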
Do you understand? There is no caching. They are using 2D texture arrays to get hardware wrapping support, to do efficient toroidal addressing of their toroidal voxel textures.
They are using the term "Clipmap" just to explain the technique, because this is also how you implement clipmaps. This NVidia Clipmap implementation paper shows the texture addressing mode and how they get the hardware to do their wrapped toroidal sampling.
The Tomorrow Children method is to pack "multiple levels of voxel texture with the same resolution" into a single 2D texture atlas. When you do this, if you want to use toroidal addressing, you have to perform your own samples.
The NVidia method is to pack "multiple levels of voxel texture with the same resolution" using a texture-type called a "2D texture array", and then to use the "WRAP" addressing mode to get the hardware to automatically and efficiently perform toroidal multi-sampling over the wrapped edge of a single level.
Yes, I totally understand the sampling method with the wrap addressing mode. 2D texture arrays are used in terrain rendering systems. But since we are performing cone tracing in 3D space, I suppose that they are using an array of 3D textures, not 2D, with the wrap addressing mode of course. We need to store the voxel volume as a 3D texture, so that the sampler's linear filtering becomes trilinear interpolation in 3D space. Using a 2D array will only perform bilinear interpolation in 2D space.
But since we are performing cone tracing in 3D space, I suppose that they are using array of 3D texture, not 2D, with wrap addressing mode of course.
Yes... I think that is likely.
But hey, let's stop arguing about the clipmap and stuff. I have decided to implement the voxel cascade.
Here is the current implementation (except that dynamic and static objects are currently indistinguishable):
The 3D Radiance Map is the original emittance and occlusion information (essentially the mipmap at level 0). The 4D Radiance Map is the mipmap filtered from the 3D Radiance Map. We need buffers to store both the 3D and 4D Radiance Maps, because the 4D Radiance Map from the previous frame is needed to construct the 3D Radiance Map in the next frame.
Here, toroidal addressing won't ever be used, because every frame, every voxel occupied in the scene is traversed and combined with the 4D Radiance Map in order to construct the new 3D Radiance Map.
Now, with the voxel cascade, this structure stays mostly the same. The only things that change are the way we take anisotropic voxel samples, and the way the 4D Radiance Map is constructed from the 3D Radiance Map, because the 4D Radiance Map changes from a pyramid of multiple 3D textures to a monolithic 3D texture.
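In terms of resources, that double buffering amounts to something like this (names assumed):

```hlsl
// Double-buffering sketch (all names assumed). The previous frame's
// filtered 4D Radiance Map is bound read-only while the new frame's
// 3D Radiance Map (level 0: emittance RGB + opacity A) is written;
// the two 4D buffers are swapped CPU-side each frame.
Texture3D<float4>   _PrevRadianceMap4D;   // read: last frame's filtered result
RWTexture3D<float4> _RadianceMap3D;       // write: this frame's level-0 data
```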
The bottlenecks here are Voxel Shading and Pixel Shading, because of intensive texture read operations.
After this, we will worry about distinguishing dynamic objects from static objects, and partial cascade updating/scrolling.
Sounds great!
I just wonder if "Radiance Map" is the correct term in this context. It contains the lighting environment within a specific bounding volume, which comprises emittance data (RGB) and occlusion data (A).
Do the cascades each have mipmap levels?
I thought the cascades acted as the mipmap levels, because they essentially contain the same data. Cascade 1 is the finest resolution, but it doesn't cover the whole scene. Cascade 2 essentially serves as both the mip-map of Cascade 1, and a voxel map covering a larger scene area. Did I understand that incorrectly? Does each cascade also have mipmap levels? That seems like it would contain lots of redundant data.
As for the name "Radiance Map".. I don't have an opinion. Maybe "Voxel Surface Map"
Some other useful tidbits about UE4 VXGI are in these slides... it looks like they re-voxelize every frame, so maybe re-use and toroidal addressing are not that important:
https://www.dropbox.com/s/6ptemn6vbvx7bry/UE4_VXGI_Overview.pdf?dl=0
Maybe because of this??
Do the cascades each have mipmap levels?
Well, they don't. The basic idea of cone tracing is to start from mipmap level 0 and traverse up the mipmap levels as the cone size grows. The formula for calculating the level is:

`level = log2(voxel_cone_size)`

where `voxel_cone_size` is the ratio between the cone size and the voxel size at mipmap level 0, both in world space. The same applies to cascade levels.

What happens if we start tracing outside cascade level 0 but inside cascade level 1? Then we just ignore level 0 and start from level 1. Thus, the `level` and `voxel_cone_size` definitions change accordingly:

`voxel_cone_size = world_space_cone_size / starting_level_world_space_voxel_size`
`level = log2(voxel_cone_size) + starting_level`
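As a small helper, this rule could look like the following (parameter names are illustrative):

```hlsl
// Level selection for cone tracing a cascade, per the formulas above.
float ConeTraceLevel(float worldSpaceConeSize,
                     float startingLevel,
                     float startingLevelVoxelSize)
{
    float voxelConeSize = worldSpaceConeSize / startingLevelVoxelSize;
    // Clamp so a cone thinner than one voxel still samples the
    // starting level rather than a negative level.
    return startingLevel + max(log2(voxelConeSize), 0.0);
}
```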
BTW, another GPU programmer I'm talking to, Sean Boettger, had a good naming idea... "Reflectance Volume".
In my thesis, I refer to this as the lighting environment or lighting information.
I think that I'm close to releasing v0.0.1. There are a few issues left:
I'm also thinking of renaming this project to something other than VXGI, because I feel that this name belongs to NVidia. Here are a few candidates:
What do you think?
Yo, I'm a tech artist from France and I'm interested in your project, "VSGI/VSRP/VUGI" or whatever you choose; I'll be following it. If you need some AAA assets to showcase your demo, I'd be glad to give you some stuff (meshes/VFX etc.) for a test scene! So is VXGI a custom SRP, or can it be added to the two existing SRPs (Universal/HDRP)? I'm currently enhancing LWRP at work, but there's no realtime GI solution! awwww. Keep me in touch :)
Hi, I appreciate your support. I think it will take a month or two for this SRP to be fully functional. So stay tuned, and watch out for breaking changes.
About extending the existing SRPs: yes, it is technically possible, but it will take time, as there is a lot to cover.
You're probably aware by now, but Unity has an API now for getting the actual ComputeBuffer reference and doing a complete GPU side reassignment of it if needed or just read from it. No need to copy data to/from CPU side.
https://forum.unity.com/threads/feedback-wanted-mesh-compute-shader-access.1096531
@Invertex
No need to copy data to/from CPU side.
Can you elaborate on the data that is copied from the CPU side? What kind of data is it? Is it about the voxelization stage that transforms meshes into voxels?
@Invertex
No need to copy data to/from CPU side.
Can you elaborate on the data that is copied from the CPU side? What kind of data is it? Is it about the voxelization stage that transforms meshes into voxels?
It was in regards to the discussion happening here https://github.com/Looooong/Unity-SRP-VXGI/issues/5#issuecomment-497574383
This issue is to be used as an open discussion thread.