Zylann opened this issue 5 years ago
Update: a section of the Transvoxel paper mentions this problem and describes a fix. However, it assumes we have access to high-resolution data in places where we currently don't (at least, not always), because the module does not keep all LODs in memory at all distances. I'll keep this note though, because a workaround or data redesign might be possible in the future.
While implementing Transvoxel, I came back to the terracing problem I had in my notes, and thought I would explain it in a larger post. I feel the topics I talk about here may be quite simple, but these days I like to lay down everything to be really clear as to why something happens.
In short: downscaling sampled voxels is a pain in the ass.
Beforehand, let me recall how this module deals with voxel data so far: we load voxel data around the player at different levels of detail. That's right, it's not just a visual thing. If you see half-resolution meshes, it really means the voxel data is also half-resolution.
Within about 100m around the player, depending on your settings, voxels are at full resolution. Further away, we only have half of them. Even further away, we load only a quarter, etc. This half-resolution data is also saved and maintained: something super-far on the horizon will never be generated at 100% precision until your player actually walks to that area. When the player gets closer, voxels are generated again at higher resolution. This saves both memory and computation time, while still allowing you to see very far away. There is also no trick with the data: once it's generated, it's recoverable even if you lost the seed or the code that made it. Like Minecraft worlds, what's generated is here to stay. But we'll see how it becomes a problem with LOD.
Transvoxel assumes we obtain a lower LOD by skipping every other voxel in the grid, i.e. "nearest-neighbor" downscaling. This is mostly fine, however it conflicts with a problem I found earlier in a different area of this module: terracing, aliasing, or surface shifting creating unwanted "staircase" patterns in the landscape (note: don't confuse this with the very poor VoxelStreamImage; same problem, different reasons :p).
A smooth voxel surface is represented with a distance field. Negative values are inside the surface, positive values are outside, and zero values are exactly on the surface. When we polygonize it, we interpolate neighboring values to find where they cross zero, in 3D, and that gives us vertices.
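To make this concrete, here is a tiny sketch of that process (my own illustration, not the module's code): a sphere SDF sampled along a line, with linear interpolation to find where it crosses zero.

```cpp
#include <cmath>
#include <cstdio>

// Signed distance to a sphere of radius `r` centered at the origin.
float sphere_sdf(float x, float y, float z, float r) {
    return std::sqrt(x * x + y * y + z * z) - r;
}

// Given two samples of opposite sign at positions a and b (1D along an edge),
// find where the field crosses zero by linear interpolation.
float zero_crossing(float pos_a, float pos_b, float sdf_a, float sdf_b) {
    float t = sdf_a / (sdf_a - sdf_b); // fraction of the way from a to b
    return pos_a + t * (pos_b - pos_a);
}

int main() {
    // Sample along the X axis through a sphere of radius 5.5.
    float prev = sphere_sdf(0, 0, 0, 5.5f);
    for (int x = 1; x < 8; ++x) {
        float curr = sphere_sdf((float)x, 0, 0, 5.5f);
        if (prev < 0.f && curr >= 0.f) {
            printf("Surface vertex at x = %f\n",
                   zero_crossing((float)(x - 1), (float)x, prev, curr));
        }
        prev = curr;
    }
    return 0;
}
```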
Today, using this voxel module, if you generate a perfect sphere distance field and sample it in a voxel grid, you will obtain a smooth sphere if you polygonize it at LOD0. But if you take the resulting grid and downsample it, the result gets worse and worse. In fact, the more you downscale it (or generate it by only running your SDF function at larger intervals), the more aliased it will look. You could generate a whole smooth planet at LOD0; LOD5 will still look like crap.
Here is the result with a slope:
So why is that? It happens because behind the scenes, we store distance fields in a lossy way. Voxels are 8 bits representing a FIXED range (they are quantized), which means the surface is represented in only a fraction of the space it would take in theory. The reason for this choice is to save memory and CPU: voxels take less space, and voxelizing shapes only needs to touch a limited area.
To see this problem in practice, let's take a look at a slice of the voxel data. If we were to use a true distance field, our sphere would look like this:
As you can see, even though our sphere surface may be only a small part of this space, representing it requires calculating its distance field throughout the whole area (and this is also an assumption taken in many papers about voxel surfaces). But in practice, we really don't care if some voxel far in space is 978776 meters away from our planet. Computing it would be a waste of time. So instead, we store it like this, carefully choosing the values so that the distances are represented only near the zero-crossing:
Voxels only store values between -1 and 1, with 256 possible values in between. Values beyond that range are clamped. This makes the sphere area much more localized, still precise enough to be polygonized correctly, and it means we no longer need to iterate the whole grid to place a shape.
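In code, the quantization could look like this; a minimal sketch of the encoding as described above (the module's actual conversion may differ in details):

```cpp
#include <algorithm>
#include <cstdint>

// [-1, 1] -> [0, 255]. Everything outside [-1, 1] saturates, which is
// exactly where the data loss discussed below comes from.
inline uint8_t sdf_to_u8(float sdf) {
    float clamped = std::min(1.f, std::max(-1.f, sdf));
    return (uint8_t)((clamped * 0.5f + 0.5f) * 255.f);
}

// [0, 255] -> [-1, 1]
inline float u8_to_sdf(uint8_t v) {
    return (v / 255.f) * 2.f - 1.f;
}
```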
Now, if we generate a downscaled version to obtain LOD1, here is what we get with a true distance field:
There is a problem already, but it's hard to see. To better understand what happens, look at our 8-bit voxels:
You can see that the visuals become jagged. And that's precisely where terracing comes from. Values get clamped at -1 or 1 much earlier, and at deeper LODs they degenerate into only -1s and 1s. Try drawing a thin gradient in an image editor and downscaling your image with nearest-neighbor quality: the nice short gradient completely disappears:
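Here is the same effect in numbers, a quick 1D illustration (made-up values) of nearest-neighbor downscaling eating the narrow gradient band:

```cpp
#include <cstdio>

int main() {
    // A clamped SDF along one axis: the gradient only spans ~4 voxels.
    float lod0[12] = {-1, -1, -1, -1, -0.75f, -0.25f, 0.25f, 0.75f, 1, 1, 1, 1};

    // Nearest-neighbor LOD1: keep every other voxel.
    printf("LOD1: ");
    for (int i = 0; i < 12; i += 2) {
        printf("%g ", lod0[i]);
    }
    printf("\n");
    // Output: -1 -1 -0.75 0.25 1 1
    // The surviving gradient is now a single step; one more LOD and only
    // clamped -1/1 values remain, which is where terracing shows up.
    return 0;
}
```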
We could think we just need a better downscaling method, like a linear filter? Wrong. Methods that work so well on 2D images aren't reliable with voxels, as explained in this article: https://0fps.net/2012/08/20/simplifying-isosurfaces-part-2/
One solution comes from realizing that voxels at LOD1 represent distances twice as large as at LOD0. Those are larger numbers, so it makes sense that their quantized values will clip much sooner:
So when we generate them, we should divide the values by 2 beforehand and store that, instead of re-using values from the full-resolution LOD (which were already quantized). Then when we read the values back, we multiply them by 2 again. This makes our storage range fully exploited again, resulting in less data loss.
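A minimal sketch of that rescaling (names are mine, not the module's): rescale the SDF by 1 / 2^lod before quantizing, and undo it when reading, so each LOD uses the full 8-bit range instead of clipping early.

```cpp
#include <algorithm>
#include <cstdint>

inline uint8_t sdf_to_u8(float sdf) { // same helper as in the earlier sketch
    float c = std::min(1.f, std::max(-1.f, sdf));
    return (uint8_t)((c * 0.5f + 0.5f) * 255.f);
}
inline float u8_to_sdf(uint8_t v) {
    return (v / 255.f) * 2.f - 1.f;
}

inline uint8_t store_sdf(float sdf, int lod) {
    // LOD n covers distances 2^n times larger, so compress them back
    // into [-1, 1] before quantizing.
    return sdf_to_u8(sdf / (float)(1 << lod));
}

inline float load_sdf(uint8_t v, int lod) {
    // Expand back to world-scale distances. Note the quantization step is
    // also multiplied by 2^lod: we removed the clipping, not the lossiness.
    return u8_to_sdf(v) * (float)(1 << lod);
}
```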
But this has downsides. The first one: even if we multiply the range back, we also multiply the step size an 8-bit value can represent. We got rid of clipping, but precision remains lossy.
The other downside is the Transvoxel dilemma. We no longer obtain LOD1 by nearest-neighbor sampling; we actually generate it again. The assumption made by the algorithm is broken, which causes the following situations:
There are more consequences of this, like the inability to downscale using only sampled voxels, or the problem of player edits, but I wanted to bring this back to light because Transvoxel made me think about them again. None of this is a dead-end however; there are alternatives, but they all have their pros and cons. For example, you could choose a relatively larger fixed range, but you will still get terracing, only further away, while reducing precision in the near range.
Depending on the game you want to make, it may actually be enough to never store downscaled voxels, and just keep them at full resolution everywhere.
If we go the way Transvoxel was designed to work, we would need the ability to access high-resolution voxel data in a much larger area around the player. With the current storage data structure, this is a massive memory hog. But it could be improved by emphasizing a different storage method, taking compression much further, so that even a voxel 1000m away from the player could be obtained at full resolution. Octrees? I'm not 100% convinced this would allow the same visibility range and access speed though, and infinite streaming worlds still impose constraints (although, depending on your game, you might not need an infinite world).
As suggested by Phyronnaz, another way would be to never store the generated values, and just query the generator again, which gives access to unaltered values and also allows using the binary search proposed by Transvoxel. This requires the generator to be really well optimized and not too complex. When the world has been edited, however, there is still a need for sampled data, but it would occur much less often, even allowing a greater precision like 16 bits (unless you cram too many edits into the same area).
After thinking about it, offering an option to make the module work in "only edits are saved" mode would be an interesting approach, in many aspects, at least for smooth terrain. I'd call this "hybrid" mode, in reference to the fact that voxel data can be retrieved under different forms.
Volumes are defined by sampled parts and procedural parts. So far the module limited the procedural part to a tiny portion of the pipeline which was baked into sampled parts, so everything else relied on sampled data as the only possible format (i.e. VoxelBuffer).
Now procedural data must remain accessible. When polygonizing, we can look at the available sampled data, and have the choice to "deep-sample" it. If enabled, we combine it with procedural data. The advantage is that the procedural part can be evaluated on the fly at any LOD level without needing to be stored, so a generated terrain can be given LODs at much better precision.
It doesn't solve the problem with sampled data not based on procedural functions (or too complex ones), but for procedural terrains it would be a huge improvement.
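To make the "hybrid" idea concrete, here is a minimal sketch with hypothetical names (nothing like this API exists in the module yet): the final SDF is the procedural base evaluated on the fly, combined with a stored layer of edits.

```cpp
#include <cmath>

// Stand-in for a fast procedural generator (e.g. fractal noise terrain).
float generate_sdf(float x, float y, float z) {
    return y - std::sin(x * 0.1f) * std::cos(z * 0.1f) * 10.f;
}

// Stand-in for sampled edit data; returns 0 where nothing was edited.
float sample_edits(float x, float y, float z) {
    (void)x; (void)y; (void)z;
    return 0.f;
}

// "Deep-sampling": the base is re-evaluated at whatever LOD we need,
// so only edits rely on stored (lossy) voxels.
float deep_sample(float x, float y, float z) {
    return generate_sdf(x, y, z) + sample_edits(x, y, z);
}
```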
What non-procedural terrain is there? Heightmaps and all the procgen terrains we currently use can be queried on the fly. It shouldn't be any slower than the initial build was; if anything it will be faster, since it only processes a few sections at a time.
I think assuming an efficient generator algorithm is acceptable. GDScript generators are likely going to be too slow, so this means requiring C++, or at least C#, if you want to make your own.
As you said, it's more a matter of how fast the generator is, and also how it works. If it's just a few layers of fractal noise, it may be fast enough and optimizable. If it's a complex calculation involving erosion, tectonic plates and realistic cave systems, it won't be suitable for this, and will rather be sampled, which is as good as it can get for now. Heightmaps are interesting because 2D data could still fit in a cache for a large distance at full resolution. Clearly GDScript will quickly be overloaded, but maybe there is another way to let users design their own generators. A graph-based approach could be one.
I'm wondering how to organize this for SDF, in such a way that not using this technique keeps working the same as before. Because this won't replace sampling, it's just a big optimization of it :p
Past LOD0, the sampled SDF would be seen as an "edit", and the base would be generated on the fly. They could be added together to form the result, so the sampled part can either add matter (player sculptings) or remove matter (player-made caves): Base + Edit = Final (literally), with the edit layer defaulting to 0 (128 in byte form). Problem is, if Base is -1 (minimum) and Edit is 1 (maximum), you don't get a cave: the sum is just 0, and it needs to be above 0 to obtain a cave. Since an edit is a delta, it has to be Base + 2 * Edit then, but if the base was 0, half the resolution of the voxel can become irrelevant, since the sum is going to clip above 0.5. Current edits are rather done using min(Base, Edit) to add and max(1 - Base, Edit) to remove, but there is no way to tell which one to use. It will have to be tried to tell^^
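To check these numbers, here is a small standalone illustration of the clipping dilemma, using the same hypothetical SDF range as before (a sketch, not the module's edit code):

```cpp
#include <algorithm>
#include <cstdio>

float clamp_sdf(float v) { return std::min(1.f, std::max(-1.f, v)); }

int main() {
    float base = -1.f; // solid ground (minimum)
    float edit = 1.f;  // strongest possible "remove matter" edit

    // Plain sum: lands exactly on 0, i.e. on the surface, not above it.
    printf("base + edit     = %g\n", clamp_sdf(base + edit));       // 0

    // Doubling the delta fixes it...
    printf("base + 2 * edit = %g\n", clamp_sdf(base + 2.f * edit)); // 1

    // ...but if base is 0, any edit beyond 0.5 already clips, so half
    // of the edit layer's 8-bit resolution can become useless.
    printf("0 + 2 * 0.75    = %g\n", clamp_sdf(0.f + 2.f * 0.75f)); // 1 (clipped)
    return 0;
}
```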
I still have in mind to allow 16-bit values though, to see how well it mitigates precision problems.
Also, all this remains about smooth terrain. I haven't thought about it for blocky terrain; I'm not sure it has benefits to gain from this?
The branch transvoxel_binary_search implements binary-search sampling for Transvoxel using the generator. It's a very rough, non-final architecture and only usable for non-editable terrains, as I was only interested in seeing the results quickly. It improved the look of far-away meshes; they no longer look blocky. However, terracing is still visible (even on smooth slopes), which I suspect is due to how normals are computed.
I don't think I'll merge this branch since I have to work out a new voxel access API which is more thread-friendly, but I'll keep it around for reference.
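For reference, the idea boils down to something like this (a sketch under my own assumptions, not the branch's actual code): instead of linearly interpolating two coarse samples, bisect the generator a few times between them to get a much more accurate zero-crossing.

```cpp
#include <cstdio>

// Stand-in for any fast on-the-fly generator: a simple sloped terrain.
float generate_sdf(float x, float y, float z) {
    return y - 0.37f * x - 0.11f * z;
}

// Find the zero-crossing between x0 and x1 (where the SDF changes sign),
// halving the interval `iterations` times.
float binary_search_crossing(float x0, float x1, float y, float z,
                             int iterations) {
    float s0 = generate_sdf(x0, y, z);
    for (int i = 0; i < iterations; ++i) {
        float mid = 0.5f * (x0 + x1);
        float sm = generate_sdf(mid, y, z);
        if ((s0 < 0.f) == (sm < 0.f)) {
            x0 = mid; // same sign as the start: crossing is further along
            s0 = sm;
        } else {
            x1 = mid; // sign flipped: crossing is in the first half
        }
    }
    return 0.5f * (x0 + x1);
}

int main() {
    // Coarse LOD samples at x = 0 and x = 8 straddle the surface; a few
    // bisections recover the crossing far more precisely than lerping
    // two quantized 8-bit samples would.
    printf("crossing at x = %f\n",
           binary_search_crossing(0.f, 8.f, 1.f, 0.f, 8));
    return 0;
}
```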
LOD downscaling with smooth voxels comes with drawbacks. Typically, features of the terrain will either be lost or exaggerated. Currently, the module assumes nearest-neighbor sampling, which is the fastest method and works well in most terrain use cases.
But on regular surfaces it's more noticeable, here seen on the "roads" of a heightmap generator:
Those ridges are a limitation of the LOD method itself. I use nearest-neighbor to do this, which is fine on organic terrain, but more noticeable on hard surfaces like this. The downscaling can't represent the height of the surface as accurately, so at some specific heights it is offset by a lot. That could also be a problem with the generator though. Also, this implementation of DMC isn't good at hard surfaces anyway (since it's basically marching cubes at the moment). Ways to mitigate this would be to try alternative downscaling methods, rework the generator in use here, or use higher-precision voxels (current is 8-bit). See https://0fps.net/2012/08/20/simplifying-isosurfaces-part-2/
Visually, with nearest-neighbor, this happens (1D representation; in reality the problem exists with cubes of 8 voxels turning into 1):
Although, since this terrain isn't yet editable, this problem likely happens in the generator.