What optimizations are used in this module?

dark-penguin commented 4 years ago

I am trying to figure out why is this module so much faster than a "normal" GDScript implementation that does the same thing with SurtaceTool or ArrayMesh. For example, this: https://github.com/orosmatthew/Voxelverse

(Here I'm talking about the "blocky" terrain only.)

I understand that GDScript is slow, so the code itself will run faster - including terrain generation. But once everything is generated, there is supposedly the same thing! And yet terrain generated with this module is running at least twice as fast as Voxelverse, even at a higher draw distance!

Apparently, there is a number of optimizations; now I wonder what exactly, and whether those are possible to pull off in GDScript. (I can read GDScript, but I'm having trouble finding my way around C++ sources here.)

Both implementations draw only visible faces and do not draw faces in between blocks, naturally.

If I'm reading correctly, Voxelverse creates one mesh per 8x8x8 chunk, with only one surface for the whole mesh. It does not seem to use any "greedy meshing", so it's two polygons per each block face. Could it be that the gigantic performance boost is only due to "greedy meshing"? I find that hard to believe, especially seeing that the terrain generated in the demo for this module is all mountains - there are barely any flat faces more than one block in size.

According to https://github.com/godotengine/godot/issues/7844, "mesh batching" is also used - which is the only other thing I can think of. But is it not the same as merging faces into a single surface per chunk? Maybe this module merges all of them into one mesh? But then does it rebuild the whole gigantic mesh in a split second every time a block is added or removed? Or maybe the chunk size is much larger, so there are fewer meshes?

Or maybe it's due to collision? As far as I can see, Voxelverse generates a collision mesh that is the same as the rendered mesh. And I remember that collisions are not very fast, so there is probably room for optimization. But I don't see the demo doing anything special in terms of a rewritten collision system, so it's the "normal" Godot collision as well, isn't it?

By the way, it also uses significantly less memory. What, does it use some kind of magic mesh that is not a normal Godot mesh, and therefore impossible to create in "normal" Godot?.. Oh, maybe it doesn't add each chunk as a MeshInstance? But you can't add a separate mesh for each chunk without either having one MeshInstance for each chunk, or rebuilding the whole mesh on every block modification...

Zylann commented 4 years ago

The implementation in this module is more or less fancy mesh batching, with some extra effort to still cull faces of non-cube shapes. There is no greedy meshing, it really is close to Minecraft in that sense. Fast to generate and simple code, working for any kind of shape (but obviously working best with cubes).

maybe the chunk size is much larger, so there are fewer meshes?

That plays a big role I think. Godot's culling system gets very slow with a lot of chunks. 16x16x16 is a sweet spot between culling performance and edition speed. 32x32x32 is probably a bit faster if you don't mind slower chunk updates. There is still room for better culling but Godot doesn't have an API to implement such techniques without modifying VisualServer.

Or maybe it's due to collision? As far as I can see, Voxelverse generates a collision mesh that is the same as the rendered mesh. And I remember that collisions are not very fast, so there is probably room for optimization.

This module offers two alternatives for collision with blocky terrain:

Godot's collisions, which as you said, uses the same mesh as the one used for rendering. If you use Bullet, it contains a workaround to speed up a Godot flaw which you can see here: https://github.com/Zylann/godot_voxel/issues/54
As seen above, classic colliders are slow to produce, so the module offers a quick helper to use Minecraft-like physics, called VoxelBoxMover. You probably missed it, because there goes a "rewritten" collision system :D It's so simple and fast it can even be done in GDScript, but I made it in the module anyways in case it's used a lot. This is used by the demo project.

By the way, it also uses significantly less memory. What, does it use some kind of magic mesh that is not a normal Godot mesh, and therefore impossible to create in "normal" Godot?

There are several simple things this module does to optimize memory. Most of them can be done in GDScript too.

The module doesn't use nodes at all. Only VisualServer and PhysicServer APIs. This clears up the overhead of the nodes layer.
Uniform chunks get compressed down to a single value (i.e "only air", "only dirt" etc). So they don't even allocate memory. Besides, chunks aren't even Object so they hold the minimum required.
Chunk data uses channels of configurable bit depth, and defaults to 1 byte per voxel. In GDScript, a variable weights 24 bytes minimum, and integers are 8 bytes. You'd have to use a PoolByteArray to pack data the same way, but if you need 2-bytes it gets a bit complicated.
Voxel memory is pooled, so creating/loading/destroying chunks is pretty fast because they almost always are the same size.
Of course, empty chunks don't generate a mesh or collider.

C++ makes things really fast otherwise. I can line up the data efficiently for the CPU to crunch more quickly than GDScript, with less overhead and fewer allocations. Besides, I use 2 threads at once to produce visual meshes, and a third thread to generate voxel data. The rest (mesh instances and colliders) being on the main thread due to Godot not supporting these things being done in a thread.

dark-penguin commented 4 years ago

Thank you for such a detailed answer!!

I already saw that bug report about Bullet collision mesh being slow to generate. But after it's generated, is it slower to check collision against, compared to calculating collisions manually in GDScript instead of using Bullet?

I understood that using VisualServer directly allows us to add a mesh to the scene without creating a Node (MeshInstance). That will save some memory for a Node object, but does having less Nodes also increase performance? Is this the only difference (other than batching) between using VisualServer and "doing it the normal way"?

And finally, "fancy mesh batching". I know this is the most significant factor for a huge FPS increase, but I could not find any information at all about how it works with Godot. Is there anywhere I could read about it, or could you explain how did you do it? I do not see any way to "group" meshes by material to indicate they should be batched. I would assume they are drawn in the same order they were created, but there is no way to control this order on chunk rebuild. I don't suppose all of them are automatically split into surfaces and then all those surfaces are grouped by material?.. Are they even drawn in one call, or simply drawn without changing the state? What "fancy" things did you have to do other than simply create meshes via VisualServer instead of MeshInstances?

Zylann commented 4 years ago

after it's generated, is it slower to check collision against, compared to calculating collisions manually in GDScript instead of using Bullet?

That has nothing to do with GDScript. Once a collider is generated, its efficiency only depends on the physics engine (assuming the geometry is the same).

does having less Nodes also increase performance?

Less objects to manage leads to more performance because they dont need to be created, enter tree, leave tree or be freed, but it mostly reduces memory usage.

And finally, "fancy mesh batching". I know this is the most significant factor for a huge FPS increase, but I could not find any information at all about how it works with Godot. Is there anywhere I could read about it, or could you explain how did you do it?

I don't know about Godot specifically, but I've been using that technique several times for a while in various engines. It's notably described here: https://0fps.net/2012/01/14/an-analysis-of-minecraft-like-engines/ Then for mesh creation, I use ArrayMesh: https://docs.godotengine.org/en/stable/classes/class_arraymesh.html

I do not see any way to "group" meshes by material to indicate they should be batched.

Godot doesn't do batching on its own. The fact I combine meshes into one or more surfaces into an ArrayMesh is what does it. This module allows you to define voxel types, each with geometry of its own, which is then sliced between sides and insides, and added together when building chunks. It's down to combining vertex arrays in the same way shown in the doc.

I would assume they are drawn in the same order they were created, but there is no way to control this order on chunk rebuild.

Order doesn't matter, I don't know why you'd need this.

I don't suppose all of them are automatically split into surfaces and then all those surfaces are grouped by material?.. Are they even drawn in one call, or simply drawn without changing the state? What "fancy" things did you have to do other than simply create meshes via VisualServer instead of MeshInstances?

If a chunk has 2 materials (i.e opaque and transparent), then it will have two surfaces. That happens usually when voxels composing it have geometry including such materials (opaque voxels go into first vertex array, transparent voxels go into the second). One surface in one chunk corresponds to one draw call, so in such a chunk, you'd have 2 draw calls.

dark-penguin commented 4 years ago

after it's generated, is it slower to check collision against, compared to calculating collisions manually in GDScript instead of using Bullet?

That has nothing to do with GDScript. Once a collider is generated, its efficiency only depends on the physics engine (assuming the geometry is the same).

I mean, is Bullet collision detection slow compared to a custom GDScript implementation of voxel-based collision? (Something simple like "push the player out if there is a voxel here".)

About batching, I thought you were further optimizing it in some way, because you're up to ten times faster than Voxelverse, which is supposed to do the same thing. But upon closer examination, seems like that's not what it's doing, so I'll examine it more closely. Thank you again!

Zylann commented 4 years ago

I mean, is Bullet collision detection slow compared to a custom GDScript implementation of voxel-based collision? (Something simple like "push the player out if there is a voxel here".)

VoxelBoxMover is much faster than Bullet. Its strength is also its weakness: it only cares about AABBs and get them on the fly rather than building everything upfront, which turns out to work really well with blocky terrain.

NuclearPhoenixx commented 6 months ago

What about occlusion culling, I don't think it's implemented, right? I imagine in a voxel game like Minecraft/Minetest it would sometimes significantly boost FPS depending on the player's location. Just wandering around the world with a lot mountains or exploring caves would result in most of the surrounding world not needing to be rendered at all (especially with high render distances). Although probably at the cost of slightly reduced FPS if you're seeing most of the world anyways and of course higher memory usage due to all the overhead.

If it isn't implemented right now, I'm sure you have an important reason why that is. This might have been already mentioned in other issues, but I couldn't find anything.

Zylann commented 6 months ago

There is no occlusion culling. It's a relatively recent feature in Godot, and custom solutions such as BFS cave culling are a bit annoying to do because Godot doesn't expose a way to register custom culling systems (down to having to set mesh instances visible/not visible, which is more costly).

NuclearPhoenixx commented 6 months ago

Yes, I think it was added with Godot 4.0 AFAIK. I noticed that the OccluderInstance3D takes occluders such as an ArrayOccluder3D or BoxOccluder3D for example. Maybe this could be used to generate an ArrayOccluder for each chunk and then use Godot's built-in occlusion culling system to handle everything?

Maybe it's even possible to generate an occluder for each voxel depending on its visual mesh and transparency and then combine these array occluders to a single occluder instance similar to how it is with the physics engine collision (?). Like I said, easier said than done and I have no clue how exactly meshing works in this module, but that'd be my naive approach and I don't want to open up an entirely new discussion on that.

Zylann commented 6 months ago

Maybe it's even possible to generate an occluder for each voxel

That's a very bad idea considering the sheer amount of geometry this represents. Godot uses a CPU occlusion culling technique, so it will perform badly if given complex shapes.

Maybe this could be used to generate an ArrayOccluder for each chunk and then use Godot's built-in occlusion culling system to handle everything?

Maybe. The shadow_occluders branch now adds the option to spawn quads between chunks, initially intented to force caves to be in shadow, but maybe something similar could be used. However you have to realize that chunks constantly get loaded/unloaded and can be edited by players frequently. Godot's occlusion culling system needs to be re-baked entirely everytime something changes, so I fear that this might not scale well in such a dynamic environment. Also I'm not focused on this at the moment.

NuclearPhoenixx commented 6 months ago

Godot's occlusion culling system needs to be re-baked entirely everytime something changes, so I fear that this might not scale well in such a dynamic environment.

That's a good point, I totally forgot that it needs baking. What a shame.

NuclearPhoenixx commented 6 months ago

Are there any plans to implement an LOD system like in VoxelLodTerrain for blocky meshers? Chunked LODs for blocky voxels would be another great optimizations especially at the larger view distances at which that would make sense.

Zylann commented 6 months ago

I'd like to have that at some point, I have done some prototypes, but for now there is nothing ready yet. It's significantly more complicated to setup in a way that looks good, compared to smooth LOD

Zylann / godot_voxel

What optimizations are used in this module? #150