Implement Mesh streaming

reduz commented 1 year ago

Describe the project you are working on

Godot

Describe the problem or limitation you are having in your project

For large scenes, loading meshes (3D models) consume a lot of video memory and it takes a long time to do so.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Mesh streaming aims to solve this. A fixed amount of memory is reserved for streaming and then meshes are streamed in (higher detail) and out (lower detail) depending on proximity to the camera.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Overview

The first thing that needs to be understood is that an efficient, modern mesh streaming has many requirements that need to be met:

Ability to split the mesh content into pages that can be loaded in and out in a fixed amount of video memory (to avoid fragmentation and defragmentation).
Ability to be culled entirely in the GPU, as we are intending to draw massive amount of objects.
Ability to use existing Godot materials.

To comply with the first requirement, this generally means using triangle strips and a fixed format that covers the most common use case (vertex, normal, tangent, uv). As such, colors, uv2, bones, weights, indices, etc. will not be supported.

This means that, with common encoding, a vertex will take 28 bytes. A common chunk of mesh can be 64 vertices, so this means 1792 bytes. If we want to be able to "cull" those chunks (so an object that hits the camera only has those chunks drawn if passing frustum and occlusion culling), the triangle strip should be self contained, otherwise it can just continue until the end.

In practice, this means that this needs to be a separate type of mesh, likely StreamedMesh that is registered separately in Godot than standard meshes.

As these chunks would need to be streamed form low detail to high detail, this means that different LOD versions would need to be stored separately so they can be streamed in and out (no index based LOD).

Rendering

Remember that our main goal is to still be able to use Godot materials, otherwise the workflow would get too complicated. To achieve this, the base algorithm would be more or less like this:

Cull all the instances of StreamMesh using a compute shader, mark them visible or invisible (occlusion culling can be added later, like two pass occlusion culling).
Count how many instances are visible for each material and how many chunks for each mesh with a compute shader.
Copy the indices of the visible chunks for all objects to a very large array.
Store the first index for every material.
Call a drawing primitive (draw arrays indirect) for every material using the proper start vertex offset.
In the shader, fetch the right vertex based on the gl_VertexIndex.

Of course, there are more things that need to be taken care of:

Finding the right LOD level
Determining when LOD levels need to be streamed-in or out
Culling objects that intersect the frustum so only chunks that intersect are visible.
Culling objects against the depth buffer if using two pass occlusion culling.

As it should be obvious, this render pass is separate from the regular geometry render pass in Godot.

Q: Is this like Nanite? A: No, Nanite is more complex since it has a LOD hierarchy (which means a single large object can have multiple levels of detail). This just aims to be a good enough solution for most use cases that is not as complex to implement, but it could eventually be extended into something more akin to Nanite if there is more demand in the future.

If this enhancement will not be used often, can it be worked around with a few lines of script?

N/A

Is there a reason why this should be core and not an add-on in the asset library?

N/A

nonunknown commented 1 year ago

I wonder what if we have a option to generate impostor as the most distant 3d static meshes, then as it approaches the camera start using LODS, wdyt?

ps: if this is not related, please remove comment!

robbertzzz commented 1 year ago

As such, colors, uv2, bones, weights, indices, etc. will not be supported.

Couldn't this just be a project setting?

TokisanGames commented 1 year ago

I have some meshes with one LOD with Godot generating lower lods. I have other meshes with artist created lods, and my own system to manually switch lods, and shadow impostors. Both are embedded in their own glb. If I'm going to use this system, I'd need to be able to plug into it for both generated and artist created lods.

If by "stored separately", you mean lods stored in separate files, that's not going to happen on my project. I've already manually imported a thousand assets three times due to devs breaking the arraymesh format. But that shouldn't be necessary. RAM is abundant and unimportant. Streaming and unloading for VRAM is the only real need IMO. You should be able to load all lods from the same physical file, but stream them to the video card on demand.

Qubrick commented 1 year ago

I'm not a graphics programmer by any means but I just love reading through the internet. There might be something here.

1. Visibility Buffer by Wolfgang Engel https://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html http://filmicworlds.com/blog/visibility-buffer-rendering-with-material-graphs/

2. GPU-Driven Rendering Pipelines by Sebastian Aaltonen This includes Mesh Cluster Rendering and Occlusion Depth Generation.

3. Rendering The Hellscape Of Doom Eternal

4. A hierarchical Framework For Large 3D Mesh Streaming On Mobile Systems

5. Streaming Meshes - UNC Computer Science

madjin commented 1 year ago

Perhaps worth checking out 3D Tiles? It's an open standard for streaming massive 3D content, including buildings, trees, point clouds, and vector data. I just saw that they are coming out with a new feature soon to use custom glTF 2.0 assets as tile content. Also coincidentally I saw @reduz was on a podcast with CEO of Cesium / creator of 3D Tiles. Source: https://github.com/CesiumGS/3d-tiles#specification

Examples: https://sandcastle.cesium.com/?src=Clamp%20to%203D%20Tiles.html

and-rad commented 1 year ago

As such, colors, uv2, bones, weights, indices, etc. will not be supported

Doesn't that mean that lightmaps won't work with this system?

Also, not allowing vertex colors to work with this kind of denser mesh system sounds like a missed opportunity. Using vertex colors as mask values for AO or grunge and wear is a nice way to save texture memory. This part of a presentation on Nanite demonstrates what such a workflow would look like.

Calinou commented 1 year ago

Doesn't that mean that lightmaps won't work with this system?

There's generally an assumption that in games with large open worlds, you can't use lightmaps as they'd require very large files and would take a long time to bake. Instead, SDFGI is the preferred solution (or multiple baked VoxelGIs, as its file size doesn't depend on mesh complexity and is much faster to bake).

fire commented 1 year ago

As the person who requested custom uv1-8 I would like that to work, but I understand that this is a bit packing problem.

I also expect that people will attempt to use this system with something like bones or they will hack it in via vertex animations with hierarchy for the field of grass and trees usecase. The godot skeletal animation is a variation of the vertex animation compute shader approach, so I don't see much difference.

ClinToch commented 1 year ago

I think you guys should definitely Checkout "Nvidia MicroMesh" it was unveiled by Nvidia during the launch of the RTX 40 series Graphics card in late 2022.

Benefits

it's FREE and open source
cross platform and cross vendor
has support for Displacement
supports opacity
better than nanite in UE5
supports hardware acceleration
built from the ground up for ray tracing
unmatched ray tracing performance
and more.

Learn more: https://developer.nvidia.com/rtx/ray-tracing/micro-mesh

Demo: This realtime tech Demo from Nvidia uses MicroMesh™

https://youtu.be/AsykNkUMoNU

WrobotGames commented 1 year ago

Really interesting, but isn't that just some super efficient tesselation and displacement (with fancy opacity rendering). At least, that that is what I understood form their article. While this is really cool tech, I don't think it is a replacement for mesh streaming or cascading LODs (e.g. nanite).

silverkorn commented 1 year ago

I think you guys should definitely Checkout "Nvidia MicroMesh" it was unveiled by Nvidia during the launch of the RTX 40 series Graphics card in late 2022.

Benefits

it's FREE and open source

cross platform and cross vendor

has support for Displacement

supports opacity

better than nanite in UE5

supports hardware acceleration

built from the ground up for ray tracing

unmatched ray tracing performance

and more.

Learn more: https://developer.nvidia.com/rtx/ray-tracing/micro-mesh

Demo: This realtime tech Demo from Nvidia uses MicroMesh™

https://youtu.be/AsykNkUMoNU

it's FREE and open source, but I believe it will be under the NVIDIA RTX SDKs LICENSE like their other SDKs (ex.: https://github.com/NVIDIAGameWorks/Opacity-MicroMap-SDK/blob/main/LICENSE.txt)

Not sure that it's compatible with Godot's license.

Calinou commented 1 year ago

Not sure that it's compatible with Godot's license.

That license is indeed proprietary, and therefore not suitable for inclusion with Godot.

and-rad commented 1 year ago

There's generally an assumption that in games with large open worlds, you can't use lightmaps as they'd require very large files and would take a long time to bake. Instead, SDFGI is the preferred solution (or multiple baked VoxelGIs, as its file size doesn't depend on mesh complexity and is much faster to bake).

That's true, but it might be worth keeping in mind that "large scene" doesn't have to mean big open world, it can also mean confined spaces with lots of detail going on. If I was making a game that takes place on a space ship and the engine had proper mesh streaming support, I'd greeble the hell out of my scenes. Lightmapping might still be desired in a case like that.

Personally, I would love nothing more than to never unwrap a lightmap again in my life.

Calinou commented 1 year ago

Alternatively, a much simpler way to implement mesh streaming would be to extend visibility ranges to emit signals on visibility changes. This would allow more flexible streaming behavior from the GDScript side.

3.5 allows specifying a maximum distance in VisibilityNotifier, but this isn't implemented in 4.x yet.

atirut-w commented 1 year ago

For example, leaf nodes in an HLOD tree could simply be placeholder lowpoly meshes that defer loading their full mesh data (attributes, LODs, shaders, etc) until the camera is close enough to trigger the visibility range.

That's actually what I used for one of my projects and it worked really well despite having mesh priority issues because I couldn't be bothered to implement more complex codes.

Ansraer commented 1 year ago

Reading your proposal it sounds very similar to what you would beed for mesh rendering, but using different terms (e.g. "chunk" instead of "meshlet"). You also mention doing the culling in a compute shader instead of in a Task Shader.

Are you intentionally not using the correct terms in your proppsal or was this an oversight? Assuming this was done intentionally it I am guessing you want to emulate mesh shaders without actually using them (and thus requiring modern hardware), but I have to wonder if it is worth it. Any project large enough to really need these kinds of optimizations will probably already require recent-ish hardware, and at that point we might as well use a mesh pipeline.

When it comes to the actual implementation it should also be pointed out that it is usually highly preferable to do this over multiple passes. You start with the largest & closest objects with the least vertices and draw them first before you downsample the resulting depth so that you can compare it against bounding spheres of the meshlets in the next pass. This way you can cull an increasingly larger percentage of the meshlets with every pass before even entering the mesh stage.

Finally I would like to point out that at least one custom vector per vertex might be necessary. A skilled technical artist could really use that for animations. All the usecases that drove me to investigate mesh shading would require it. (Mostly vegetation stuff, and yes, I am aware that this will also make the culling less efficient)

fire commented 1 year ago

I was assuming reduz was redefining terms to give his own meaning to common words.

https://github.com/zeux/meshoptimizer#mesh-shading which is already in godot 4 has meshlet creation support. I am also investigating doing a coarse grid division using @lawnjelly 's code at https://github.com/v-sekai/godot-splerger

Calinou commented 1 year ago

Are you intentionally not using the correct terms in your proppsal or was this an oversight? Assuming this was done intentionally it I am guessing you want to emulate mesh shaders without actually using them (and thus requiring modern hardware), but I have to wonder if it is worth it. Any project large enough to really need these kinds of optimizations will probably already require recent-ish hardware, and at that point we might as well use a mesh pipeline.

Even if we do end up using mesh shaders, I think we'll need an emulation path as a fallback regardless. Today's AAA games can still run on Pascal/RDNA1 or even Maxwell/Polaris GPUs after all. Support for mesh shaders on integrated graphics may also be less than stellar.

Saul2022 commented 1 year ago

Could it be possible that next, if it similar to nanite, could it use a programmable rasterizer for things like foliage and deformation. https://docs.unrealengine.com/5.1/en-US/nanite-virtualized-geometry-in-unreal-engine/#supportedfeaturesofnanit

edit i saw some good article that covers some nanite stuff, hope it helps https://www.reddit.com/r/hardware/comments/gkcd9b/pixels_triangles_whats_the_difference_how_i_think/

Also about the lod thing, i honestly think that if it doesn't make it harder to run on less powerful gpu, hlod could be added at first like nanite, as apparently thanks to chatgpt research it gives 3x perfomance compared to traditional lod's

![Screenshot_2023-02-19-17-19-20-17_40deb401b9ffe8e1df2f1cc5ba480b12](https://user-images.githubusercontent.com/97898580/219960726-5b94d131-95de-4fba-b7ee-af29b403cc45.jpg) ![Screenshot_2023-02-19-17-20-02-00_40deb401b9ffe8e1df2f1cc5ba480b12](https://user-images.githubusercontent.com/97898580/219960760-012f7056-d662-4634-a26a-283be2b381e2.jpg) ![Screenshot_2023-02-19-17-20-05-79_40deb401b9ffe8e1df2f1cc5ba480b12](https://user-images.githubusercontent.com/97898580/219960765-81761152-d436-4725-9239-0cf1206c0686.jpg) ![Screenshot_2023-02-19-17-20-09-62_40deb401b9ffe8e1df2f1cc5ba480b12](https://user-images.githubusercontent.com/97898580/219960769-04708215-b0db-4860-a81f-67357de91428.jpg)

Saul2022 commented 1 year ago

Also about the lod thing, i honestly think that if it doesn't make it harder to run on less powerful gpu, hlod could be added at first like nanite, as apparently thanks to chatgpt research it gives 3x perfomance compared to traditional lod's

@Saul2022 Godot already supports HLODs in the form of visibility ranges, so we should be good to go there.

I know i meant just having them automatically generated with mesh streaming instead of traditional lod.

Even if we do end up using mesh shaders, I think we'll need an emulation path as a fallback regardless. Today's AAA games can still run on Pascal/RDNA1 or even Maxwell/Polaris GPUs after all. Support for mesh shaders on integrated graphics may also be less than stellar.

The general recommendation is to store geometry data as meshlets, and convert back to individual triangles (standard vertex+index buffers) on hardware that doesn't support mesh shaders.

Then gtx 1080 is rdna2, i saw some videos with nanite running the valley of the ancient demo on that gpu and seemed to work.

I would strongly suggest splitting this into two separate proposals, one for streaming Resources, and the other for meshlets+mesh shaders, rather than combining them into a single system that streams at the meshlet level. Focusing on them separately allows them both to be made much more broadly applicable and useful for many more Godot users, while still being able to cover Nanite-like tasks of streaming and rendering highpoly meshes.

While i agree i think that for sake of easy to use there should be an option for streaming to add what is stated in the proposal while having the meshlet-mesh shader implementation independent.

Calinou commented 1 year ago

Then gtx 1080 is rdna2

No, it's a Pascal GPU :slightly_smiling_face:

Nanite works on older GPUs because it has fallbacks for GPUs not supporting mesh shaders.

and-rad commented 1 year ago

@myaaaaaaaaa

[...]

I like the sound of this. I like the idea of not only trying to replicate a system like Nanite but instead trying to build on it and the things we've learned and coming up with a solution that avoids some of its pitfalls.

darrylryan commented 1 year ago

This is one of the things that has always made me skeptical of the stance most open-source engines have taken that components such as terrain, terrain editor, foliage etc. are not / should not be part of the core.

I think for large worlds to work properly, there's a number of components that have to work together. First you need to be able to stream textures and meshes in and out of memory. Second you need to be able to be able to break huge worlds up into multiple chunks - you can't just have one huge scene file that's tens of GB is size, so you need to have scenes that are composed of multiple sub-scenes that are streamed in and out as you move around. Your scene format and editor needs to understand all that. Next your terrain system, foliage etc. needs to work together with that, so that chunks of terrain/foliage are stored in different world chunks. Finally, your editor needs to understand it all - so it doesn't try to load the whole world at once and lag the editor. It needs to be able to stream the world in and out as you move around, and save the trees and grass you paint into the world in the right chunks. It also needs to generate impostors for your trees etc. at import time and manage them along with mesh LODs.

It's all very inter-related I think, and the reason all the big AAA engines have these things as core components.

atirut-w commented 1 year ago

I think for large worlds to work properly, there's a number of components that have to work together. First you need to be able to stream textures and meshes in and out of memory. Second you need to be able to be able to break huge worlds up into multiple chunks - you can't just have one huge scene file that's tens of GB is size, so you need to have scenes that are composed of multiple sub-scenes that are streamed in and out as you move around. Your scene format and editor needs to understand all that. Next your terrain system, foliage etc. needs to work together with that, so that chunks of terrain/foliage are stored in different world chunks. Finally, your editor needs to understand it all - so it doesn't try to load the whole world at once and lag the editor. It needs to be able to stream the world in and out as you move around, and save the trees and grass you paint into the world in the right chunks. It also needs to generate impostors for your trees etc. at import time and manage them along with mesh LODs.

Wouldn't a sort of generic asset streaming system be more useful? It just have to allow streaming of any resources so it shouldn't be too complex.

Saul2022 commented 1 year ago

After some research on trying to expand the capabilities of this mesh streaming to have some sort of support for skeletal meshes by making that function once in the memory and reuse the same result on the rest. This is said in this thread by sebb from twitter(the guy that presentes a gpu driven renderer for assasin's creed unity, and the unity hybrid renderer senior director)
https://twitter.com/SebAaltonen/status/1403044403007078403

https://twitter.com/SebAaltonen/status/1403044018079080454

I recommend checking his tweets of gpu driven rendering and nanite as they are interesting and can give some ideas, as it seems this mesh streaming solution share some similarities. Edit' Also for small clusters you could use some techniques MM dreams used for merging them although the problem may come when there are overlapping objects.

and-rad commented 1 year ago

It's a good idea to keep in mind that those tweets are almost two years old though. The alpha mask limitation is gone by now, although I don't know if they will ever support skinned meshes for Nanite. I don't even know if I would want them to. Rigging and skinning a mesh with a million verts sounds like an absolute nightmare to me.

Saul2022 commented 1 year ago

i know that they are gone and that it kind of old,. but i mentioned that tweet as a reference for godot mesh streaming solution and could apply to both godot mesh streaming and nanite(as juan said the major difference is nanite lod's are computed per piece, not per object). Also Skinning support doesn't just mean animation on millions of polygon, it means that you can use way more characters and also frees cpu to do more behavior things. This makes largue crowds easier to do although, proposals like swarms or just using c++ can helps.It was mentioned in the blog of godot for AAA games.

Saul2022 commented 1 year ago

s * The aforementioned visibility notifier

Remember that the mesh shader should have a fallback for non RDNA 2, i don’t think it should be supported atm as only few will benefit from it, even if not as good, it gives a massive perfomance boost compared to auto lod, because of the cluster and two pass occlusion culling.

mrjustaguy commented 1 year ago

Non RDNA2+, Turing+ and Alchemist+, So AMD RX 6xxx+, Nvidia GTX 16xx+ and Intel Arc

Saul2022 commented 1 year ago

I have some questions: Will the initial approach support foliage that uses alpha like leaves, bush etc? And what will happen if there not enough chunks for the mesh to stream or the higher poly mesh don’t have them , will it dissapear too or will have the extra cost? For example with a 1k triangle sphere, could it use mesh streaming?

Calinou commented 1 year ago

I have some questions: Will the initial approach support foliage that uses alpha like leaves, bush etc? And what will happen if there not enough chunks for the mesh to stream or the higher poly mesh don’t have them , will it dissapear too or will have the extra cost? For example with a 1k triangle sphere, could it use mesh streaming?

We don't know yet, as we haven't even started looking at the implementation of this proposal. It's too early to know yet.

MichaelWengren commented 7 months ago

Over the course of the whole year, were there any attempts to allocate at least some financial resources from donations for this? Or are you waiting for someone to come along and do it all for free? Because this definitely won’t happen and it will hang like this for the another year

AThousandShips commented 7 months ago

A lot of other things have been prioritised for this, it's not been abandoned or neglected just not been a primary priority, there's no reason to expect it to not be looked into or worked on without funding being directed to it

clayjohn commented 7 months ago

Over the course of the whole year, were there any attempts to allocate at least some financial resources from donations for this? Or are you waiting for someone to come along and do it all for free? Because this definitely won’t happen and it will hang like this for the another year

I understand the frustration. But the fact is that an immense amount of work has gone into this proposal already. This isn't a situation where we can just throw money at the problem, we needed to clean up a lot of our RenderingDriver internals to make mesh streaming possible.

Particulalry, this PR is necessary before we can start work on mesh streaming https://github.com/godotengine/godot/pull/87590

But it relies on https://github.com/godotengine/godot/pull/87340

Which relies on https://github.com/godotengine/godot/pull/84976

Which relies on https://github.com/godotengine/godot/pull/83452

Now of course, these PRs weren't made directly to implement mesh streaming. They were all made for different reasons (bug fixes, reduce future maintenance burden, enable other optimizations), but there isn't a point in working on mesh streaming until we have proper thread safe resource loading from multiple threads. Since we will have that soon, we can start working on mesh streaming soon.

HeadClot commented 6 months ago

I have a few general questions about this PR.

In broad strokes how are the tooling side of this proposal going to be tackled? I.E. World Partition, Streaming Volumes, etc.
Will this work for 2D as well as 3D?

and-rad commented 6 months ago

Nobody knows, as work on this hasn't even started yet. But World Partition and streaming volumes are independent from this, as they solve a different problem. They're methods to handle streaming in assets for large worlds, while this is about streaming on the vertex level to handle individual, dense meshes.

clayjohn commented 6 months ago

I have a few general questions about this PR.

In broad strokes how are the tooling side of this proposal going to be tackled? I.E. World Partition, Streaming Volumes, etc.

Will this work for 2D as well as 3D?

To add to and-rad's comment, this proposal doesn't incorporate any tooling or considerations for building large worlds like World Partition. Its a totally different type of system.

This will be 3D only

Broken1334 commented 5 months ago

Interesting presentation about Horizon Zero Dawn's world streaming: https://www.guerrilla-games.com/read/Streaming-the-World-of-Horizon-Zero-Dawn

eMPee584 commented 5 months ago

Interesting presentation about Horizon Zero Dawn's world streaming

Wow, that's beyond interesting.. very detailed implementation recap delving into copious topics related to object formats, streaming and memory management. Gorgeous developer resource, thanks for posting it here @Broken1334 🤓

Qubrick commented 2 months ago

Here are the details of virtual geometry on Bevy 0.14

https://jms55.github.io/posts/2024-06-09-virtual-geometry-bevy-0-14/

and-rad commented 2 months ago

SInce we're still thoroughly in the spaghetti-to-the-wall phase of this, here is an example of how Remedy uses the technique for Alan Wake 2: https://youtu.be/21JIFzG22B0?t=1416

They don't go into any detail, but it's the first time I've seen meshlets used on animated characters. Nanite is still, as far as I know, limited to static geometry.

Saul2022 commented 2 months ago

They don't go into any detail, but it's the first time I've seen meshlets used on animated characters. Nanite is still, as far as I know, limited to static geometry.

Well ue5 .5 has experimental skeleton nanite tho and provides 25% more performance than lods, also there's kind of assasin creed unity where they use gpu driven renderer that while not nanite at all can be inspired for the future gpu renderer.

WrobotGames commented 2 months ago

Nanite in ue5.4 does support world position offset (used for moving leaves), but doesn't support skeletal animations and morph targets. The material has to be opaque or alpha scissor.

Forward rendering, split screen, stereo and raytracing do not work with nanite. A fallback mesh is used for raytracing when enabled.

Source: https://dev.epicgames.com/documentation/en-us/unreal-engine/nanite-virtualized-geometry-in-unreal-engine?application_version=5.4

Zetelias commented 1 month ago

Bevy (another amateur engine) implemented something similar.

They don't go into any detail, but it's the first time I've seen meshlets used on animated characters. Nanite is still, as far as I know, limited to static geometry.

Well ue5 .5 has experimental skeleton nanite tho and provides 25% more performance than lods, also there's kind of assasin creed unity where they use gpu driven renderer that while not nanite at all can be inspired for the future gpu renderer.

It's oftentimes much more in practice in games where you have a bunch of entities doing animations.

Saul2022 commented 1 month ago

It's oftentimes much more in practice in games where you have a bunch of entities doing animations.

True though that procces can be tanked via the cpu ( processing animations while nanite handles the mesh polycount)

godotengine / godot-proposals