godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License

Implement Vulkan Mesh Shading and Meshlets #6822

Open ghost opened 1 year ago

ghost commented 1 year ago

Describe the project you are working on

3D procedurally generated open-world game.

Describe the problem or limitation you are having in your project

Terrain culling, mesh culling, lots of meshes, etc... Not huge problems, but this could make it easier.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

See this: https://www.khronos.org/blog/mesh-shading-for-vulkan

Meshlets are already implemented in meshoptimizer, which Godot uses. Godot is already making use of Vulkan and some extensions (I believe).

Mesh and task shading pipeline + meshlets potentially offers a lot of benefits and future-proofing.

Benefits are described in the article but tl;dr is that it makes more use of the GPU, is more parallel, more culling opportunities, etc...

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

See: https://www.khronos.org/blog/mesh-shading-for-vulkan

I'll probably try to implement it myself, but if anyone with more expertise and/or time does it I won't object 😉. The above article describes it and it's a topic that plenty of others have written about in far better detail than I can.

If this enhancement will not be used often, can it be worked around with a few lines of script?

There's workarounds with the current render pipeline but the new pipeline appears to have the potential to simplify a lot of that work. For example, implementing terrain involves splitting it into chunks for culling, LODing, etc... Splitting everything into meshlets makes terrain just another object and simplifies a lot. The roadmap for 4.1 is also to do a lot of 3D optimizations, this is a potential optimization.

Is there a reason why this should be core and not an add-on in the asset library?

It involves the core renderer.

Edit - Rationale behind this idea

As far as I'm aware, based on statements by some Godot authors, Godot does have the ambition of keeping up with AAA features and visuals, even if it's currently a great indie game engine. Like ray-tracing, this is a feature that makes use of modern hardware and will allow even better visuals. There are reference implementations out there, lots of papers and blogs, and other engines are already implementing it as well. This feature isn't a need, but it seems to be something with lots of benefits, and the direction many vendors and engines are going.

paddy-exe commented 1 year ago

It seems like only a small percentage of devices support this Vulkan extension (data from https://vulkan.gpuinfo.org/listextensions.php): [screenshot: extension support statistics]

ghost commented 1 year ago

Mobile stats don't matter because the Vulkan mobile renderer is different anyway, and there's also the fallback renderer for old devices. Forward+ already uses some Vulkan 1.3 features, and all hardware I'm aware of that supports Vulkan 1.3 supports this with up-to-date drivers (the original vendor implementations are 5 years old; here's a 2018 article by Nvidia: https://developer.nvidia.com/blog/introduction-turing-mesh-shaders/#toc2). It's been in Mesa since 22.3, all the vendors support it, and I'd wager it makes up a much larger percentage of the hardware people actually use to game on desktop.

@reduz also mentioned more rendering modes anyway, including a very high quality "cinematic" renderer with ray-tracing: https://twitter.com/reduzio/status/1568555295370002432?lang=en

I don't think supporting this is a stretch.

Calinou commented 1 year ago

The main question is, how do you expose mesh shaders with a high-level API that is friendly to non-expert users? This is the largest hurdle with exposing rendering features such as tessellation, stencil buffer, custom projection matrices, etc.

Unlike the features I listed, this is made more difficult with mesh shaders because you need to provide a fallback that requires as little manual work as possible (for unsupported hardware). A lot of people are still gaming on Pascal, Polaris or even Maxwell GPUs[^1], none of which support mesh shaders. Not even AAA games have a hard requirement on mesh shader support yet :slightly_smiling_face:

[^1]: Indie games are often played on lower-spec PCs compared to AAA games, so keep this in mind when reading hardware surveys.

badunius commented 1 year ago

It seems like only a small percentage of devices support this Vulkan extension

Note that all desktop GPU vendors (NVIDIA, Intel, Apple, AMD) support mesh shaders in their current generation of hardware, and continued support in future generations is highly likely.

Please define "current generation". According to the Steam stats (https://store.steampowered.com/hwsurvey/videocard/), the 30XX series is not that popular, and within a month the 40XX series is going to become the "current generation".

ghost commented 1 year ago

Current generation means Nvidia Turing and newer, AMD RDNA2 and newer, Intel Xe and Arc, and all ARM Macs (M1 has mesh shader support).

This includes Steam Deck, PS5, Xbox Series, and basically any hardware made within the last 2-3 years, ultrabooks included, up to 4+ year old GPUs (Nvidia Turing).

Anyhow, every desktop hardware vendor supports them, as does every current-generation console, every graphics API (Direct3D had it before Vulkan; I only mentioned Vulkan because that's what Godot uses) and every desktop OS... It's been likened to the move from OpenGL 1 -> 2, and then 2 -> 3.3: a literally new graphics pipeline that, like raytracing, everyone is implementing. O3DE has already started implementing it, it's basically what Unreal Nanite does, and games that use it have already started coming out.

I don't think it should be the only rendering pipeline; we still have the OpenGL fallback, Mobile/deferred and Forward+, and Juan mentioned future support for raytracing (raytracing hardware is the same generation as this).

Also for @Calinou, here's an Nvidia sample of Vulkan mesh shaders that includes being able to switch between the two renderers. It wouldn't expose anything to the user per se, any more than switching from the Forward+ to the Mobile renderer in the Godot editor does. https://github.com/nvpro-samples/gl_vk_meshlet_cadscene

mrjustaguy commented 1 year ago

There will likely be a massive upgrade cycle in the next few years as more next-gen titles (and more unoptimized AAA games) come out. With https://github.com/godotengine/godot/issues/68959 still unresolved, causing extra pain in triangle-heavy scenes, and possibly taking a while to fully diagnose and fix (I doubt it'll be fixed by 4.1 as planned, sadly), it makes sense to start thinking about mesh shaders more. Once you account for the time it also takes to ship a game, by that point good support for older hardware is really a nice-to-have, not a must-have...

Not to mention, for indie devs mesh shaders are actually a gigantic deal: much less time is spent actually optimizing meshes to run well, and you more or less stop worrying about poly budgets. An indie won't be using cinematic-quality graphics with millions or billions of triangles per mesh, but more like 10k-100k per mesh with a bunch of meshes all around. That doesn't mean I'm saying screw optimization, I'm just being realistic: optimization takes time that some devs simply lack, and the more the engine can handle itself, the better.

TechnoPorg commented 1 year ago

Regarding compatibility with lower-end devices, would it be possible, or even desirable, to implement mesh shading / meshlets for all devices by using a software rasterizer with those features programmed in? It's been shown by Nanite that software rasterizers can match or even outperform hardware rasterizers when working with small triangles. I thought I'd throw that idea out there, but I'll be the first one to say that I'm far from a rendering expert, so I realize it's most likely not a good one.

Saul2022 commented 1 year ago

Current generation means Nvidia Turing and newer, AMD RDNA2 and newer, Intel Xe and Arc, and all ARM Macs (M1 has mesh shader support).

This includes Steam Deck, PS5, Xbox Series, and basically any hardware made within the last 2-3 years, ultrabooks included, up to 4+ year old GPUs (Nvidia Turing).

Anyhow, every desktop hardware vendor supports them, every current generation console, every graphics API (Direct3D had it before Vulkan, I only mentioned Vulkan because that's what Godot uses) and every desktop OS... It's been likened to the move from OpenGL 1 -> 2, and then 2 -> 3.3. Literally new graphics pipeline and, like raytracing, is something everyone is implementing. O3DE already has started implementing it, it's basically what Unreal Nanite does, and games that use it have already started coming out.

The Switch does not support it 😕, and it's the console with the most indies, as I remember. And just like SDFGI is there for hardware without RTX (even if it's not remotely as good, it's there), the solution for mesh shaders would be a fallback like the Unreal Engine 5 Nanite one.

Saul2022 commented 1 year ago

The following is a rough sketch (some specifics will probably be wrong or need tweaking) for porting Godot's rendering pipeline to become meshlet-based, with minimal refactoring (relatively speaking).

This should enable it to handle object counts and triangle counts on the same order of magnitude as Unreal Engine's Nanite ( resolving #2793 and #6109 ) while retaining full support for programmable vertex shaders, which Nanite still isn't capable of.

It is also much more transparent to users than Nanite, only requiring them to move uses of Mesh objects over to the aforementioned VirtualMesh. Users who don't procedurally generate geometry at runtime likely wouldn't need to do anything other than run a one-time migration script to reimport their meshes as VirtualMeshes.

"Soft-culling"

Reducing the polycount of objects instead of skipping rendering. This results in less noticeable culling artifacts, allowing for more aggressive culling settings and vastly simplified culling logic.

See the below demonstration video from godotengine/godot#76297

[Video: 233419333-795560f0-39c7-465b-9a49-86ccf17b152f.mp4]

Left: traditional "hard-culling". Right: "soft-culling".

Occlusion culling

Traditional occlusion queries allow occluders to have a vertex() shader, but introduce a delay during which objects that should be visible can end up being culled. Godot's CPU-based occlusion culling makes the opposite tradeoff, with the additional downside that users also need to manually set up occluder meshes.

With the new mesh shading pipeline, objects can now be culled on the GPU via task shaders, so we can move the occlusion pass onto the GPU as well and have the best of both worlds, resolving godotengine/godot#70373 .

To do this, simply render all objects to a low resolution depth buffer at their lowest LODs, something like a "depth pre-pre-pass". This can then be used as an occlusion buffer texture for the main rendering pass.
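A CPU-side sketch of the occlusion test a task shader could then perform against that low-resolution buffer (all names and conventions here are illustrative, not Godot renderer code; the assumed depth convention is smaller = closer):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative only: a low-resolution depth ("occlusion") buffer produced by
// the depth pre-pre-pass described above. Depth convention: smaller = closer.
struct DepthBuffer {
    int width, height;
    std::vector<float> depth; // row-major, width * height texels

    float sample(int x, int y) const { return depth[y * width + x]; }
};

// Screen-space bounding rect of an object/meshlet in occlusion-buffer space,
// plus the depth of its nearest point.
struct ScreenBounds {
    int x0, y0, x1, y1; // inclusive pixel rect
    float min_depth;
};

// The bounds are occluded only if every covered texel already holds something
// strictly closer than the bounds' nearest point.
bool is_occluded(const DepthBuffer &buf, const ScreenBounds &b) {
    for (int y = std::max(b.y0, 0); y <= std::min(b.y1, buf.height - 1); y++)
        for (int x = std::max(b.x0, 0); x <= std::min(b.x1, buf.width - 1); x++)
            if (buf.sample(x, y) >= b.min_depth)
                return false; // some texel is farther: potentially visible
    return true;
}
```

In a real implementation this would sample a mip of the occlusion texture rather than iterating texels, but the comparison logic is the same.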

Soft-culling using a lod() shader

A new shader stage that allows users to dictate fine-grained soft-culling behavior for objects and meshlets.

  • in bool IS_OCCLUDED: Whether the bounding box of the object/meshlet is occluded by the occlusion buffer texture.
  • in float FRUSTUM_DISTANCE: How far the object/meshlet is outside of camera bounds.
  • in float VELOCITY: Screen-space velocity, allows reducing the polycount of fast objects/meshlets.
  • in float BACKFACE_CONE_ANGLE: The meshlet version of backface culling. Not applicable for objects.
  • out float LOD_BIAS: How much the polycount of the object should be reduced.

This allows complex soft-culling configurations using simple code:

void lod() {
    if (IS_OCCLUDED)
        LOD_BIAS = 0.2;
    if (FRUSTUM_DISTANCE > 0.2)
        LOD_BIAS = 0.01;
}

The artifacts and performance of soft-culling are highly dependent on the rendering scenario and the types of models used. lod() shaders allow fine-tuning to whatever degree of conservativeness/aggressiveness the user desires.

Per-object logic (task shader)

Task shaders allow drawing decisions (culling, LOD selection, etc) to be performed in a compute shader, meaning all objects in the scene can be rendered in a single draw call rather than needing one draw call per object.

Launch one task per object. Call the lod() shader specified above. If LOD_BIAS is set to 0, skip this object (traditional "hard-culling"). Otherwise, launch the appropriate number of mesh shaders for the selected LOD.
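The per-object decision above, written as plain C++ rather than an actual task shader (the workgroup size and the use of LOD_BIAS as a plain float parameter are illustrative assumptions):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative assumption: how many meshlets one mesh-shader workgroup handles.
constexpr uint32_t MESHLETS_PER_WORKGROUP = 32;

// Returns how many mesh-shader workgroups to launch for an object, or 0 to
// skip it entirely (traditional "hard-culling"). `lod_bias` plays the role of
// the LOD_BIAS output of the proposed lod() shader; `meshlet_count` is the
// meshlet count of the LOD selected from that bias.
uint32_t mesh_workgroups_for_object(float lod_bias, uint32_t meshlet_count) {
    if (lod_bias <= 0.0f)
        return 0; // hard-cull: launch no mesh shaders for this object
    // Round up so every meshlet of the selected LOD gets processed.
    return (meshlet_count + MESHLETS_PER_WORKGROUP - 1) / MESHLETS_PER_WORKGROUP;
}
```

For instance, an object whose selected LOD has 100 meshlets would get 4 workgroups, while a LOD_BIAS of 0 launches none.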

Per-meshlet logic (mesh shader)

Meshlets can be soft-culled as well, using a simple method that guarantees no gaps between meshlets:

[Image: meshlet soft-culling diagram]

Assuming a perfectly circular meshlet and some hand-wavy math, a meshlet with an area of 100 triangles will be reduced to 35 triangles around the circumference using this method, an approximate 3x reduction. Denser meshlets will have a greater reduction, but note that mesh shading hardware is currently limited to 256 triangles per meshlet.
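The hand-wavy math can be reproduced: approximating the meshlet as a disc made of A unit-area triangles, the radius is sqrt(A / π), so the circumference 2πr = 2·sqrt(π·A) holds roughly that many unit-sized triangles, which for A = 100 gives about 35. A minimal check:

```cpp
#include <cassert>
#include <cmath>

// Approximate a meshlet as a disc of `area_triangles` unit-area triangles.
// Radius r = sqrt(A / pi); triangles along the circumference ~ 2 * pi * r
// = 2 * sqrt(pi * A).
double circumference_triangles(double area_triangles) {
    const double pi = 3.14159265358979323846;
    return 2.0 * std::sqrt(pi * area_triangles);
}
// circumference_triangles(100.0) is roughly 35.4, matching the ~3x reduction
// quoted above; denser (larger-area) meshlets reduce by an even bigger factor.
```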

Note that meshlet soft-culling stacks on top of the object soft-culling done in the task shader for even more savings.

We would like to transition all models to meshlets. This requires support for programmable vertex shaders (animation, displacement, etc).

To accomplish this, first select 4 points on the border and one center point. These five points will act as a "proxy meshlet".

[Image: proxy meshlet diagram]

Run the 5 vertices of this "proxy meshlet" through the vertex() shader. This is more efficient than processing all 200+ vertices of the full meshlet.

Use the transformed "proxy meshlet" to calculate the bounding box to be used for frustum/occlusion culling and the "normal cone" for backface culling. Pass these results into the lod() shader to determine whether the full meshlet should be soft-culled.
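A sketch of the bounding-box half of that step, assuming the five proxy vertices have already been run through vertex() (the padding parameter is an illustrative assumption to compensate for the proxy underestimating the true extent; the normal-cone computation is omitted):

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Illustrative types; not Godot's math classes.
struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Compute a padded axis-aligned bounding box from the 5 transformed vertices
// of the "proxy meshlet". `pad` inflates the box because the proxy is only an
// approximation of the full 200+ vertex meshlet.
AABB proxy_bounds(const std::array<Vec3, 5> &proxy, float pad) {
    AABB b{proxy[0], proxy[0]};
    for (const Vec3 &v : proxy) {
        b.min = {std::min(b.min.x, v.x), std::min(b.min.y, v.y), std::min(b.min.z, v.z)};
        b.max = {std::max(b.max.x, v.x), std::max(b.max.y, v.y), std::max(b.max.z, v.z)};
    }
    b.min = {b.min.x - pad, b.min.y - pad, b.min.z - pad};
    b.max = {b.max.x + pad, b.max.y + pad, b.max.z + pad};
    return b; // feed into frustum/occlusion tests and then the lod() shader
}
```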

This is much easier than trying to hard-cull animated meshlets, which requires headache-inducing calculations to avoid culling artifacts: https://www.youtube.com/watch?v=auE3AF7B06A

Pretty cool. Also, about culling: could it be combined with two-pass occlusion culling?

Saul2022 commented 1 year ago

Not to mention, for indie devs mesh shaders are actually a gigantic deal: much less time is spent actually optimizing meshes to run well, and you more or less stop worrying about poly budgets. An indie won't be using cinematic-quality graphics with millions or billions of triangles per mesh, but more like 10k-100k per mesh with a bunch of meshes all around. That doesn't mean I'm saying screw optimization, I'm just being realistic: optimization takes time that some devs simply lack, and the more the engine can handle itself, the better.

Most of the Godot userbase is on old PCs, and the main indie market on consoles is the Switch, which is about as powerful as a modern mobile phone. Even if mesh shaders are ever supported, there should 100% be a fallback or alternative; it's not good to abandon low-end GPUs in favor of recent, expensive GPUs. Only a fraction of devs would benefit, and if people play your game on a GPU that doesn't support mesh shaders and there's no fallback, they will complain and rate the game as unplayable.

Calinou commented 1 year ago

@Saul2022 Please use the Edit button (located behind the icon in the top-right corner of your comments) instead of multi-posting. Also, please don't quote large blocks of text as it makes the thread difficult to read. Instead, only quote the relevant paragraphs (or don't quote anything and use @ mentions instead).

Either way, I don't think we should argue about having to implement mesh shaders without a fallback – it's clear that it's not a viable option for the next couple of years. The hardware coverage is just not good enough to do that yet, and macOS/mobile is still lacking support for mesh shaders. @myaaaaaaaaa has mentioned a way to introduce mesh shaders in a way where a fallback can be provided (performance will be worse, but visuals should be identical).

Saul2022 commented 1 year ago

@Saul2022 Nanite performs meshlet-level occlusion culling even while drawing the occlusion buffer (hence, "two-pass occlusion culling") to avoid processing every triangle in the scene, as otherwise it may end up having to rasterize billions of triangles, which would be unmanageable even in depth-only mode.

Alright, seems good to me then, but I'm still a bit confused about how the Nanite fallback and this mesh shader fallback would differ. In my experience with UE5, I was able to render a billion triangles with the Nanite fallback on my iGPU at 20 fps on the lowest settings, so it would be great to know whether this would differ much. And thanks for the awesome work you're doing, it's pretty good.

Calinou commented 1 year ago

@Calinou Are there any plans to provide a fallback path to non-raytracing hardware for RendererSceneCinematic or adding raytracing extensions to RendererSceneCull?

I assume this cinematic renderer will have a hard requirement on hardware-accelerated raytracing, although a driver-level fallback might be usable (but very slow and only suited for Movie Maker mode). It's still a while away, as the focus is currently on improving the existing rendering methods. By the time a cinematic rendering method is finished, the percentage of hardware in use supporting raytracing should be greater, although macOS will still be an issue.

That said, all this is still in flux, so we don't know for sure whether built-in RT features will be restricted to a dedicated rendering method or not. There will probably be a way to use RT extensions in RenderingDevice in any modern rendering method if you wish, but this doesn't automatically translate to high-level RT features being available out of the box.

See also this discussion on raytracing support for additional context.

Stimes59 commented 1 year ago

Any update?

Calinou commented 1 year ago

Any update?

As far as I know, no work on mesh shaders has started yet, but it's still a bit early for the reasons mentioned above. There are also more essential rendering features we're currently missing to make mesh streaming truly relevant in Godot, such as texture streaming.