Mesh Instancing + LOD + Portals extension proposal

This is an extension proposal I've been pondering since my work on SharpGLTF. It aims to solve a number of issues in one shot:

Simpler scene rendering evaluation: makes glTF more of a "JPEG of 3D".
Clean LOD support: no weird children of children, and much more.
Multiple meshes per node.

The extension would rely primarily on two objects:

MeshInstance objects would be stored as a collection in the Scene extensions.

It essentially defines a Mesh Draw Call:

MeshInstance
{
    int? VisibilityRule; // should we render it?
    int Mesh;           // what to render.
    int? Node;         // where to render it (If no Skin).
    int? Skin;        // how to deform it. (If Skin)
}

So a client that wants to render a scene, instead of traversing the node tree graph, would simply loop over the scene's MeshInstances.
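To make the data layout concrete, here is a minimal Python sketch of how a client might read the MeshInstance collection from a scene's extensions and then loop over it. The extension name `VENDOR_mesh_instances` and the JSON property names (`meshInstances`, `visibilityRule`, `node`, `skin`) are hypothetical placeholders, not part of any published spec.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical extension name, used only for illustration.
EXT = "VENDOR_mesh_instances"

@dataclass
class MeshInstance:
    mesh: int                              # what to render
    visibility_rule: Optional[int] = None  # should we render it?
    node: Optional[int] = None             # where to render it (if no skin)
    skin: Optional[int] = None             # how to deform it (if skin)

def load_mesh_instances(scene: dict) -> List[MeshInstance]:
    """Read the MeshInstance collection stored in a scene's extensions."""
    items = scene.get("extensions", {}).get(EXT, {}).get("meshInstances", [])
    return [
        MeshInstance(
            mesh=item["mesh"],
            visibility_rule=item.get("visibilityRule"),
            node=item.get("node"),
            skin=item.get("skin"),
        )
        for item in items
    ]

scene_json = """
{
  "nodes": [0, 1],
  "extensions": {
    "VENDOR_mesh_instances": {
      "meshInstances": [
        {"mesh": 0, "node": 0},
        {"mesh": 1, "node": 1, "visibilityRule": 0},
        {"mesh": 2, "skin": 0}
      ]
    }
  }
}
"""

instances = load_mesh_instances(json.loads(scene_json))
# The client loops over a flat list instead of walking the node graph.
for inst in instances:
    print(inst)
```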

Now the interesting part is that a MeshInstance references a VisibilityRule object; VisibilityRules are stored in the glTF model's extensions.

VisibilityRule must be evaluated at runtime to a Visible | Invisible state. If a MeshInstance references a VisibilityRule that evaluates to Invisible, then that mesh should not be rendered.

Initially I thought that the rule could simply be the LOD distance or something like that, but then I noticed that it could be extended to incorporate many other visibility and occlusion mechanisms besides LOD, so I came up with something like this:

VisibilityRule
{
    LODRule    LOD;
    PortalRule Portal;
    // ... any other visibility rule we can imagine: planes? BSP trees?
}

Now, I am not sure if this is the right way to structure a VisibilityRule. My guess is that it is certainly possible to have multiple concurrent rules; for example, a given rule would only evaluate to visible if the object is close enough for its LOD level and its portal is visible.
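One way to model concurrent rules is to treat each sub-rule as optional and combine the ones that are present with a logical AND. The sketch below assumes that `LODRule` is a camera-distance range and that `PortalRule` checks a set of portal indices found visible this frame; both representations are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class LODRule:
    # Hypothetical: visible while the camera distance is in [min_distance, max_distance).
    min_distance: float
    max_distance: float

    def evaluate(self, camera: Vec3, center: Vec3, visible_portals: Set[int]) -> bool:
        d = math.dist(camera, center)
        return self.min_distance <= d < self.max_distance

@dataclass
class PortalRule:
    # Hypothetical: visible only if the referenced portal was found visible this frame.
    portal: int

    def evaluate(self, camera: Vec3, center: Vec3, visible_portals: Set[int]) -> bool:
        return self.portal in visible_portals

@dataclass
class VisibilityRule:
    lod: Optional[LODRule] = None
    portal: Optional[PortalRule] = None

    def evaluate(self, camera: Vec3, center: Vec3, visible_portals: Set[int]) -> bool:
        """Concurrent sub-rules are ANDed; a missing sub-rule imposes no condition."""
        rules = [r for r in (self.lod, self.portal) if r is not None]
        return all(r.evaluate(camera, center, visible_portals) for r in rules)

rule = VisibilityRule(lod=LODRule(0.0, 10.0), portal=PortalRule(3))
print(rule.evaluate((0.0, 0.0, 0.0), (0.0, 0.0, 5.0), {3}))  # True
```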

But I believe this mechanism would help bring the rendering of large scenes into glTF in a way that is simple to understand and implement.

How would a client render the scene?

for each Node in SCENE.Nodes (including all descendants)
{
    Node.EvaluateWorldTransform();
}

for each VRule in SCENE.MeshInstances.VisibilityRules
{
    VRule.EvaluateVisibility(Projection, View, World);
}

for each MInstance in SCENE.MeshInstances
{
    if MInstance has no VisibilityRule, or MInstance.VisibilityRule is Visible
    {
        if MInstance has Skin RenderSkinnedMesh(MInstance, MInstance.Skin);
        else                  RenderStaticMesh(MInstance, MInstance.Node);
    }
}
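A runnable Python version of those three passes, under simplifying assumptions: node transforms are translations only, each VisibilityRule is a hypothetical callable over world positions, and "rendering" just records draw commands instead of issuing GPU calls.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]

def evaluate_world_positions(roots: List[int], children: Dict[int, List[int]],
                             local: Dict[int, Vec3]) -> Dict[int, Vec3]:
    """Pass 1: world transforms for the scene roots and all their descendants.

    Simplified to translations only (an assumption; real glTF nodes can also
    carry rotation, scale, or a full matrix).
    """
    world: Dict[int, Vec3] = {}
    stack = [(n, (0.0, 0.0, 0.0)) for n in roots]
    while stack:
        node, parent = stack.pop()
        t = local.get(node, (0.0, 0.0, 0.0))
        world[node] = (parent[0] + t[0], parent[1] + t[1], parent[2] + t[2])
        stack.extend((child, world[node]) for child in children.get(node, []))
    return world

Rule = Callable[[Dict[int, Vec3]], bool]  # hypothetical: world positions -> visible?

def render_scene(instances: List[dict], rules: Dict[int, Rule],
                 world: Dict[int, Vec3]) -> List[Tuple[str, int, int]]:
    # Pass 2: evaluate every referenced VisibilityRule once per frame.
    used = {inst["rule"] for inst in instances if "rule" in inst}
    state = {r: rules[r](world) for r in used}
    # Pass 3: flat loop over the MeshInstances; no rule means always visible.
    draws = []
    for inst in instances:
        if "rule" in inst and not state[inst["rule"]]:
            continue
        if "skin" in inst:
            draws.append(("skinned", inst["mesh"], inst["skin"]))
        else:
            draws.append(("static", inst["mesh"], inst["node"]))
    return draws

world = evaluate_world_positions([0], {0: [1]}, {0: (0.0, 0.0, 5.0), 1: (0.0, 0.0, 10.0)})
camera = (0.0, 0.0, 0.0)
rules = {
    0: lambda w: math.dist(camera, w[1]) < 12.0,   # near LOD for node 1
    1: lambda w: math.dist(camera, w[1]) >= 12.0,  # far LOD for node 1
}
instances = [
    {"mesh": 0, "node": 0},
    {"mesh": 1, "node": 1, "rule": 0},
    {"mesh": 2, "node": 1, "rule": 1},
    {"mesh": 3, "skin": 0},
]
print(render_scene(instances, rules, world))
# [('static', 0, 0), ('static', 2, 1), ('skinned', 3, 0)]
```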

The cool thing is that this common implementation would work for normal scenes, scenes with LODs, scenes with Portals, etc.

Bonus tracks

Since a scene's MeshInstance list is basically a collection of draw calls, it is easier to loop over the list than to traverse a node tree graph.
It is also easier to identify which meshes are rendered multiple times, making them good candidates for GPU instancing.
Multiple MeshInstances can point to the same Node, allowing multiple meshes per node. Currently glTF is limited to one mesh per node.
It gives more control over how the scene is rendered: simply by reordering the items in the MeshInstance collection, we can declare the rendering order, irrespective of how meshes are instantiated in the node tree graph.
This extension can be backwards compatible as long as the original node tree graph also references the appropriate meshes and skins. Alternatively, a node tree graph could be defined without mesh references, in which case it would only serve as a mechanism to provide world transforms to the MeshInstances.
Materials could also take advantage of this by having an extension that points to a VisibilityRule and a fallback Material: when rendering a mesh, if the visibility rule of a given material evaluates to invisible, the fallback material is used instead. This would bring LODs to materials too.
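The material fallback idea could be resolved by following the fallback chain until reaching a material whose rule is visible (or which has no rule at all). This is a sketch with a hypothetical `MaterialLOD` record; the cycle check is an added safeguard, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class MaterialLOD:
    # Hypothetical material extension: a VisibilityRule index plus a fallback material index.
    visibility_rule: int
    fallback: int

def resolve_material(material: int, lods: Dict[int, MaterialLOD],
                     rule_states: Dict[int, bool]) -> int:
    """Follow fallbacks while the current material's rule evaluates to Invisible."""
    seen = set()
    while material in lods and not rule_states[lods[material].visibility_rule]:
        if material in seen:
            raise ValueError("cyclic material fallback chain")
        seen.add(material)
        material = lods[material].fallback
    return material

lods = {0: MaterialLOD(visibility_rule=0, fallback=1),
        1: MaterialLOD(visibility_rule=1, fallback=2)}
print(resolve_material(0, lods, {0: False, 1: True}))  # 1
```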

Regarding your comment at https://github.com/KhronosGroup/glTF/pull/1691#issuecomment-550381280: You linked to this issue in various places, and there seems to have been some discussion in the related issues (although I'm not so deeply involved and up to date here, sorry). I think one of the reasons why nobody commented here might be that the description appears to focus on a specific implementation (SharpGLTF, specifically). Aren't the ideas described here (on a more conceptual level) also relevant to, and being discussed in, the related issues? In any case: fleshing out a solution that everybody can agree on and that can be forged into a proper (extension) specification is difficult...

@javagl although it might seem that way, the implementation is not specific to SharpGLTF.

Typically, to render a glTF scene you traverse the node graph and you render meshes as you find them in the nodes.

What my proposal does is, instead of traversing the node graph, you iterate over a render command list, where every item is a "renderable instance".

In this way, it is easier to know what needs to be rendered for a given scene, and it makes it much easier to implement LODs and mesh instancing.


Personally, I don't think this is a direction that glTF should be taking. Most engines and tools have some internal abstraction of a node graph, and need to load glTF files efficiently into that abstraction. While I can understand that a render command list might make sense for SharpGLTF and probably some other implementations too, I don't believe render commands are a broadly portable enough concept to bring into core glTF or to Khronos extensions.

The expectation of a fixed render order, based on the order of the commands, seems problematic in particular.

In regard to instancing, since this was referenced from https://github.com/KhronosGroup/glTF/pull/1691, I have some other concerns — one of the goals for that extension was to support arbitrarily large (1000+) batches of GPU instanced meshes. In this extension you've defined a MeshInstance as a single draw call, with its own visibility rule, which almost by definition suggests that 1000 instances should be 1000 draw calls. An engine could optimize this, by observing that sequential draws for the same mesh with the same visibility state can be batched, but now we're asking the engine to reinterpret render commands as something more abstract. For mesh instancing, then, I prefer an approach like #1691 that provides an indication of where batching is possible without explicitly specifying draw calls.
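To illustrate the engine-side optimization described above, this Python sketch merges runs of consecutive draws that share a mesh and a visibility state. The `(mesh, node, visible)` tuple layout is an assumption made for the example.

```python
from itertools import groupby
from typing import List, Tuple

def batch_sequential_draws(draws: List[Tuple[int, int, bool]]) -> List[Tuple[int, List[int]]]:
    """Merge runs of consecutive draws that share a mesh and a visibility state.

    Each draw is (mesh, node, visible). Returns (mesh, [nodes]) batches for the
    visible runs only; invisible runs are dropped. Non-adjacent draws of the
    same mesh are NOT merged.
    """
    batches = []
    for (mesh, visible), run in groupby(draws, key=lambda d: (d[0], d[2])):
        if visible:
            batches.append((mesh, [node for _, node, _ in run]))
    return batches

draws = [(0, 10, True), (0, 11, True), (0, 12, False), (1, 20, True), (0, 13, True)]
print(batch_sequential_draws(draws))  # [(0, [10, 11]), (1, [20]), (0, [13])]
```

Mesh 0 ends up in two separate batches because its draws are not adjacent in the list; recovering a single batch would require the engine to reinterpret the command order on its own.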

All that said, I'd encourage you to use a vendor extension for this purpose and to see what traction this gets with other tools. But for the glTF conversations that are being linked into this thread, like LODs, instancing, and stricter skinning, I would prefer that we continue with extensions following a more traditional node hierarchy-based approach.

@donmccurdy I can understand that moving from a node graph based rendering to a render command list based approach is probably too much and too late, so I will probably consider this proposal as rejected.

SharpGLTF does not need any extension: it builds the render command list from the node graph at load time and uses it to speed up rendering. It's that design that made me consider proposing it as a glTF extension, not the other way around.
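As a rough illustration of building such a list at load time (not SharpGLTF's actual code), this Python sketch walks a scene's node graph depth-first and emits one mesh instance per node that references a mesh. It relies only on core glTF JSON properties: `scenes[].nodes`, `nodes[].children`, `nodes[].mesh` and `nodes[].skin`.

```python
from typing import List

def build_command_list(gltf: dict, scene_index: int = 0) -> List[dict]:
    """Flatten a glTF scene's node graph into a list of mesh instances, once, at load time."""
    nodes = gltf.get("nodes", [])
    commands = []
    # Reverse the roots so popping from the stack visits them in document order.
    stack = list(reversed(gltf["scenes"][scene_index].get("nodes", [])))
    while stack:
        index = stack.pop()
        node = nodes[index]
        if "mesh" in node:
            cmd = {"mesh": node["mesh"], "node": index}
            if "skin" in node:
                cmd["skin"] = node["skin"]
            commands.append(cmd)
        # Push children reversed so they are also visited in document order (depth-first).
        stack.extend(reversed(node.get("children", [])))
    return commands

# A small five-node scene: node 1 has no mesh and two children.
gltf = {
    "scenes": [{"nodes": [0, 1, 4]}],
    "nodes": [{"mesh": 0}, {"children": [2, 3]}, {"mesh": 1}, {"mesh": 2}, {"mesh": 3}],
}
for cmd in build_command_list(gltf):
    print(cmd)
```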

To some degree, what I am doing is sandboxing every glTF scene on load, which makes it much easier to handle present and future changes. To me, sandboxing glTF scenes is a better approach than trying to import every glTF concept into an existing engine, because it's impossible to please everybody, and importing makes it nearly impossible to achieve 100% accurate playback on every engine, which defeats the original purpose of "the JPEG of 3D"...

But hey, we live in an imperfect world, so if I have to choose between FBX and glTF, I choose glTF hands down 😸

Again, I'm not sooo deeply involved here and in the related discussion. But after reading the last few comments, it might be worth pointing out that glTF is primarily intended as a transmission/delivery format. It does not have to (and does not even aim to) reflect the actual process of rendering.

Of course, it is designed in a way that can easily be parsed and processed by a renderer. As such, I think that many engines will actually do something that resembles the classical recursion of rendering a scene graph: They'll walk through the scene graph and generate rendering commands. But instead of doing some glPushMatrix/glVertex3f, the rendering commands are collected into a list and then executed sequentially (including the sophisticated things like batching and minimizing state changes).

Maybe I lack some context of the related discussion, or missed the main point of your issue here. But can you summarize how e.g. the concept of MeshInstance and VisibilityRule would affect (or be represented in) the actual glTF format or one of its extensions? (I wonder whether this aims at completely replacing the scene/node/mesh hierarchy with a plain list of meshInstance objects...)

@javagl yes, I wrote this layout a while ago... I probably should have attached it to the initial post... but anyway, here it is:

The current glTF Scene layout looks like this:

Scene
  *
  ├─ Node[0]
  │   └─ Mesh[0]
  ├─ Node[1]
  │   ├─ Node[2]
  │   │   └─ Mesh[1]
  │   └─ Node[3]
  │       └─ Mesh[2]
  └─ Node[4]
      └─ Mesh[3]

Adding the extension to Scene would make it look like this:

Scene

  * (original graph, used only if extension is not supported)
  ├─ Node[0]
  │   └─ Mesh[0]
  ├─ Node[1]
  │   ├─ Node[2]
  │   │   └─ Mesh[1]
  │   └─ Node[3]
  │       └─ Mesh[2]
  └─ Node[4]
      └─ Mesh[3]

  * (mesh instance extension)
  ├─ MeshInstance
  │   ├─ Node[0]
  │   └─ Mesh[0]
  ├─ MeshInstance
  │   ├─ Node[2]
  │   └─ Mesh[1]
  ├─ MeshInstance
  │   ├─ Node[3]
  │   └─ Mesh[2]
  └─ MeshInstance
      ├─ Node[4]
      └─ Mesh[3]

Now, in order to take advantage of the extension to do stuff like instancing and LOD, it would look like this:

Scene

  * 
  ├─ MeshInstance                - Instancing Example
  │   ├─ Node[0]
  │   ├─ Node[1]
  │   ├─ Node[2]
  │   ├─ Node[3]
  │   ├─ Node[4]
  │   ├─ Node[5]
  │   ├─ Node[6]
  │   ├─ Node[7]
  │   ├─ Node[8]
  │   └─ Mesh[0]                  - this mesh will be instanced 9 times.
  ├─ MeshInstance
  │   ├─ Node[77]
  │   ├─ Mesh[1]                 - LOD 0 mesh for node 77
  │   └─ VisibilityRule[0]       - LOD 0 visibility rule
  ├─ MeshInstance
  │   ├─ Node[77]
  │   ├─ Mesh[2]                 - LOD 1 mesh for node 77
  │   └─ VisibilityRule[1]       - LOD 1 visibility rule
  └─ MeshInstance
      ├─ Node[78]
      ├─ Mesh[5]                 - LOD 0 mesh for node 78
      └─ VisibilityRule[0]       - LOD 0 visibility rule
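Encoding this layout as plain Python data shows how LOD switching falls out of the flat list. The distance ranges for VisibilityRule[0] and VisibilityRule[1] are made-up values, and evaluating the LOD rule per referenced node is an assumption, since the layout does not say how a rule applies to a MeshInstance with several nodes.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical LOD ranges for VisibilityRule[0] (LOD 0) and VisibilityRule[1] (LOD 1).
LOD_RANGES = {0: (0.0, 50.0), 1: (50.0, math.inf)}

# The layout above as plain dicts ("rule" is absent when always visible).
mesh_instances = [
    {"mesh": 0, "nodes": list(range(9))},   # instancing example: 9 nodes
    {"mesh": 1, "nodes": [77], "rule": 0},  # LOD 0 mesh for node 77
    {"mesh": 2, "nodes": [77], "rule": 1},  # LOD 1 mesh for node 77
    {"mesh": 5, "nodes": [78], "rule": 0},  # LOD 0 only, for node 78
]

def visible_draws(camera: Vec3, positions: Dict[int, Vec3]) -> List[Tuple[int, List[int]]]:
    """Return (mesh, nodes) pairs to draw, filtering each node by its LOD range."""
    draws = []
    for inst in mesh_instances:
        nodes = inst["nodes"]
        if "rule" in inst:
            lo, hi = LOD_RANGES[inst["rule"]]
            nodes = [n for n in nodes if lo <= math.dist(camera, positions[n]) < hi]
        if nodes:
            draws.append((inst["mesh"], nodes))
    return draws

positions = {n: (float(n), 0.0, 0.0) for n in range(9)}
positions[77] = (0.0, 0.0, 10.0)
positions[78] = (0.0, 0.0, 20.0)
print(visible_draws((0.0, 0.0, 0.0), positions))    # near: Mesh[1] and Mesh[5] drawn
print(visible_draws((0.0, 0.0, 200.0), positions))  # far: Mesh[2] drawn, Mesh[5] skipped
```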

Additional notes:

A MeshInstance could reference multiple nodes, which would allow drawing the mesh instances in a single draw call.
Rendering order is not imposed in any way, I am using a list because it's the obvious way of storing the data, but it can be reordered, rendered in reverse order, etc.
Notice that the last mesh instance does not have a LOD 1 variant: Mesh[5] could be a close-up detail that should only be rendered at LOD 0 and has no LOD 1 mesh.
VisibilityRules need not be limited to LODs; they could also define Portals or BSP nodes, but in the end, any VisibilityRule will be evaluated as true/false.
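For the first note, a MeshInstance that references several nodes could become a single instanced draw by packing one world matrix per node into one buffer. This Python sketch assumes translation-only world transforms and little-endian float32 column-major 4x4 matrices (the column-major order matches glTF's matrix convention); the actual buffer upload and draw call are left to the engine.

```python
import struct
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

def translation_matrix(t: Vec3) -> List[float]:
    """4x4 column-major matrix holding only a translation (in elements 12..14)."""
    x, y, z = t
    return [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            x,   y,   z,   1.0]

def pack_instance_buffer(nodes: List[int], world: Dict[int, Vec3]) -> Tuple[bytes, int]:
    """Pack one float32 matrix per referenced node into a single buffer.

    An engine could upload this once and issue one instanced draw call with
    instance_count = len(nodes).
    """
    floats: List[float] = []
    for n in nodes:
        floats.extend(translation_matrix(world[n]))
    return struct.pack(f"<{len(floats)}f", *floats), len(nodes)

world = {0: (0.0, 0.0, 0.0), 1: (2.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0)}
buffer, instance_count = pack_instance_buffer([0, 1, 2], world)
print(instance_count, len(buffer))  # 3 192
```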

(Repeating the disclaimer annoyingly often: I'm not an expert regarding the implementation options for this, but...)

It looks like (VERY roughly speaking) you're flattening the scene graph and handle some issues related to LOD support on the fly. Implicitly reversing the relationship between mesh and node: Right now it is node->mesh, and you're suggesting meshInstance->nodes[]. (This resembles an issue that was opened quite a while ago: https://github.com/KhronosGroup/glTF/issues/889 - I'd have to re-read it to see whether there's really an overlap on a conceptual level).
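The reversal described here is straightforward to compute from core glTF data. This Python sketch builds the mesh -> nodes[] mapping from the `nodes[].mesh` property; it scans every node in the file and ignores scene membership, which a real implementation would probably want to respect.

```python
from collections import defaultdict
from typing import Dict, List

def invert_node_mesh(gltf: dict) -> Dict[int, List[int]]:
    """Turn core glTF's node->mesh references into mesh->nodes[] lists."""
    nodes_by_mesh: Dict[int, List[int]] = defaultdict(list)
    for index, node in enumerate(gltf.get("nodes", [])):
        if "mesh" in node:
            nodes_by_mesh[node["mesh"]].append(index)
    return dict(nodes_by_mesh)

gltf = {"nodes": [{"mesh": 0}, {"children": [2]}, {"mesh": 0}, {"mesh": 1}]}
print(invert_node_mesh(gltf))  # {0: [0, 2], 1: [3]}
```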

Not having followed the discussion about LOD in the related issues, I don't know whether this is in line with what others need for LOD implementations. The VisibilityRule seems to be a very generic approach, but may be described too vaguely to cast it into a spec that everybody can implement.

I'll have to leave it at that (because I don't think that I can say anything really profound here). But as Don said: If this was described "formally" as a proposed vendor extension, people who are more concerned with LOD could probably understand it more thoroughly and align it with other LOD proposals.

KhronosGroup / glTF

Mesh Instancing + LOD + Portals extension proposal #1660
