Scene complexity limits

petrbroz commented 4 years ago

Hi everyone,

What is the maximum scene complexity that glTF has been designed for? How would you guys handle scenes with potentially tens of thousands of nodes and distinct meshes?

We'd love to adopt glTF, but for our models - especially in architecture/engineering/construction - we're often generating glTFs with manifests that are hundreds of MBs, sometimes even units of GBs, and we have yet to find a tool in the glTF ecosystem that could process them. We'd like to avoid having to consolidate the meshes if possible.

javagl commented 4 years ago

It might be a bit far-fetched, but IIRC, the https://github.com/KhronosGroup/glTF-Asset-Generator (which is currently mainly intended for compliance tests) was once also considered for generating "benchmark" tests - e.g. generating artificial extreme cases like "chains of 10000 nodes" or "nodes with 10000 children". @bghgary Do you remember details here?

(Some of the issues of glTF files being "too large" could be solved if there was an option to compose multiple glTF files - roughly related to https://github.com/KhronosGroup/glTF/issues/37 - but I think there is no established solution for that yet).

donmccurdy commented 4 years ago

I'm not confident I could estimate a maximum scene complexity for the format itself, or for the processing tools like glTF-Pipeline and gltfpack. Perhaps others have thoughts on where that limit would be.

But I will try to comment from the perspective of someone implementing the client/viewer/engine side of things — because glTF is designed for runtime transmission and viewing, I assume that an asset has been optimized such that rendering is possible without fundamentally rewriting the asset at runtime.

While models with tens of thousands of nodes are common in AEC, engines cannot make tens of thousands of draw calls. We'll do culling, of course, but even that is probably not enough in these cases.

> We'd like to avoid having to consolidate the meshes if possible.

I agree, but could you say more about why? I assume (1) unwanted increase in total filesize, and (2) inability to manipulate individual objects, but would be curious if there are more specific reasons like per-object metadata.

It's possible that a proposal like https://github.com/KhronosGroup/glTF/pull/1691 would provide some of the scale you're looking for. It heavily optimizes reused meshes (e.g. bolts and screws), losing some information (full node hierarchy is no longer sent in JSON) while retaining the ability to manipulate individual instances in the application.

lexaknyazev commented 4 years ago

> with manifests that are hundreds of MBs, sometimes even units of GBs

glTF loaders usually do not expect such an amount of JSON data, and this will likely cause issues with many implementations. Optimizing for such manifests would require glTF loaders to use SAX-style JSON parsing (instead of DOM). We should probably add this data point to the Project Explorer (/cc @javagl, @weegeekps).

weegeekps commented 4 years ago

@lexaknyazev How do you envision us representing this data point in the Project Explorer? I think there's a lot of value in indicating if a loader can support loading large files, but it seems like a more difficult thing to represent in a meaningful fashion. Two options that immediately come to mind, in no particular order of preference:

A boolean flag indicating the loader supports parsing large manifests (threshold around ≥100MiB? higher?)
A field describing how the JSON parsing works? Is it using a DOM parser or a streaming parser?

I can open an issue in the Project Explorer repo if it makes sense to continue this discussion there, so as not to hijack this issue further.

lexaknyazev commented 4 years ago

DOM/SAX JSON parsing seems like a good start - it implicitly gives expectations wrt memory usage. Let's continue in the Project Explorer repo.

wallabyway commented 4 years ago

Following up on @donmccurdy, who asked "could you say more about why? I assume (1) unwanted increase in total filesize, and (2) inability to manipulate individual objects, but would be curious if there are more specific reasons like per-object metadata" and also suggested a mesh-instancing proposal like #1691:

(2) Yes, this. We 'pick' (and also manipulate) an individual node and present its associated metadata. We also need to be able to render an 'outline', or make the object 'transparent with coloring (ghosting)', on a per-object basis.

Total file size was also important: we pulled out the buffer to a glb, but it still gave us a big glTF file, because of the co-dependency between nodes, accessors, and bufferViews needed to support per-object features.

We have done some neat optimization work on mesh-instancing AEC models. It could be applied to #1691. Instancing certainly helps, but there's still too much variation as the AEC models get bigger, resulting in many draw calls again.

We've thought about preparing a mesh-consolidated version of the glTF file. That simplifies things for a glTF loader and meets the simplest "get something on the screen" requirement, but it moves our per-object problem into the render loop: we must break the consolidated meshes back apart into 'per-object' features. We currently do this with 'skip lists'. One way to achieve this in glTF might be to put the skip lists and object IDs in glTF 'extras'. Then, external to the glTF loader, the render loop would use the skip lists to skip parts of the consolidated mesh buffer, and move them into a new consolidated mesh (made of ghost material) to achieve per-ObjectID shading. For GPU picking, we would render the consolidated mesh with the objectID as 'color'. However, pre-baked mesh consolidation with skip lists doesn't work well with culling or HLOD. Would this be the recommended approach for AEC in glTF?

zeux commented 4 years ago

From the gltfpack side I would be happy to fix bug reports that involve multi-gigabyte glTF files :) I don't see why they shouldn't work given a reasonable amount of memory available on the target machine. In terms of overall mesh complexity, the largest model I have available is a 300 MB 6M triangle model of Thai Buddha, and it gets optimized in 4 seconds with max RSS of 1.3 GB.

There are some deduplication algorithms in gltfpack that are quadratic in the number of objects, so if a scene has 100_000 unique materials for example, the processing time may become prohibitively large. This should be simple enough to fix given a test case.
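As a rough illustration of how a quadratic deduplication pass can be made roughly linear, the sketch below keys each material on a canonical JSON string and looks it up in a dictionary instead of comparing every pair. This is not gltfpack's code; `dedupe_materials` is a hypothetical helper, and it assumes materials are plain glTF JSON objects that count as duplicates only when all of their properties (including `name`) match.

```python
import json


def dedupe_materials(materials):
    """Return (unique, remap), where remap[i] is the new index of materials[i].

    Each material is reduced to a canonical JSON string (sorted keys), so two
    materials with the same properties in a different key order compare equal.
    A dictionary lookup replaces the pairwise comparison.
    """
    unique = []
    remap = []
    seen = {}  # canonical JSON text -> index into `unique`
    for material in materials:
        key = json.dumps(material, sort_keys=True, separators=(",", ":"))
        index = seen.get(key)
        if index is None:
            index = len(unique)
            seen[key] = index
            unique.append(material)
        remap.append(index)
    return unique, remap
```

Primitives that referenced material `i` would then be rewritten to reference `remap[i]`.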

In general I think it would be great to have test models that stress the limits on object count / node depth / material count / etc. as part of some repository like glTF-Sample-Models - it's pretty easy to get glTF models that have a lot of triangles, courtesy of Sketchfab, but scenes with lots of objects are harder to come by.

petrbroz commented 4 years ago

> It might be a bit far-fetched, but IIRC, the https://github.com/KhronosGroup/glTF-Asset-Generator (which is currently mainly intended for compliance tests) was once also considered for generating "benchmark" tests - e.g. generating artificial extreme cases like "chains of 10000 nodes" or "nodes with 10000 children". @bghgary Do you remember details here?
>
> (Some of the issues of glTF files being "too large" could be solved if there was an option to compose multiple glTF files - roughly related to https://github.com/KhronosGroup/glTF/issues/37 - but I think there is no established solution for that yet).

Thanks for the feedback! To work around the limitations at the moment we're already splitting the models into multiple glTFs, so having an official way to compose them would be an interesting approach.

petrbroz commented 4 years ago

> I'm not confident I could estimate a maximum scene complexity for the format itself, or for the processing tools like glTF-Pipeline and gltfpack. Perhaps others have thoughts on where that limit would be.
>
> But I will try to comment from the perspective of someone implementing the client/viewer/engine side of things: because glTF is designed for runtime transmission and viewing, I assume that an asset has been optimized such that rendering is possible without fundamentally rewriting the asset at runtime.
>
> While models with tens of thousands of nodes are common in AEC, engines cannot make tens of thousands of draw calls. We'll do culling, of course, but even that is probably not enough in these cases.
>
> > We'd like to avoid having to consolidate the meshes if possible.
>
> I agree, but could you say more about why? I assume (1) unwanted increase in total filesize, and (2) inability to manipulate individual objects, but would be curious if there are more specific reasons like per-object metadata.
>
> It's possible that a proposal like #1691 would provide some of the scale you're looking for. It heavily optimizes reused meshes (e.g. bolts and screws), losing some information (full node hierarchy is no longer sent in JSON) while retaining the ability to manipulate individual instances in the application.

Thanks! As @wallabyway mentioned, the main reasons for keeping non-consolidated geometry are per-object metadata, picking, highlighting, potentially manipulation (e.g., exploding a mechanical model), etc.

petrbroz commented 4 years ago

> From the gltfpack side I would be happy to fix bug reports that involve multi-gigabyte glTF files :) I don't see why they shouldn't work given a reasonable amount of memory available on the target machine. In terms of overall mesh complexity, the largest model I have available is a 300 MB 6M triangle model of Thai Buddha, and it gets optimized in 4 seconds with max RSS of 1.3 GB.
>
> There are some deduplication algorithms in gltfpack that are quadratic in the number of objects, so if a scene has 100_000 unique materials for example, the processing time may become prohibitively large. This should be simple enough to fix given a test case.
>
> In general I think it would be great to have test models that stress the limits on object count / node depth / material count / etc. as part of some repository like glTF-Sample-Models - it's pretty easy to get glTF models that have a lot of triangles, courtesy of Sketchfab, but scenes with lots of objects are harder to come by.

I'll see if I can get a couple of examples of larger AEC models that can be shared publicly. :+1:

javagl commented 4 years ago

A side note: I'd hesitate to put these kinds of models into the sample models repo: There are multiple "dimensions" along which complexity can be measured. The complexity of having a single mesh with several million triangles is largely unrelated to glTF itself (and whether a certain mesh could be simpler is another question - this somehow reminds me of https://github.com/KhronosGroup/glTF-Sample-Assets/issues/45 ...). Similarly, things like the maximum size of a texture largely depend on the GL implementation.

But for glTF specifically, having benchmark models roughly like

1 node with 1000/10000/100000 children
1 chain of nodes with depth 1000/10000/100000
a tree with depth 5/10/15 where each node has 10/100/1000 children
Orthogonal (mix-in): Different configurations of whether these nodes have meshes
Orthogonal (mix-in): Different configurations of whether these nodes have animations (!)
...

could easily blow up the repo to dozens of Gigabytes.

I think having some small, handy command-line tool (maybe even with a very simplistic UI) where you can say "Generate!" and let the models be dumped into a target directory would be preferable. (It shouldn't be so hard, and I have some infrastructure for that, but the asset generator certainly has a better one here).
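A generator along these lines could start out very small. The sketch below builds two of the cases listed above (one node with N children, and one chain of depth N) as glTF 2.0 JSON dictionaries; the function names are hypothetical, the nodes carry no meshes or animations, and this is not the asset generator's code.

```python
import json


def _wrap(nodes):
    """Wrap a node list in the minimal glTF 2.0 structure, with node 0 as the root."""
    return {
        "asset": {"version": "2.0"},
        "scene": 0,
        "scenes": [{"nodes": [0]}],
        "nodes": nodes,
    }


def make_wide_scene(child_count):
    """One root node with `child_count` empty children."""
    if child_count < 1:
        raise ValueError("child_count must be at least 1")
    nodes = [{"name": "root", "children": list(range(1, child_count + 1))}]
    nodes += [{"name": f"child_{i}"} for i in range(child_count)]
    return _wrap(nodes)


def make_deep_scene(depth):
    """A single chain of `depth` nodes, each one the parent of the next."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    nodes = []
    for i in range(depth):
        node = {"name": f"link_{i}"}
        if i + 1 < depth:
            node["children"] = [i + 1]
        nodes.append(node)
    return _wrap(nodes)


def write_gltf(gltf, path):
    """Dump the dictionary as a .gltf file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(gltf, handle)
```

For example, `write_gltf(make_deep_scene(100000), "chain_100000.gltf")` would produce one of the benchmark files on demand instead of storing it in a repository.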

BTW: There are approaches for handling really large scenes, and the difficulties of per-object metadata, picking, highlighting, etc. The core idea is to store the information "to which object does this vertex belong?" as another attribute in the glTF attributes. But the details are probably beyond what can sensibly be discussed here...

wallabyway commented 4 years ago

@javagl -

1. Right, not for the main repo, since this is still testing.
2. Right, whatever AEC workarounds we come up with here, we need to be able to explain to others. We would like glTF to be able to handle AEC features in a way that is easy to implement in any glTF loader (and maybe render loop). If mesh consolidation with glTF attributes is ok, then we can try it and document it. Note that implementing AEC 'explode' will be much harder this way (https://github.com/wallabyway/floor-animation).

petrbroz commented 4 years ago

Paraphrasing @donmccurdy's earlier comment, I guess it boils down to the question of whether glTF is:

1. a viewing format that is assumed to be already optimized for viewing performance
2. a transmission format that may require performance optimizations by the viewer

If it's (1), then the mesh consolidation seems like the right choice. The question for us (@wallabyway) would then be: does it make sense to output our kind of assets to glTF, given the added complexity of features like picking or exploding of models with consolidated meshes?

bghgary commented 4 years ago

> @bghgary Do you remember details here?

We filed an issue for the asset generator which came from another issue. We haven't yet done anything to address them.

wallabyway commented 4 years ago

ok, let's explore option (1) from @petrbroz's comment for a second...

We can't generalize with a "one-size-fits-all" grouping, so we have to decide on a grouping when the glTF is created, based on the customer requirement:

For example,

group meshes by floor/level/zone/shell: helpful to quickly load a floor level, and to view just the outside (the shell) of the building (see the floor exploder example)
group by metadata: consolidate by properties like MEP (plumbing/electrical/mechanical), structural, facade, interior, etc.
group by shader materials: optimize render performance / draw-calls (little object isolation)
group by AABB size: optimize for loading-time mesh appearance during progressive-rendering
group by a morton curve: approximate spatial queries / octants, efficient range gets in a db
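For the Morton-curve grouping in the last item, a minimal sketch of the key computation is shown below: the center of each object's bounding box is quantized to a grid over the scene bounds, and the bits of the three grid coordinates are interleaved. The function names and the 10-bits-per-axis default are assumptions for illustration only.

```python
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def quantize(value, lo, hi, bits=10):
    """Map value from the range [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    if hi <= lo:
        return 0
    cells = (1 << bits) - 1
    t = (value - lo) / (hi - lo)
    return max(0, min(cells, int(t * cells)))


def morton_key(center, scene_min, scene_max, bits=10):
    """Morton code of a bounding-box center relative to the scene bounds."""
    qx, qy, qz = (
        quantize(c, lo, hi, bits)
        for c, lo, hi in zip(center, scene_min, scene_max)
    )
    return morton3(qx, qy, qz, bits)
```

Sorting objects by `morton_key` places spatially nearby objects close together in the sorted order, so contiguous runs of that order can serve as approximate spatial groups.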

any others I missed ?

If we explore option (2), we don't mesh-consolidate during glTF creation, resulting in (typically) 100k+ nodes. That gives more flexibility, but the glTF ecosystem struggles to deal with so many nodes for just a single asset.

petrbroz commented 4 years ago

Btw. we're still in the process of finding some sample datasets that can be shared with the public (most of the large datasets we use for testing can't be used, unfortunately).

One sample I found online was in O'Reilly's Learning Autodesk Navisworks 2015, specifically the Chapter 1/bathcity/north.nwd file. Converting this file with forge-convert-utils generates a glTF manifest of approx. 34MB. glTF and glb outputs can be downloaded here.

Another dataset is an official Navisworks sample, "Ice Stadium". The glTF and glb outputs (with and without Draco) can be downloaded here.

wallabyway commented 4 years ago

you can try out North.nwd and "ice stadium.nwd" in a browser (three.js R71) here (click the last two thumbnails): https://wallabyway.github.io/toolkitServerv2/index.html

this is what they look like:

[image: navis]

[image: ice-stadium]

We shouldn't be struggling with <9MB Navisworks files. Solving this would be an excellent first step to glTF adoption.

zeux commented 4 years ago

FWIW here are gltfpack-processed files that can be opened in https://timvanscherpenzeel.github.io/three-gltf-viewer/ (I packed them with -c to reduce .glb size for download, so a viewer that supports MESHOPT_compression is required; rendering performance is the same either way):

autodesk.zip

Note, on bath-city-north a lot of geometry is missing but this is the case for output.glb file as well.

By default gltfpack merges all meshes with the same material, which obviously is suboptimal for models of this scale and nature for a couple of reasons (culling efficiency if the view isn't top-down-all-encompassing, ability to select individual blocks).

If the processing tool has intelligence wrt the desired semantics of the scene, it can merge less aggressively. In the online Autodesk viewer linked above I noticed that selection on bath city doesn't select individual house roofs and selects them in clumps so I am assuming there's some underlying structure that could be leveraged for more efficient output.

Instancing will probably provide a reasonable solution to this problem as well. I will try to find some time to implement a prototype that, instead of merging meshes, creates a KHR_instancing file. The advantage there is that you would be able to implement a viewer that actually individually manipulates the objects by manipulating the instance buffer data, although this needs to be somewhat specialized - I would not expect three.js to provide this out of the box necessarily?

Finally, it should be possible to implement a renderer that does all of these optimizations on the full original file without having to sacrifice selectability or reasoning about individual components of the input file by other tools. We do something like this at Roblox where we're used to scenes with hundreds of thousands of primitive blocks and having to dynamically aggregate / instance and efficiently render them. It should be doable in a browser, if it becomes a priority for some rendering engine.

wallabyway commented 4 years ago

Nice results! The mesh-merge approach gets the file size (8MB .glb file) and the 60fps render performance I was looking for. I'm guessing Draco would have similar file-size ?

I guess the second part, then, is to provide the equivalent of a 'source map' side file. The source map ties together the individual 'DBid' and a range of bytes in the merged mesh.

It's then up to the glTF loader/renderer to implement the logic for highlighting an individual window, let's say. It would use the source map to rip the individual mesh out of the merged mesh, and render it in a separate draw call with an edge shader (etc). An extreme case would be to break every mesh into individual DBids for an explode animation.
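To make the 'source map' idea concrete, here is a minimal sketch of the lookup a viewer would perform. It assumes a hypothetical side-file layout in which each DBid maps to a list of non-overlapping `(start, count)` ranges, measured in index elements rather than bytes to keep the example short; none of these names come from an existing tool.

```python
def split_for_highlight(merged_indices, source_map, db_id):
    """Split a merged index buffer into (highlighted, rest).

    `source_map` maps a DBid to a list of (start, count) ranges into
    `merged_indices`. The ranges of one DBid are assumed not to overlap.
    The two returned lists can be drawn as two separate draw calls,
    for example `highlighted` with an outline or ghost material.
    """
    highlighted = []
    rest = []
    cursor = 0
    for start, count in sorted(source_map.get(db_id, [])):
        rest.extend(merged_indices[cursor:start])
        highlighted.extend(merged_indices[start:start + count])
        cursor = start + count
    rest.extend(merged_indices[cursor:])
    return highlighted, rest
```

An explode animation would be the extreme case of the same lookup: every DBid in the map gets split out into its own index list.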

It's not ideal (in my mind), but if this approach is what makes the most sense to the community, then we'll run with it.

Agreed, that bath scene isn't the best (the roof selection looks buggy). The windows are individually selectable, which is more typical of our AEC files.

We also do dynamic aggregate / instance stuff in our viewer, but it's a fair bit of complexity for the community to adopt dynamic aggregate / instance just to view an AEC model. So maybe pushing the mesh-merge to bake time, with a predetermined grouping, is the way to go?

zeux commented 4 years ago

> It's a fair bit of complexity for the community to adopt dynamic aggregate / instance just to view an AEC model.

I feel like we could attack this problem from two angles.

It would be nice if the viewers could reason about the geometry individually, while retaining reasonably efficient rendering pipeline. This is where I am hoping KHR_instancing comes in - the files at the moment are really big just to download and parse unless you merge them. If the scenes in question can be instanced efficiently (that is, if the geometry used can be deduplicated such that there's few source meshes and a lot of instances of these meshes), then this can produce files that are reasonably small, can be rendered reasonably quickly, and individual objects are still present in the file as entries in the instancing buffer, so a dedicated viewer could individually change them if necessary.
For cases where there's a lot of distinct geometry and/or the ability to reason about individual objects isn't important, using mesh merging at bake time erases the distinction between meshes so that it's not recoverable, but it allows all renderers, including ones that don't support instancing (which requires WebGL 2.0 I believe?), to download and render the scene efficiently.

I don't think Draco helps in this case per se because by itself I don't think it merges meshes at all, and I'm not sure if there's a processing tool that supports this other than gltfpack which doesn't support Draco.

For instancing, a cursory look at the files involved made it seem like there are many duplicates of the same geometry, so I was planning to write a mesh deduplication pass in gltfpack, followed by an instancing pass that replaces nodes with instance buffer entries. Not sure how well this will work.
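The second pass described above might look roughly like the sketch below, which groups nodes by the mesh they reference once duplicates have been merged. It is a simplification under stated assumptions: the nodes are a flat list, only `translation` is carried over, parent transforms are ignored, and the output is a plain dictionary rather than the JSON layout of any particular instancing extension. `build_instance_groups` is a hypothetical name, not a gltfpack function.

```python
from collections import defaultdict


def build_instance_groups(nodes):
    """Group glTF nodes by mesh index.

    Returns {mesh_index: [{"node": original_node_index, "translation": [...]}, ...]}.
    Each group could be drawn with one instanced draw call, while the
    original node index stays available for picking or metadata lookup.
    """
    groups = defaultdict(list)
    for node_index, node in enumerate(nodes):
        if "mesh" not in node:
            continue  # nodes without geometry produce no instance entry
        groups[node["mesh"]].append({
            "node": node_index,
            "translation": node.get("translation", [0.0, 0.0, 0.0]),
        })
    return dict(groups)
```

With few source meshes and many instances, the number of groups (and so the number of draw calls) stays small even when the node count is large.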

wallabyway commented 4 years ago

So we already have a very strong de-duplicator and a non-glTF protocol in production that does a good job finding instancing and the custom 3js viewer does a run-time mesh-merge etc.

For our large AEC scenes, it's important to reason about every individual object, and hence we would still need to recover the object from the merged mesh. I'm suggesting a 'source map' that cherry-picks the byte ranges within a merged mesh. Autodesk supplies the source map when the glTF is baked (merged mesh). If your viewer can interpret the source map, then it can cherry-pick individual objects (and render an edge-outline overlay thingy).

re/draco: if same mesh-merge was applied, and then Draco-compressed, I'm wondering what the glb file-size, TTFP (time.firstpixel), wasm library file-size would be, compared to MESHOPT_compression ?

Also, are you thinking of adding MESHOPT_compression to https://github.com/atteneder/glTFast ?

zeux commented 4 years ago

> re/draco: if same mesh-merge was applied, and then Draco-compressed, I'm wondering what the glb file-size, TTFP (time.firstpixel), wasm library file-size would be, compared to MESHOPT_compression ?

It's a bit non-trivial for me to do the correct test, because I don't know if any tool other than gltfpack can merge the meshes, and gltfpack quantizes the data. I don't know what to expect from running Draco on gltfpack-ed meshes, so I'm hesitant to conclude anything. I'd need to do a specialized experiment to verify this, e.g. by coercing gltfpack to output raw floating point data. In general, the meshopt codec is much, much smaller and much, much faster than Draco, but it usually loses on compression ratio, with results depending on the specific data; some numbers are here: https://github.com/KhronosGroup/glTF/pull/1702#issuecomment-557034744.

> Also, are you thinking of adding MESHOPT_compression to https://github.com/atteneder/glTFast ?

This wasn't on my radar but it's a possibility, the integration tends to be pretty simple. There are some details wrt codec that I need to finalize as noted in the extension PR, so I'd be hesitant to do anything before that happens.

zeux commented 4 years ago

Took some time and hacked gltfpack to produce a simple merged scene with no quantization, and then used gltf-pipeline with default settings to convert this with Draco.

autodesk.zip

The results are pretty interesting, I haven't used CAD models for testing before. gltfpack performs really well on this, better than I expected compared to Draco.

You can open all 4 models in the viewer I linked earlier to judge the performance for yourself, but the short story is that it looks like, while I didn't expect this, gltfpack is actually stronger in terms of compression on these models [after deflate, which is my usual metric for web transmission and consistent with gltfpack codec design].

-rwxrwxrwx 1 zeux zeux  5198648 Dec 13 07:51 bath-city-north-draco.glb
-rwxrwxrwx 1 zeux zeux  4507218 Dec 13 07:51 bath-city-north-draco.glb.gz
-rwxrwxrwx 1 zeux zeux  7694468 Dec  7 07:09 bath-city-north-gltfpack.glb
-rwxrwxrwx 1 zeux zeux  2883774 Dec  7 07:09 bath-city-north-gltfpack.glb.gz
-rwxrwxrwx 1 zeux zeux 11676040 Dec 13 07:55 ice-stadium-draco.glb
-rwxrwxrwx 1 zeux zeux  9129104 Dec 13 07:55 ice-stadium-draco.glb.gz
-rwxrwxrwx 1 zeux zeux  8387888 Dec  7 07:08 ice-stadium-gltfpack.glb
-rwxrwxrwx 1 zeux zeux  1983768 Dec  7 07:08 ice-stadium-gltfpack.glb.gz

I believe that the reason why this happens, which I observed on some scenes before but not to this extent, is that gltfpack codec is more careful about encoding similar objects in a similar way, so deflate can take advantage of this much better than with Draco's bitstream that mostly obscures the structure from zlib codec, and doesn't have "repeated runs" support.
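For reference, the comparison can be worked out directly from the byte sizes in the listing above. The helper names below are only for illustration; the numbers are copied from the listing.

```python
# Byte sizes copied from the directory listing above.
SIZES = {
    "bath-city-north": {
        "draco": 5198648, "draco_gz": 4507218,
        "gltfpack": 7694468, "gltfpack_gz": 2883774,
    },
    "ice-stadium": {
        "draco": 11676040, "draco_gz": 9129104,
        "gltfpack": 8387888, "gltfpack_gz": 1983768,
    },
}


def gz_advantage(model):
    """How many times smaller the gzipped gltfpack file is than the gzipped Draco file."""
    sizes = SIZES[model]
    return sizes["draco_gz"] / sizes["gltfpack_gz"]


def deflate_gain(model, codec):
    """Fraction of bytes that gzip removes from one codec's output."""
    sizes = SIZES[model]
    return 1 - sizes[codec + "_gz"] / sizes[codec]


for model in SIZES:
    print(
        model,
        f"gz advantage {gz_advantage(model):.2f}x,",
        f"deflate gain draco {deflate_gain(model, 'draco'):.0%},",
        f"gltfpack {deflate_gain(model, 'gltfpack'):.0%}",
    )
```

After gzip, the gltfpack output is about 1.6x smaller than the Draco output on bath-city-north and about 4.6x smaller on ice-stadium, even though the raw gltfpack file for bath-city-north is the larger of the two.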

On bath-city-north (which is a bit less skewed wrt resulting size), Draco takes an additional 1.8 seconds on my system to decode the mesh data; gltfpack takes 85 msec in Chrome stable, and 51 msec in Chrome Canary (this activates SIMD decoding for some parts of the work).

So it looks like on CAD models, gltfpack approach is just substantially better on all axes? I did not expect this but I am happy :)

wallabyway commented 4 years ago

Added a demo site for testing: https://wallabyway.github.io/gltf-AEC-fast/ You can drag each mesh around to see how things are consolidated. I'll change this to an outline shader, which uses the built-in hit-test. I'll need to customize the hit-test to pick individual objects. I added TSAA, a global clipping plane, etc.

wallabyway commented 4 years ago

One feature we require, is to pick individual objects/nodes (see gif)

[gif: stadium-selection]

When meshes are consolidated (as zeux's gltfpack does, by grouping by material type), you can no longer pick individual nodes/objects.

One workaround is to color the vertices using the COLOR_0 buffer. That way the meshes can still be consolidated, but when rendering, we can render the meshes with COLOR_0 to a target buffer, and use that target buffer for GPU picking. The COLOR_0 integer could represent the original node index (before consolidation), for example.

Then use a highlight shader to emphasize the individual node.
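A minimal sketch of the ID-as-color round trip used for this kind of GPU picking: the node index is packed into an 8-bit-per-channel RGB value before rendering to the pick target, and unpacked again from the pixel read back under the cursor. The function names are hypothetical, and the 24-bit limit is an assumption that follows from using three 8-bit channels.

```python
def id_to_rgb(object_id):
    """Pack an object ID into an (r, g, b) tuple of 8-bit channel values."""
    if not 0 <= object_id < (1 << 24):
        raise ValueError("object_id must fit in 24 bits")
    return ((object_id >> 16) & 0xFF, (object_id >> 8) & 0xFF, object_id & 0xFF)


def rgb_to_id(r, g, b):
    """Recover the object ID from a pixel read back from the pick buffer."""
    return (r << 16) | (g << 8) | b
```

The pick target has to be rendered without lighting, blending, or antialiasing, otherwise the values read back no longer correspond to valid IDs.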

@zeux - Would gltfpack keep COLOR_0 intact during mesh consolidation? @petrbroz - can we get the SVFtoglTF tool to generate the COLOR_0 buffer, based on a node index?

Is this 'hinting' technique too wasteful (large file size)?

donmccurdy commented 4 years ago

^It might be better to use a custom attribute _ORIGINAL_INDEX rather than COLOR_0. Otherwise any standard viewer will (by default) multiply those vertex colors against the material and texture colors, affecting the visible result.
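A sketch of what producing such an attribute at merge time could look like is shown below. It assumes each input mesh is given as a `(node_index, positions, indices)` tuple and returns plain Python lists; writing them into accessors and bufferViews is left out, and `merge_with_original_index` is a hypothetical helper, not part of any existing tool.

```python
def merge_with_original_index(meshes):
    """Merge meshes into one and record, per vertex, the node it came from.

    `meshes` is a list of (node_index, positions, indices) tuples, where
    `positions` is a list of (x, y, z) tuples and `indices` is a list of
    integers local to that mesh. Returns (positions, indices, original_index);
    the last list is the data for a custom _ORIGINAL_INDEX vertex attribute.
    """
    positions = []
    indices = []
    original_index = []
    for node_index, mesh_positions, mesh_indices in meshes:
        base = len(positions)  # offset of this mesh's vertices in the merged buffer
        positions.extend(mesh_positions)
        indices.extend(base + i for i in mesh_indices)
        original_index.extend([node_index] * len(mesh_positions))
    return positions, indices, original_index
```

Because every vertex of an object carries the same value, the attribute consists of long constant runs.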

zeux commented 4 years ago

gltfpack supports colors but doesn't support custom attributes right now. It would need to be changed to recognize this specific attribute since it needs to know the type, but that's pretty easy to do. I agree that a different attribute is preferable for a variety of reasons, including the type confusion (colors right now are specified to be floating point or normalized integers, not unnormalized integer).

In terms of file size, I'd expect that the attribute compresses REALLY well with MESHOPT_compression, and really any general purpose codec as well. So this might be a reasonable workaround.

wallabyway commented 1 year ago

Adding one more reference _BatchID https://github.com/zeux/meshoptimizer/issues/354

KhronosGroup/glTF issue #1699: Scene complexity limits