Progressive glTF: Exploring Ideas

johannesvollmer commented 1 year ago

Hi!

For 2D images on websites, it is well known that so-called progressive JPEGs provide the best possible user experience. What are these? They allow the browser to display a low-resolution placeholder almost immediately, while the image is still loading. The image gets more refined as the rest of the file is loaded.

I have similar hopes for glTF. Our use case would be to display placeholders while the file still loads from the web or from the hard drive to the GPU. This is important: Even loading uncompressed data from SSD to GPU can take a while, especially for large textures and complex meshes, so I'm interested in this even for files that are already downloaded.

There are already a few things you can do to achieve a similar effect right now. Some ideas would be possible with extensions. I'm curious what you think, let's discuss :)

Definitely Possible:

load json first and display bounding box placeholders, maybe for individual nodes (probably only glTF, not glb)
load meshes and display them before loading textures
use multiple gltf/glb files and load the smallest first, replacing previous placeholders
load some nodes before other nodes (which ones to load first? maybe largest bounding box size first? this might be easier when standardized with a non-required extension?)
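
As a rough illustration of the first item: in glTF 2.0, every `POSITION` accessor carries `min` and `max` values in the JSON, so a client can derive placeholder boxes before touching any binary data. The following is a minimal sketch, not a reference implementation. The function name `placeholder_boxes` is made up for this example, and the boxes are in mesh-local space (node transforms are not applied here).

```python
def placeholder_boxes(gltf: dict) -> dict:
    """Map mesh index -> (min_xyz, max_xyz), using only the glTF JSON.

    Hypothetical helper: boxes are in mesh-local space, so a real viewer
    would still have to apply the transforms of the nodes using the mesh.
    """
    boxes = {}
    accessors = gltf.get("accessors", [])
    for mesh_index, mesh in enumerate(gltf.get("meshes", [])):
        lo = [float("inf")] * 3
        hi = [float("-inf")] * 3
        for primitive in mesh.get("primitives", []):
            position = primitive.get("attributes", {}).get("POSITION")
            if position is None:
                continue  # primitive without positions: nothing to bound
            accessor = accessors[position]
            if "min" not in accessor or "max" not in accessor:
                continue  # tolerate files that omit the bounds
            lo = [min(a, b) for a, b in zip(lo, accessor["min"])]
            hi = [max(a, b) for a, b in zip(hi, accessor["max"])]
        if lo[0] <= hi[0]:  # at least one primitive contributed bounds
            boxes[mesh_index] = (lo, hi)
    return boxes
```

The same boxes could also feed the "largest bounding box first" ordering from the last item.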

Maybe possible?

use progressive jpeg textures and already display the low-resolution version while still loading the remaining texture? is it allowed in the specification?

Extensions/Extras?

texture.extensions.previewColor to display a solid color instead of the texture while it is still loading? Like a 1x1 mip map? or maybe even 8x8 values directly in the json?
include an LOD mechanism that allows the runtime to choose one of multiple meshes?
maybe a generic placeholder extension that applies to many objects (nodes, meshes, textures, ...) that can be used as an approximation? it would represent the same data but as a smaller copy, for example an 8x8 png or 200 vertex mesh or a simplified node with a simplified mesh for multiple children?
node.extensions.allowProgressive to indicate that this node hierarchy is allowed to be displayed partially while still loading? node.extensions.showChildrenAsLoaded to indicate that the first child is the most important one?

Other Questions

Are there more possibilities? Are there ways to load a few bytes of a glb file and already display some data while the file is still being transmitted? Are the glb contents ordered?

In general, when prioritizing all heavy binary data, what order to load them in would be the most beneficial? In other words, which binary resources are the most visually important? Which parts can be loaded in parallel? Is it possible to determine a general heuristic for which data to load first? All geometry and then all textures, or some nodes before others? Are there states that are not useful to display?

Maybe, compute a visual importance factor and a predicted loading time for each binary asset, and then sort according to that? The visual importance factor could be a combination of bounding box size, data type (geometry probably being more important than textures), whether it is animated, whether it is rather transparent, and similar properties.
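
To make the idea concrete, here is a minimal sketch of such a sort. Everything in it is an assumption for illustration: the `Asset` fields, the weights in `KIND_WEIGHT`, the 1.5x boost for animated assets, and the choice to rank by importance per predicted second of loading. None of this comes from the glTF specification.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    kind: str             # "geometry" or "texture"
    byte_length: int      # size of the binary data
    bbox_diagonal: float  # size of the bounding box of the nodes using it
    animated: bool = False
    opacity: float = 1.0  # 0.0 = fully transparent, 1.0 = opaque

# Assumed weights: geometry counts more than textures.
KIND_WEIGHT = {"geometry": 1.0, "texture": 0.5}

def importance(asset: Asset) -> float:
    """Combine size, data type, opacity, and animation into one score."""
    score = asset.bbox_diagonal * KIND_WEIGHT.get(asset.kind, 0.25) * asset.opacity
    if asset.animated:
        score *= 1.5  # assumed boost for animated assets
    return score

def predicted_seconds(asset: Asset, bytes_per_second: float) -> float:
    """Naive loading-time estimate from size and throughput."""
    return asset.byte_length / bytes_per_second

def load_order(assets, bytes_per_second=1_000_000.0):
    """Sort by importance gained per second of loading, best first."""
    return sorted(
        assets,
        key=lambda a: importance(a) / max(predicted_seconds(a, bytes_per_second), 1e-9),
        reverse=True,
    )
```

Ranking by importance per second favors small, cheap assets. Sorting by `importance` alone would be the other obvious choice, and which one looks better likely depends on the model.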

I'm excited to hear your ideas! :) Is anyone else seeing value in this?

johannesvollmer commented 1 year ago

I think it's important not to load an arbitrary file incrementally, because I think most files will not look nice when some nodes are displayed while others are still loading. This should definitely be opt-in per glTF file or node, especially since it means that animations are either delayed until everything is loaded, or play while some nodes are still missing.

johannesvollmer commented 1 year ago

the Microsoft LOD extension allows node variations as well as material variations.

The material variations can not only be used for smaller textures, but also for solid-color placeholder materials. Neat!

johannesvollmer commented 1 year ago

It might be helpful to write a small tool that sorts the binary sections in a glb file according to the Microsoft LOD levels, such that a lower LOD can be displayed while the file is still being transmitted

johannesvollmer commented 1 year ago

basis_u KTX textures have mip maps. Are they stored in a particular order in the file, starting with the smallest mipmap?

javagl commented 1 year ago

You already point to one extension, but there are many extensions that propose or talk about LOD, in one form or another. Reading through https://github.com/KhronosGroup/glTF/issues?q=is%3Aissue+lod may take some time (and will not even cover everything that might be related, but only looks for the term "LOD"). Digesting the information, and analyzing, structuring, and consolidating all ideas, so that they eventually become solid extensions is certainly not easy.

And... you already extended the question from plain "LOD" to the broader one of "progressive" approaches. It can be useful to talk about "progressiveness" in that broader sense, and see whether there are connections or interdependencies between possible solutions. But at some point, the discussion has to "zoom in" to the technicalities of these topics. You can look at the threads that talked about one, very specific form of "progressiveness" (namely LOD), and see that these already are looong threads with controversial discussion about what might be "The Best" solution.

Some of the points that you mentioned are purely the responsibility of the client, and may not require any additional steps.

From the Definitely possible section:

load json first and display bounding box placeholders, maybe for individual nodes (probably only glTF, not glb)

This should also be possible for GLB: One can examine the GLB header, then load the JSON, and process it roughly as if it were a plain JSON glTF.
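
A minimal sketch of that step, assuming the client already has the first bytes of the file in memory: a GLB 2.0 file starts with a 12-byte header (magic, version, total length), followed by a chunk header (length, type) and the JSON chunk itself. The function name `read_glb_json` is made up for this example.

```python
import json
import struct

GLB_MAGIC = 0x46546C67        # ASCII "glTF", read as little-endian uint32
CHUNK_TYPE_JSON = 0x4E4F534A  # ASCII "JSON"

def read_glb_json(prefix: bytes):
    """Parse the JSON chunk from the first bytes of a GLB file.

    Returns (gltf_dict, binary_offset), where binary_offset is the file
    position at which the payload of a following BIN chunk would start.
    Raises ValueError if the prefix is not a GLB or is still too short.
    """
    if len(prefix) < 20:
        raise ValueError("need at least 20 bytes (header + first chunk header)")
    magic, version, _total_length = struct.unpack_from("<III", prefix, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    if version != 2:
        raise ValueError(f"unsupported GLB version {version}")
    chunk_length, chunk_type = struct.unpack_from("<II", prefix, 12)
    if chunk_type != CHUNK_TYPE_JSON:
        raise ValueError("first chunk is not JSON")
    json_end = 20 + chunk_length
    if len(prefix) < json_end:
        raise ValueError("JSON chunk not fully received yet")
    # The chunk may be padded with trailing spaces; json.loads accepts that.
    gltf = json.loads(prefix[20:json_end].decode("utf-8"))
    return gltf, json_end + 8  # skip the 8-byte header of the next chunk
```

Since the JSON chunk comes first in the file, this works on a partial download as soon as enough bytes have arrived.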

Beyond that: Clients could already load meshes, and leave them untextured, before loading the textures. (This reminds me of https://github.com/KhronosGroup/glTF-Tutorials/issues/24 , where I described exactly that in the linked gist at https://gist.github.com/javagl/bfde5cfab4240843120ed6eb38f4af87#implications-for-implementations ...). They could also load the meshes first, and later load skinning information or animation, and even exploit the information from the glTF JSON for that.

This could be implemented in a really clever way. For example, when they know that the meshes (vertex attributes) are in the range of [10MB...110MB] of a 2GB GLB file, they could request that chunk explicitly from a server, with a Range Request, and request other parts later...
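
As a sketch of how a client might find that range: walk the mesh primitives in the JSON, collect the buffer views behind their accessors, and turn the covered span into an HTTP `Range` header. The helper names are made up for this example, and it assumes a single buffer (as in a GLB) and ignores morph targets and sparse accessors. `binary_offset` is the file position where the binary payload starts.

```python
def mesh_byte_range(gltf: dict, binary_offset: int = 0):
    """Return (first_byte, last_byte) covering all mesh buffer views, or None.

    Assumes one buffer; both returned positions are inclusive file offsets.
    """
    accessors = gltf.get("accessors", [])
    views = gltf.get("bufferViews", [])
    used = set()
    for mesh in gltf.get("meshes", []):
        for primitive in mesh.get("primitives", []):
            indices = list(primitive.get("attributes", {}).values())
            if "indices" in primitive:
                indices.append(primitive["indices"])
            for accessor_index in indices:
                view = accessors[accessor_index].get("bufferView")
                if view is not None:
                    used.add(view)
    if not used:
        return None
    start = min(views[v].get("byteOffset", 0) for v in used)
    end = max(views[v].get("byteOffset", 0) + views[v]["byteLength"] for v in used)
    return binary_offset + start, binary_offset + end - 1

def range_header(first_byte: int, last_byte: int) -> dict:
    """Build the HTTP header for an inclusive byte range."""
    return {"Range": f"bytes={first_byte}-{last_byte}"}
```

The resulting dictionary can be passed as headers to any HTTP client, for example `urllib.request.Request(url, headers=...)`. The server has to support range requests for this to help.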

Other approaches that you mentioned:

use multiple gltf/glb files and load the smallest first, replacing previous placeholders
load some nodes before other nodes (which ones to load first? maybe largest bounding box size first? this might be easier when standardized with a non-required extension?)

There are some solutions for things that are roughly similar to this. One that I'm aware of is https://github.com/CesiumGS/3d-tiles , which builds an infrastructure for exactly these tasks on top of glTF: It defines a hierarchy of nodes, where each node can contain glTF.

(Disclosure: I'm involved in 3D Tiles to some extent, but am not proposing it as a solution, not advocating for using it, and not speaking on behalf of the company that created it. I'm only pointing out that this exists.)

From the Maybe possible section:

use progressive jpeg textures and already display the low-resolution version while still loading the remaining texture? is it allowed in the specification?

I think that this is not explicitly disallowed by the specification, but I could imagine that it is not entirely trivial to implement on the client side, and it is only applicable under the very specific condition that the texture is indeed a progressive JPEG.

From the Other Questions section:

Are there more possibilities? Are there ways to load a few bytes of a glb file and already display some data while the file is still being transmitted? Are the glb contents ordered?

Back in the early days of glTF (1.0), there had been a proposal for a glTF extension that tried exactly that: https://github.com/KhronosGroup/glTF/issues/364 . The "SRC" approach is, very roughly speaking, to have a "stream" of geometry data that contains refinement information on a very low level. That is: The LODs are not stored as separate meshes. Instead, the data stream starts with the "simple mesh" information, and then contains low-level, GPU-friendly data that describes additional vertices. The approach itself is described via the linked issue, but I'm not sure whether there have been efforts to support this as a glTF 2.0 extension.

what order to load them in would be the most beneficial? In other words, which binary resources are the most visually important?

Here is a screenshot of a glTF asset that contains 4 million vertices:

[Image: box]

Nah, it's not the box. The box only contains 8 vertices. But inside that box is a "HappyBuddha" model with ridiculously high resolution.

The point is: It's difficult. Eventually, what is "visually important" will always depend on the camera, i.e. the current viewpoint configuration of the client. Bounding box sizes may give a hint, but in any case, the underlying model has to be structured in a form that allows exploiting any of this information.

donmccurdy commented 1 year ago

KTX textures have mip maps. Are they stored in a particular order in the file, starting with the smallest mipmap?

KTX2 is designed to support streaming, yes.

Part of what makes this very complex is that different domains require different types of progressive loading. A simple web viewer displaying a single model might stream that model in progressively in its entirety. A game might stream nodes in and out depending on distance from the character, performance dips, or other factors. Geospatial applications get into some particularly complex cases. Applications may be treating glTF assets as discrete objects in the scene, as a complete representation of the entire scene, or may be streaming glTF assets into larger batching systems within a scene. Suitable solutions for these cases diverge, in my opinion.

In general I think glTF should be (and mostly is?) amenable to streaming, but standardizing how and when applications do progressive loading may be a bridge too far for this standard.

Progressive loading can of course be defined by standards that contain or reference glTF data, including 3D Tiles, USD, HTML's <model/> tag proposal, or the glXF proposal.

johannesvollmer commented 1 year ago

Thanks for the insights! Agreeing with all of your points. I'm not looking to standardize these ideas, just looking for inspiration for what could be done, even if only in the clients :)
