Mesh representation change suggestions, to allow subdivision surfaces, more efficient storage and more clarity

gnagyusa commented 6 years ago

Hello. My name is Gabor Nagy. I was one of the original 2 designers of Collada, and I started using glTF2.0 a few months ago, in EQUINOX-3D. As you know, we also use it at Facebook, where I'm a 3D graphics lead. The format is great! It's super easy to parse, and the spec is nice and clear, but if I may, I'd like to make a couple of suggestions that would improve flexibility and clarity, and would allow for new features, like subdivision surfaces:

1) It would be awesome to support separate index arrays for POSITION, NORMAL, etc. Currently, exporters have to store multiple copies of vertex positions in many cases, producing a "disconnected polygon soup", rather than a clean, connected mesh. In addition to increasing file sizes, it doesn't allow easy mesh vertex identity checks that are needed for subdivision surfaces / mesh edge data saving, closedness tests, etc. Instead of just comparing integer indices, import tools have to compare float triplets for equality (kind of a dirty business, with epsilons and sign checks :)), to determine vertex identity. E.g. if 12 polygons share a vertex (position), but the normals are different, the vertex position (3 floats) must be replicated 12 times. That's a 144 bytes, instead of 12, for the same mesh vertex. While this is how current GPUs need the data, and it's usually ok to waste RAM on potentially thousands of duplicated vertex positions, it can be a problem when the data is transmitted over the internet, especially on mobile platforms with bandwidth caps and extra fees. Often, vertices need to be split or otherwise rearranged on input (e.g. if normals are missing, and hard normals must be generated), so the vertex array will change anyway, before it gets to the GPU. Also, future generations of GPUs may allow separate index lists for vertex positions, normals, etc.

The current format:

[
  {
  "attributes"
   {
     "POSITION":0,
     "NORMAL":1
   }
  "indices":2,   // This is outside the "attributes" scope, so only one index array can be specified for all attributes!
  }
]

This alternative would be the "best of both worlds", it would allow a separate position index array, while still allowing the use a single index array, as well:

[
 {
  "attributes"
  {
   {
    "POSITION":0,
    "indices":1,  // Position has its own index array, so no need to duplicate vertex positions, and mesh vertices can be easily checked for identity, by comparing integer indices
   }
   {
    "NORMAL":2
    "indices":4,  // We can use a separate index array for normals, or we can use the same as for position, by using the same value here.
   }
  }
 }
]

Non-repeating vertex positions would allow us to store mesh edge attributes, like "hardness", which is needed for (creased) Catmull-Clark subdivision.

2) It would be great to have a clear separation between texture-space tangents (used for normal mapping) vs. geometric tangents (used for anisotropic shaders). There are 2 texcoord sets (TEXCOORD_0 and 1), but there's only one TANGENT semantic, which would imply that it's for geometric tangents, but all the examples I've seen, use those for texture-space tangents. Ideally, texture-space tangents should be packed with their corresponding texcoords. Texcoords could be either VEC2 (S, T) or VEC5 (S, T, TgX, TgY, TgZ), which should be indicated in the accessor. The current system seems a bit confusing and incomplete. For example, what if a model uses as anisotropic shader with a normalmap, and thus it needs both geometric tangents, and texture-space tangents that are different? Or, if there are two normalmaps (e.g. a low-frequency + detail). An anisotropic shader + 2 normal maps would need 3 tangent sets. It's not clear which tangents should be stored in the single TANGENT semantic. And what about the other two tangent sets? There doesn't seem to be a way to store them, at all. This is why Collada had different semantics for geometric tangents and texture-space tangents.

3) I couldn't find a way to specify which texcoord set should be used for a particular texture, when rendering a mesh. The PBR material allows up to 5 textures (baseColorTexture, metallicRoughnessTexture, emissiveTexture, occlusionTexture, normalTexture), but there are only up to 2 texcoord sets. Do all 5 textures have to use the same texcoord set, via TEXCOORDS_0? But, then what is TEXCOORDS_1 for? I see that textures refer to samplers, but samplers only specify filter and wrapping options, but not the texcoord set to be used.

4) A minor thing. texcoords are referred to as "UV", but the proper names for texture coordinates in OpenGL are S and T. U and V are generic, "natural surface parameters" that may or may not be used for texture mapping. An unfortunate confusion in the industry, like some folks calling mesh bitangents "binormals" :)

Thank you, and please keep up the great work on this awesome new standard!

emackey commented 6 years ago

Thanks @gnagyusa. I'll start with the quick answer for point 3, and let others chime in on some of the design philosophy on the other points.

Texture set number is specified in textureInfo. So for example an occlusionTexture using TEXCOORD_1 might look like:

            "occlusionTexture" : {
                "index" : 0,
                "texCoord": 1
            },

But, ecosystem support for this is still under development. I've been told ThreeJS is currently limited to using TEXCOORD_1 only for the occlusion texture and none of the others, but there are longer term plans to implement support for the other texture types. The Cesium team has not yet implemented any support. I think the BabylonJS folks have full support for TEXCOORD_0 and TEXCOORD_1, but I haven't tested it myself.

gnagyusa commented 6 years ago

Thanks @emackey. That's awesome about 3)! I figured there must be a way, but I guess, I missed it somehow :) BTW. It's great to have such an easy-to-parse open format. My glTF plugin code is only around 3000 lines, including the JSON DOM parser. I love the fact that the entire glTF plugin binary for EQUINOX-3D is <50kB. To put that in perspective, the FBX plugin is 14MB (using the Autodesk SDK), and the Collada plugin is 430kB. Now, if we could just add n-gon and subdivision surface support, we'd have a very powerful publishing format that could beat USDZ! :)

lexaknyazev commented 6 years ago

Thanks @gnagyusa for thorough feedback!

1

I think that multiple indices support could be implemented as an extension to the current spec. Since it would require CPU processing before uploading data to the GPU, I'd expect that not all clients (especially on the web) would implement it.

it can be a problem when the data is transmitted over the internet

Draco mesh compression solves this by using quantization, prediction and entropy coding.

2

VEC4 (VEC3 + SIGN) TANGENT data has been added exclusively for normal maps. So tangent space is defined by

NORMAL
TANGENT
cross(NORMAL, TANGENT) * sign

At that point, the core spec didn't have (and still hasn't) notions of LODs, low/high frequency maps, or anisotropy.

Adding these features is certainly possible. This should be done as app-specific extras or (multi-)vendor extensions first, and then they could be promoted to Khronos extensions (or even integrated into new spec revisions if there's enough interest).

4

I'm afraid that simply replacing all "UV" occurrences with "ST" would cause more confusion especially among first-time readers. Nevertheless, pull requests harmonizing terms and language are always welcome!

gnagyusa commented 6 years ago

Thank you @lexaknyazev for the info. 1) That's an interesting idea. I think, if it was an officially accepted extension to allow index lists under each attribute, as well as outside (as it is now), that would be enough. It would mean that the attributes that do not specify their own index list, would use this shared index list.

"attributes"
{
  {
    "POSITION":0,
    "indices":1,    // POSITION has its own index list
  }
  {
    "NORMAL":1,  // NORMAL doesn't specify its own index list, so it uses the shared one below
  }
  {
    "TEXCOORD_0":2,  // TEXCOORD_0 also uses the shared index list
  }
 ...
  "indices":1,
}

We can decide later, if we want to deprecate the current format and mandate specifying the index list for each attribute, or keep both options. Once we have clearly identifiable mesh vertices, we can introduce an "edges" element with "start" and "end" vertex, and "edge hardness", etc., for Catmull-Clark, then introduce n-gon, or at least triangle and quad support, as Catmull-Clark works best with quads. The problem with Draco is that it's lossy. I think, if we could save bandwidth, but stay lossless, that would be very useful. For example, proper, position-indexed mesh support might help glTF become the standard for 3D printing as well. One certainly wouldn't want to use a lossy format for 3D printing, especially where part tolerances matter. The lack of position-indexed mesh support plagues the current de-facto standard, STL. It's a binary format, but it stores a disconnected triangle soup, repeating vertices, and bloating files to several times the size they need to be, and hitting the upload size limit for 3D printing services, at much lower triangle counts than they should.

2) Fair-enough. I think, extensions for such core features might "pollute" glTF too much though. It might be better to add them to the standard step-by-step, in a more or less backward-compatible way: Step 1 - Allow TEXCOORD_x to have either VEC2 or VEC5 type, to support storing texture-space tangents with their corresponding texcords. Step 2 - deprecate the use of TANGENT for texture-space tangents, and give tool developers time to switch to VEC5 TEXCOORD_x instead. Step 3 - change the meaning of TANGENT to geometric tangent, for anisotropic shaders, and possibly allow anisotropy in the PBR shader. I think, the new format would actually simplify the code to read/write glTF, as it wouldn't have to worry about keeping two separate arrays in sync. I know, the current glTF shader doesn't support this, but some 3D apps, allow for more than one normalmap with different resolutions and mappings, so they need multiple sets of texture-space tangents. E.g. a low-frequency map, and a tiled, high-frequency detail normal map. The new format would support this too. As a side note: maybe I'm missing something, but isn't it wasteful to store the handedness of every single tangent? We could just store a single float, or even a boolean in the vertex attribute field. That would reduce the tangent array size by 25%.

4) Fair-enough :)

Thank you!

lexaknyazev commented 6 years ago

1) For the JSON example to be syntactically-correct, attributes field needs to be an array of maps:

"attributes":
[ // <-------
  {
    "POSITION": 0,
    "indices": 3, // POSITION has its own index list
  },
  {
    "NORMAL": 1, // NORMAL doesn't specify its own index list, so it uses the shared one below
  },
  {
    "TEXCOORD_0": 2, // TEXCOORD_0 also uses the shared index list
  }
],
"indices": 4

or a map of maps:

"attributes":
{
  "POSITION": {
    "values": 0, // index of the accessor with vertex data
    "indices": 3, // index of the accessor with indices data
  },
  "NORMAL": {
    "values": 1, // NORMAL doesn't specify its own index list, so it uses the shared one below
  },
  "TEXCOORD_0": {
    "values": 2, // TEXCOORD_0 also uses the shared index list
  }
},
"indices": 4

The former would require breaking the schema (thus completely impossible within glTF 2.x lifecycle), while the latter could be made somewhat-compatible with the current design by using JSON-schema polymorphism (so it could be done in theory with glTF 2.1, also please see the spec about asset.minVersion):

"attributes":
{
  "POSITION": {
    "values": 0, // index of the accessor with vertex data
    "indices": 3, // index of the accessor with indices data
  },
  "NORMAL": 1, // NORMAL doesn't specify its own index list, so it uses the shared one below
  "TEXCOORD_0": 2, // TEXCOORD_0 also uses the shared index list,
},
"indices": 4

Proposed features (like edges or subdiv) seem to be oriented more towards interchange / DCC use cases rather than primary glTF goal - runtime delivery. 2) One of the main glTF's design goal is to keep loaders/importers as simple as possible. For example, one could map the whole buffer view to a GPU buffer and bind all attributes without any data processing (with notable exceptions of sparse accessors and compression extensions). Introducing VEC5 accessor type to the core spec would make that impossible (so I'd prefer separate interleaved VEC2 + VEC3 attributes).

gnagyusa commented 6 years ago

Hi @lexaknyazev. 1) You are correct, of course. I think, your second option is better than the 3rd one, because it's more consistent, and it would keep parser code simpler. As for subdivision surfaces, most modern game engines support them in runtime, with view-dependent continuous LOD (done on the GPU with tesselation shaders), so it's considered a pretty standard runtime feature nowadays. 2) I think, we might be talking about the same thing here, but to clarify: For example, EQUINOX-3D, I always use interleaved vertex data on the GPU. E.g., each vertex may consist of a VEC3 for position, a VEC3 for normal, and a VEC5 for texture coordinates + texture-space tangents. I just specify the vertex stride as 44 bytes (sizeof(float) * 11), when I call glVertexPointer(), etc., and use the appropriate starting offsets. As long as we couple the texture-space tangents together with the corresponding texcoords, in the JSON, it's all good. I think, introducing a VEC5 type for TEXCOORD_x accessors would make it the easiest to parse, but I'm open for other kinds of markup. We could even call it an "STXYZ" or "UVXYZ" (sic) type :), to make it more clear. Thank you!

gnagyusa commented 6 years ago

Hi @lexaknyazev 2) Here's an example accessor for what I meant:

"accessors":
[
  {
    "name":"texcoords_with_tx_tangents",
    "componentType":5126,
    "count":42,
    "type":"VEC2_VEC3"  // Aggregate type of VEC2 (texcoords) and a VEC3 (texture-space tangent)
...
  },
]

We could even use a map, to indicate that it's an aggregate type (i.e. "C struct") of a VEC2 and a VEC3. For example:

"accessors":
[
  {
    "name":"texcoords_with_tx_tangents",
    "componentType":5126,
    "count":42,
    "type":
    {
      "VEC2":"ST",               // Texcoords
      "VEC3":"TEXTURE_TANGENT"   // Texture-space tangent
    }
    "bufferView":0,
    "byteOffset":0
  },
]

This almost looks like a C struct definition, so it would be intuitive for most engineers, although, it would be more complex to parse than just saying "type":"VEC2_VEC3" or "type":"ST_TANGENT". And, we might want to specify "componentType" separately for the fields, if we go down this road. That would complicate things even further. So, this approach might be overkill at this point. Thanks!

msfeldstein commented 6 years ago

for 1, (separate indices for position vs normals/texcoord) is there a way to render things like this that have hard creases on the GPU without converting back to what the current spec says (ie duplicating the vertices for each face before uploading to gpu).

It seems like an incredible waste of transmission space, as well as gpu memory for renderers that don't use openGL

gnagyusa commented 6 years ago

On current GPUs, you can only send a single index list (e.g. via glDrawElements()), so you potentially have to waste a lot of RAM, by replicating vertex data, if any of the vertex attributes are different. It's not even just an issue of hard normals. If, for example texture mapping is not continuous between polygons that share a vertex, you have to replicate the vertices. This is even more common than hard normals. Yes, it can be very wasteful for transmission.

cesss commented 5 years ago

I was going to adopt GLTF2 as a substitute for OBJ, for global illumination scenes, because of its completeness, but I really need quad faces, not only for subdivision surfaces, but for FEM meshes also, and for radiosity solutions computed on quads. The fact that quads are not supported on GLTF2 is a show-stopper for me to adopt it. I hope this will added at some point, but in the meantime I have no other option but to use other formats.

gnagyusa commented 5 years ago

I agree. The lack of support for n-gons (quads etc.) is a show-stopper for me too. It also prevents using subdivision surfaces, which is a standard feature in most renderers now.

KhronosGroup / glTF

Mesh representation change suggestions, to allow subdivision surfaces, more efficient storage and more clarity #1362

1

2

4