Stricter Skinning Requirements

KhronosGroup / glTF

glTF – Runtime 3D Asset Delivery

Stricter Skinning Requirements #1665

Open marstaik opened 4 years ago

marstaik commented 4 years ago

After messing around with the gltf spec and various engines, I feel there are quite a few cases with the current specification that make it extremely difficult for most importers to handle with their own internal rendering engines. Either this leads to a bunch of extraneous work and guessing on the importer side, or leads to duplication of data by the exporters.

Here are some ideas that I believe could tighten up the specification:

1) The skeleton root should be defined, otherwise the direct parent of the highest joint in the skin hierarchy must be used. The direct parent does not have to be a joint, to allow for multi rooted skeletons. Note that the direct parent may be the root, but it should not be the root as it was before when the skeleton property was undefined.

2) All joints in a skin must form a connected subtree with the skeleton root/direct parent from (1). This means no non-joints (or joints of other skins) may sit between the skin's joints as connecting nodes. This still allows other nodes to be attached to joints, as long as no further joints of the skin follow beneath them. A lot of engines treat the skeleton/skin as a single entity, and allowing non-joints embedded in the tree of joints forces the engine to create a bunch of shadow bones to get things to work properly.

3) A mesh can only bind to a single skin. I know this is pretty implicit, but I believe it should be defined.

4) As per a previous discussion (https://github.com/KhronosGroup/glTF-Blender-IO/issues/566#issuecomment-523584953), all skinned meshes must be normalized to the local space of the skeleton.

Now I have another requirement that I am torn between two options: (5) and (6)

5) Each joint shall belong to only one skin. This is the ideal choice, as it makes things extremely simple for importers: they do not need to resolve/union skinning trees to find the master skeleton/skin (see below).

6) Each skin shall either define a new tree of unused joints, or, explicitly be a subtree of a previously defined skin. This allows for multiple skins per skeleton, but the subtree definition implies that all skins for a single skeleton must have a "master" skin that holds all of the joints for a skeleton. This makes it easy for an importer to map smaller skin definitions to a master skeleton containing all the joints.
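Options (5) and (6) can be compared on a concrete file with a few lines of set logic. The following is only a rough sketch: `classify_skins` is a hypothetical helper, skins are given as plain lists of joint node indices in file order, and "subtree of a previously defined skin" is approximated as "subset of an earlier skin's joints" without looking at the node hierarchy.

```python
from typing import Dict, List, Set


def classify_skins(skins: List[List[int]]) -> Dict[str, bool]:
    """Report whether a list of skins satisfies option (5) and option (6).

    Each entry of `skins` is the list of joint node indices of one skin.
    """
    seen: List[Set[int]] = []
    option5_ok = True  # no joint is used by more than one skin
    option6_ok = True  # every skin is brand new, or a subset of an earlier skin
    for joints in skins:
        current = set(joints)
        overlapping = [prev for prev in seen if prev & current]
        if overlapping:
            option5_ok = False
            # Under option (6) a skin that reuses joints needs a "master" skin
            # defined earlier that contains all of its joints.
            if not any(current <= prev for prev in overlapping):
                option6_ok = False
        seen.append(current)
    return {"option5": option5_ok, "option6": option6_ok}
```

For example, a master skin `[0, 1, 2, 3, 4]` followed by the smaller skins `[1]`, `[3]` and `[4]` fails option (5) but passes option (6).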

vpenades commented 4 years ago

Reusing nodes on multiple skins should not only be allowed, but it is also a great feature!

Consider a character model with 150 bones, made with a mesh and a skin; there's toolsets that might split the skin and the mesh to allow loading it in engines that have a limited number of bones per shader. For example MonoGame has a limit of 72 bones per skin. And when you do the split, you really, really need that the extra skins keep pointing to the original nodes, so yes, you do need to share nodes between skins.

There are more use cases where you want multiple skins to share nodes. Consider a model with multiple LODs: several LOD meshes of a character would use their own skins, pointing to a commonly animated skeleton.

About requiring "Skeleton", I think it's been discussed before, and I believe the conclusion was that it was just giving redundant information that could be calculated from the nodes themselves. Truth is, I am doing full skinning animation and I don't need it at all, so I believe requiring Skeleton would actually complicate things, especially when Skeleton is defined in a way that conflicts with what can be inferred from the node tree.

Trying to understand how skinning works is difficult given the way the schema has been laid out; the fact that a skinned node is defined within a node is very misleading, so I came up with these thoughts:

Nodes are just WorldMatrix providers, nothing more, nothing less.
Every Node with a mesh can be interpreted as a MeshInstance that needs to be rendered.
A MeshInstance without skin will be transformed by the node that contains it.
A MeshInstance with a skin will be transformed by the nodes pointed by the skin.

So a renderer just needs to do this:

calculate the world transform of every node in the scene into a Matrix[] Table.
for every mesh instance (a node with a mesh):
- if the mesh has no skin, upload just one matrix from the table.
- if the mesh has a skin, upload all the matrices from the table pointed by the skin.

As you can see, you don't need the Skeleton property anywhere in the process.
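The loop described above can be sketched in a few lines. This is an illustration only, under stated assumptions: nodes are given as a flat list of local 4x4 matrices plus a parent index per node, and the helper names (`world_matrix_table`, `matrices_for_instance`) are hypothetical. Inverse bind matrices are left out here, just as in the description above.

```python
from typing import List, Optional

Matrix = List[List[float]]  # 4x4, row-major, column-vector convention


def translation(x: float, y: float, z: float) -> Matrix:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def world_matrix_table(local: List[Matrix], parent: List[Optional[int]]) -> List[Matrix]:
    """Bake the node tree into a flat table: world[i] = world[parent[i]] * local[i]."""
    world: List[Optional[Matrix]] = [None] * len(local)

    def resolve(i: int) -> Matrix:
        cached = world[i]
        if cached is None:
            p = parent[i]
            cached = local[i] if p is None else matmul(resolve(p), local[i])
            world[i] = cached
        return cached

    return [resolve(i) for i in range(len(local))]


def matrices_for_instance(node: int, skin_joints: Optional[List[int]], table: List[Matrix]) -> List[Matrix]:
    """Pick the matrices a mesh instance needs from the baked table."""
    if skin_joints is None:
        return [table[node]]               # rigid mesh: one matrix, from its own node
    return [table[j] for j in skin_joints]  # skinned mesh: one matrix per joint
```

Once the table is built, the hierarchy is not consulted again; the `skeleton` property never appears in either function.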

marstaik commented 4 years ago

I agree that the skeleton property isn't useful, if constraints are still met.

Imagine Node A having children B, C, and C has children D, E.

Nothing in the spec prevents me from exporting a skin with nodes B, D, E with no skeleton property defined.

A
| \
b C
   | \
   d e

What is an importer supposed to do in this case? You have three disjoint sets of nodes, of which, they aren't even on the same hierarchy/level. How is an importer supposed to correctly interpret that?

An example of a strange solution is to duplicate C into a phantom bone C' and create a multi-rooted skeleton using B, C', D, E. Then you can parent C' to C, and have C' steal C's transform. Remember, C may not even be a joint, or maybe it is, of a different skin. Or what if it's a mesh? How do I treat those joints which belong to a single skin as an entity I can't represent correctly in a node tree?

Now, I understand your argument for multiple skins per joint hierarchy, and that's why I would like to point out point 6. You can still have multiple skins, but they would at least have to be a subset of an explicit node tree, or each be a separate skin that no one else used.

Example using point 6 and the points above (assuming they are all joints):
Skin #1: A B C D E
Skin #2: B
Skin #3: D
Skin #4: E

The current spec allows for a bit too much flexibility, and then the importer has to handle a bunch of additional strange cases.
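The connectivity requirement from point 2 is cheap to validate on the exporter side. A minimal sketch, assuming nodes are identified by index and `parent` maps each node to its parent index (or `None` for scene roots); the function name is hypothetical. It accepts a skin when all of its topmost joints hang off one common parent, which still permits multi-rooted skeletons.

```python
from typing import List, Optional, Set


def joints_form_connected_subtree(parent: List[Optional[int]], joints: Set[int]) -> bool:
    """True if the joints form one subtree, or several subtrees under one direct parent."""
    if not joints:
        return False
    # "Top" joints are those whose parent is not itself a joint of this skin.
    top = [j for j in joints if parent[j] not in joints]
    if len(top) == 1:
        return True  # single-rooted skeleton
    # Multi-rooted: every top joint must share the same (existing) direct parent.
    anchors = {parent[j] for j in top}
    return len(anchors) == 1 and None not in anchors
```

With the tree from the example above (A=0, B=1, C=2, D=3, E=4, so `parent = [None, 0, 0, 2, 2]`), the skin `{B, D, E}` is rejected because B hangs off A while D and E hang off the non-joint C, whereas `{B, C, D, E}` is accepted.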

vpenades commented 4 years ago

I believe the current spec already says that all the nodes of a skin must share a common "parent" node, so there should be no need for a "skeleton" node.

But in my experience, even the requirement of having a common ancestor is not needed!, remember, as I said before, Nodes are just world transform providers.

Consider this pipeline:

     Node tree
       🡇
  World Matrix Table
       🡇
     Skins

First, you calculate the world matrices of ALL the nodes of the scene into a world matrix table (you can even use the node indices to build a flat list), once you have the world matrix table, you no longer need the nodes for anything, you just have a bunch of matrices in their final, world space positions.

Second, you loop through the MeshInstances, picking the matrices you need from the matrix table.

I'll give an extreme example: imagine a scene with two characters who happen to be a Siamese couple, so both characters are made with just one mesh and one skin, but pointing to two full skeletons, each with its own root node. So what would be the problem? In the end, all these nodes boil down to a plain list of world transforms!

So in the end, how the nodes are arranged is completely meaningless, as long as you precalculate the world transforms, before traversing the mesh instances.

vpenades commented 4 years ago

@marstaik Okay, to be fair, I believe the problem is, as you suggested, in importing glTF into existing engines that have their own interpretation of how nodes, meshes and skins are related to each other.

I recently suffered this problem when loading glTF files into MonoGame. As it happens, MonoGame's default Model object only supports one skeleton tree and one skinned mesh (with up to 72 bones) per model, and these limitations are impossible to overcome.

And I've seen more engines that try the "node centric" approach; that is, Node trees rule over how the scene is rendered, and that approach is very limiting.

In the end, I made my own monogame model, which is "mesh instance" centric, and nodes are only used to provide the world transforms, and suddenly, all the problems were gone.

marstaik commented 4 years ago

@vpenades I'm not trying to be rude but please hear me out:

What is the point of a scene node hierarchy if you just choose to ignore its existence?

What you define isn't a method for exporting and importing scenes. You've put down a completely custom implementation that discards the scene graph. And that's fine to do, but that doesn't mean it should affect the standard (as you said for MonoGame).

Your Siamese couple may have two full skeletons but I would hope they at least have a common direct parent. Then, in that case it is a single skin with multiple roots that can indeed be one skeleton. But they start at the same hierarchy level.

Maybe when I get back from vacation I'll draw diagrams to better represent these issues.

The main issue is as you said that most engines have strict requirements.

But that's an easy solution, just force every skin to be a complete tree and you fix all these issues.

vpenades commented 4 years ago

@marstaik I didn't say I ignore the existence of the node hierarchy, I said that once you have baked the node hierarchy into world matrices, the hierarchy itself becomes irrelevant, only the world matrices matter, because after all, the Skin object only needs world matrices to do its job, the skin doesn't know or care from where these world matrices come from, or which was the relationship between them.

marstaik commented 4 years ago

@vpenades I am sorry, I am still confused with what you are trying to say. What's the point of doing this? The skin should already have precalculated IBMs and the nodes themselves have transforms. Why are you calculating the transforms in the first place?

Also, the skin object is a definition, not an actual entity. 90% of all implementations will turn the skin into a skeleton, because that's what is expected of a series of linked nodes defined as joints.

That's how we create and target animations efficiently across multiple skeletons. The animation channels target said joints, which most engines will bake down to the skeleton level.

But the main idea I'm trying to convey is that almost every engine treats a skin as a skeleton, as a skin defines nodes as joints. Most importers expect a skeleton to be a subtree of some sort. The current spec allows hodgepodge monstrosities to exist that really don't have any benefit.

vpenades commented 4 years ago

@marstaik I believe the problem lies in treating a skin as a skeleton. A skin binds a mesh to a selection of joints of a skeleton tree, but it is not a skeleton in itself. If importers expect a skin to be a full skeleton by itself, well, I am sorry but it is a wrong, or at least, an outdated approach.

What you call monstrosities I call a beautiful design. You see, I've been struggling with skinning for years. Most formats before glTF used to handle skinning as if they did not know very well what they were doing, almost with fear, or by adding a lot of restrictions to cope with the limitations of the engines they were intended for.

But glTF is the first format that treats skinning as a first-class citizen and handles it with a standardized approach, so no wonder many engines, used to handling older formats with more limitations, now struggle to handle glTF.

So what now? Do we move engines forward to adopt glTF's standardized skinning, or do we move glTF back because some engines have many limitations? In that case I would vote to limit skins to 72 bones as well.

Why should I not be allowed to create a model like this which happens to break every single point in your list? Yes, today some engines are not able to render it correctly. But some others can, and over time, there will be many more.

May I ask which engine you are using?

marstaik commented 4 years ago

The file you provided fails to open in blender and Houdini, can you post the separate gltf json + binary glb?

I've used a variety of engines, UE4, godot, and I've made my own in the past. But it's important to note that it's not always a matter of limitations. You could make every engine in some way handle every type of format sure, but the reason many game engines have a skeleton as a single entity is for performance reasons, gpu bindings, etc.

There's a reason we bake data entities (matrices, points, normals) to flat buffers so we can pass them to the gpu. Allowing a skin to point to joints from different skeleton hierarchies makes the optimizations game engines put in for real time graphics pointless, or at the very least tedious and CPU cycle wastebins.

The other side of that argument is that if you know something doesn't work well in that format then just don't use it for that project. If I'm using a game engine that doesn't support more than 72 skeleton nodes, maybe I just shouldn't export more than 72 nodes.

But all of this was not the point of the post I made.

I made this point because the current definitions have way too much ambiguity.

If two skins define a node as a joint, what the hell am I supposed to do? You've basically given a node two parents in the sense that two skins need to know about this joint. There are no problems in an engine where everything is a node as you said, where every "joint" is a node in a tree and not part of a skeleton. But to do so is absolutely destructive for any real game engine that aims to have multiple animated skeletons in the viewport that need to render 60fps. It's just not feasible.

Remember that a skin just tells a mesh how to bind to the joints. It's still the transforms on the nodes themselves that cause animations to happen.

Almost every modeling program known to man (Maya, Blender, Houdini, etc.) has a skeleton system just for this purpose.

I thought the purpose of GLTF was to get a decent standard for passing scene data between programs and game engines, not a global, infinitely large expandable specification like USD.

If I can't even safely import a scene into 90% of game engines without having to write custom interpreting code then what the hell is the point of gltf? Might as well go back and use fbx if we're going to stick around with broken definitions.

A-Lamia commented 4 years ago

I have to agree with @marstaik on this situation. The only thing an importer should be doing is a 1:1 construction of the file; I don't think importers should be doing crazy logic to try to figure out how to construct a file. There should be a clear consensus to make implementation simple and clean.

vpenades commented 4 years ago

> The file you provided fails to open in blender and Houdini, can you post the separate gltf json + binary glb?

Yes, here you have the zipped glTF; you can also find the source code that generated it, and some info about it, here.

Right now if you want to preview it you have to drop it on BabylonJS, or use the latest version of Windows10 3D viewer (previous version was also having trouble with the skins, but they fixed it recently)

That model might look silly, but this way of creating and reusing meshes can be very useful to create vegetation meshes, like grass, plants, trees, etc., where you have very few meshes instanced many times, and you want them to move with the wind.

I don't know about UE4, but I did report some skinning issues to godot a while ago, they're resolving them here.

glTF already forbids one skin from pointing to joints of different node trees within the scene. What it doesn't forbid is having multiple skins pointing to the same skeleton, or multiple meshes using the same skin, which can be useful:

To overcome the 72max bones in engines that have that limitation.
To have the same mesh instantiated AND animated multiple times in the same scene.
For characters with multiple meshes, where you want to enable/disable some meshes at runtime.

marstaik commented 4 years ago

@vpenades I inspected your file, and I fail to see how it invalidates any of the points I made. None of the skins reuse the same joints. None of the skins have extraneous joints in between themselves; 0 > 1 > 2 > 3 ... 9 are just a single tree of joint nodes. I may have missed something since I'm using my tablet.

Yes, there are multiple nodes that use the same mesh definition (instances) and a different skin, but that's fine. For #3 I was referring to a single mesh instance node can only point to a single skin (which is why I said it was implicit).

From a rough look, the file looks like a good example of what to do. None of the skin definitions share common nodes that would require some sort of skin merging for skeletons. All the skins have 1 root.

As for the "skeleton" node, I'm all for having it always defined or never defined. Having it be optional is useless.

marstaik commented 4 years ago

https://github.com/KhronosGroup/glTF-Asset-Generator/tree/master/Output/Positive/Animation_Skin

skinD is a case that I believe to be a problem. Leaving non-joints in a joint hierarchy just causes so much extra effort on most importers to handle correctly.

jbherdman commented 4 years ago

> Almost every modeling program known to man (Maya, Blender, Houdini, etc.) has a skeleton system just for this purpose.

I have to disagree with this assessment. I have dealt with data conversion to and from most 3D modeling programs over the past 18 years of my career. While a given UI may (or may not) encourage creation of "connected skeletons", there is often nothing inherently wrong or prohibited about using multi-rooted/disconnected sets of joints. A lot of systems, such as 3DS MAX, allow basically any transform-node to be referenced as a "joint" by skinning, regardless of where it may lie within the scenegraph.

If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters. That could benefit some importers, but it also has the potential to negatively impact importers that are actually flexible enough to support the current (and more generic) spec.

Surely, if there is a common set of "scene optimizations" for skinning, which benefit only a subset of potential importers/runtimes, shouldn't this be handled in some sort of transform/optimize tool (e.g. glTF->glTF), rather than imposing arbitrary restrictions on the spec? That potentially offloads the work from both the importers + exporters, and places it in a central tool that could be applied as-needed.

A-Lamia commented 4 years ago

I understand what you're saying, but I don't agree with some of your points. Currently, working with Blender, Godot and Houdini, I'm having an issue where there's always some sort of unique problem with how my glTF files are imported or exported.

> If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters.

I'm not sure why that's arbitrary. The restrictions are there so that when you use glTF in different software you can always expect it to work as it should. Throughout the software I'm using, it seems like the importers are playing some crazy logic guessing game and things are not being depicted as they should. If the goal is a consistent experience across many programs, it's currently not working.

If the exporter has a standardized logical system to follow, then the importer will always know how to build your file because the exporter will always export the same data.

jbherdman commented 4 years ago

Your comment indicates that you think the current skinning system is not standardized and/or logical. I would argue that it is. Where it maybe needs "cleaned up" is in terms of the raw math involved.

The spec itself is fairly quiet on the exact math behind the skinning calculations. It references the "glTF overview", but I'm not sure there is quite enough information to uniquely resolve all the corner cases that come up in skinning. That is probably where a lot of the inconsistent behaviour is coming from, in terms of importers/runtimes.

The core problem, as I see it, is that there needs to be a definitive spec for "glTF skinning math", including either pseudo-code or math formulae to describe the exact post-skinning position of a given vertex 'v' (in specific terms of the related glTF skin/joint/mesh elements). It probably also needs to be expressed in terms of "world space" coordinates, for clarity.

What the spec provides at the moment seems a little hand-wavy in sections, mostly a "go do skinning; your engine already has some stuff, I bet". That leads to all sorts of trouble, because every runtime engine (and modeller) out there is going to make slightly different decisions about how to handle the corner-cases of skinning. That's not something that can be solved by restricting skinning to "a single connected skeleton-tree".

Once the "glTF skinning math" is locked down, it would be up to the importers + exporters to ensure that they are being consistent with that math.

One prime example of a "corner case" is whether the transform on the (skinned) mesh-instance factors into the "post-skinning, world-space" position of each vertex. I've seen systems go either direction on that decision, and there is no solution other than to recognize this and account for it during import/export. glTF would need to have a policy covering that case, and then it would be up to the importers + exporters to modify the data as-needed according to their own internal logic/rules/math.
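As a concrete starting point for the kind of equation being asked for, here is one way to write the post-skinning, world-space position. This is a sketch of a common reading, not text taken from the spec. The symbols are assumptions introduced for illustration: $v$ is the vertex position stored in the mesh, $j_i$ and $w_i$ are the joint indices and weights from `JOINTS_0` and `WEIGHTS_0`, $G_j$ is the global (world) transform of joint node $j$, and $B_j^{-1}$ is the inverse bind matrix stored in the skin for that joint. It also assumes the transform of the node carrying the skinned mesh is ignored.

```latex
v' \;=\; \sum_{i=0}^{3} w_i \, G_{j_i} \, B_{j_i}^{-1} \, v ,
\qquad \sum_{i=0}^{3} w_i = 1
```

Under the other convention (where the mesh node's world transform $M$ does factor in), the joint matrices would instead be $M^{-1} G_{j_i} B_{j_i}^{-1}$ with $M$ applied afterwards, which cancels to the same result; the two readings only diverge when an implementation applies $M$ without the matching $M^{-1}$.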

vpenades commented 4 years ago

> One prime example of a "corner case" is whether the transform on the (skinned) mesh-instance factors into the "post-skinning, world-space" position of each vertex.

I think that glTF already agreed that the transform of a skinned node does not factor in and it should be discarded. Actually, I believe the glTF validator gives an error if it finds a node with a Skin and a transform.

Part of the discussion before is that @marstaik needs to interpret Skin as a full skeleton, which, as I tried to explain, is a wrong interpretation of what a Skin in glTF is.

To clarify, the definition of a Node's transform could be reworded as this:

A node can have either:

A Simple transform
A Complex Transform

if a Node has a Skin, then it can be considered to have a "complex transform", otherwise it is a simple transform, so they're mutually exclusive.

When in simple transform mode, the mesh is brought to world space by the world transform of the node.
When in complex transform mode, the mesh is brought to world space by the world transforms of the nodes pointed by the skin.

I think this way of seeing how skinning works is a bit clearer, but I agree the current design may be deceiving. For that purpose I proposed #1660, which enforces the exclusivity of simple/complex behavior explicitly.

jbherdman commented 4 years ago

> I think that glTF already agreed that the transform of a skinned node does not factor in and it should be discarded. Actually, I believe the glTF validator gives an error if it finds a node with a Skin and a transform.

That does seem to be implied in the spec (2nd "implementation note" under the '#skins' section of the spec), but the wording also seems open to potential misinterpretation. Rather, it is hard to decode the meaning of that implementation note, as currently worded, if you don't already realize that there is a choice to be made in how skinning-math can be implemented.

In contrast, in the 'glTF overview' images, there is an example vertex-shader for skinning, which includes the line "gl_Position = modelViewProjection * skinMatrix * position". Without further context, it seems easy to misinterpret that line as saying that the skinned node's transform should be taken into account.

> Part of the discussion before is that @marstaik needs to interpret Skin as a full skeleton, which, as I tried to explain, is a wrong interpretation of what a Skin in glTF is.

I very much agree with you on that point, and disagree with any requirements to define any restrictive rules pertaining to "full skeletons".

While external runtimes may have their own rules about what support structures are needed for skinning/skeletons (such as fully-connected skeletons, etc), I don't see why those restrictions need to be pushed upstream into glTF. The existing skinning system is flexible, and fairly elegant. It may need some corner cases tightened up, or at least existing decisions to be reflected more clearly in the spec, but it seems to be on the right track without needing additional restrictions on the layout of joints/skeletons.

reduz commented 4 years ago

@jbherdman

> If we impose these arbitrary restrictions on the glTF format, that just shifts the work from the importers to the exporters. That could benefit some importers, but it also has the potential to negatively impact importers that are actually flexible enough to support the current (and more generic) spec.

Sorry, but this logic is broken. This will negatively affect 1% of the importers and positively affect the remaining 99%, by making them immensely simpler. For a very rare corner case you are making the specification an order of magnitude more complex for every importer. I don't think this is in the spirit of GLTF2 also, which aims to enforce a single way of doing things to ensure the best possible compatibility. Your way of thinking is what made Collada a failure, we should stay away from that.

I really think @marstaik's suggestions should be made core for version 3.0.

reduz commented 4 years ago

@donmccurdy @pjcozzi We are probably never ever going to support these situations in our importer, and I doubt any large game engine will either. I really do suggest Khronos or those responsible for GLTF spec do some damage control on this situation before more exporters keep exporting unusable files and GLTF2 ends up becoming another Collada.

As it stands, the format will keep finding incompatibility between exporters and importers for trying to be too flexible, and I believe this is entirely what GLTF2 tried to prevent.

I am sure that, when the spec was originally created, it was never intended to be used this way, so I would really try to close this gap in the 2.0 spec by adding extra clarifications, else by the time 3.0 comes out, it may be too late.

reduz commented 4 years ago

The clarifications I would add to the spec, taking from the OP:

1) Make the skeleton property in skin mandatory; it should always point to the parent of a bone (this way we can easily tell that a bind pose is relative to this node). This will greatly reduce ambiguity in the current spec, where it is optional. I know importers can somehow guess this, but omitting this property forces importers to be more complex and bug-prone, whereas for exporters adding this property (what bind poses are relative to) is no effort.
2) All joints in a skin must be connected. No game engine supports disconnected joints. Having them disconnected may work in a GLTF2 viewer (which plays single animations in skeleton local space and does no blending), but it makes importing into game engines hell (we need bind poses for all bones, because we convert animations to bone local space for animation blending; skeleton local does not work for blending. If bones are missing, we need to do heavy guesswork and invent incorrect bind matrices). If an exporter really wants to have disconnected joints, then it needs to export another skeleton.
3) A mesh must only be able to bind to a single skin. No game engine supports binding a mesh to multiple skins. If an exporter wants to do this, it needs to join both skeletons.
4) While this is implicit in the spec, it should be made clearer that the expected way to export skinned meshes is by making the geometry skeleton-local, by applying `skeleton_xform_world_inv * mesh_xform` to the mesh vertices, tangents and normals.
5) Having multiple skins share joints should be outright forbidden. I've seen many exporters doing this and it also makes it hell for game engine importers. No game engine supports this, so we are forced to do really complex guesswork and duplicate everything (considerably reducing performance). If they really want to do this, they should just create the joints multiple times, but I've seen exporters do this to work around the "mesh needs to be local to skeleton" limitation, duplicating skeletons instead of making the meshes local. Having this as a requirement will force the exporters to properly localize meshes to the skeleton (which is why, again, I suggest clarifying how this process is done, as the math is not super obvious for most).
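Since the localization math is described as not obvious, here is a rough sketch of the `skeleton_xform_world_inv * mesh_xform` step for vertex positions. It is an illustration under assumptions, not an exporter's actual code: matrices are 4x4 row-major lists acting on column vectors, the helper names are hypothetical, and normals and tangents (which need the inverse transpose of the upper 3x3 part) are left out.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 4x4, row-major, column-vector convention
Point = Tuple[float, float, float]


def translation(x: float, y: float, z: float) -> Matrix:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting on the augmented matrix [m | I]."""
    n = 4
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def transform_point(m: Matrix, p: Point) -> Point:
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def bake_skeleton_local(skeleton_xform_world: Matrix, mesh_xform_world: Matrix,
                        positions: Sequence[Point]) -> List[Point]:
    """Re-express mesh-local positions in the space of the skeleton node."""
    to_skeleton = matmul(inverse(skeleton_xform_world), mesh_xform_world)
    return [transform_point(to_skeleton, p) for p in positions]
```

After baking, the mesh node's own transform can be reset, and several meshes can share one skeleton without the exporter duplicating joints.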

All the above would make the spec regarding skins strict enough so exporters are forced to make gltf files that are easy to open and don't need any guesswork for importers.

I know many files exist that will be broken after these changes, but this ensures that, from now on, exporters have clear rules they have to follow to produce unambiguous gltf2 files that will always open in importers (which won't need to do guesswork, or write an implementation only to later realize some files don't work).

jbherdman commented 4 years ago

> While this is implicit in the spec, it should be made clearer that the expected way to export skinned meshes is by making the geometry skeleton-local, by applying `skeleton_xform_world_inv * mesh_xform` to the mesh vertices, tangents and normals.

Is that actually the correct behaviour? I would like for someone with more knowledge than me to rough out the exact math-equation for "postSkinnedPosition = someMatrix * origPosition". There are a lot of bits-and-pieces related to this within the spec (and glTF-overview) at the moment, but there isn't any one cohesive end-to-end layout of all the math. Note: implementations/runtimes may vary on how they get to the same mathematical answer, but there needs to be some standard equation in the spec, to avoid guesswork + errors.

reduz commented 4 years ago

@jbherdman Yes, this is the intended behavior, it's explained here:

https://github.com/KhronosGroup/glTF/tree/master/specification/2.0#skins

> Implementation Note: Client implementations should apply only the transform of the skeleton root node to the skinned mesh while ignoring the transform of the skinned mesh node. In the example below, the translation of node_0 and the scale of node_1 are applied while the translation of node_3 and rotation of node_4 are ignored.

This limitation is vital, because otherwise exporters can easily screw up and make skeletons and meshes not share the same space (which was a common problem in exported Collada files).

This is why I say that it should be made clearer how to do this, because exporters half of the time do it the wrong way. They make the skeleton mesh local instead, which does not work when you have multiple meshes affected by one skeleton, so in this case they duplicate the skeleton, and make the copies share joints.

This is a waste, because when this is imported into game engines (none of which support this, of course), you end up with one copy of the skeleton per mesh and all animation tracks duplicated, which is a lot more inefficient.

This is why, the right thing to do is to force exporters to do this process properly, by forbidding joints sharing by skeletons, and then explicitly explaining the process (math) of making meshes local to skeleton. This would ensure importers get GLTF2 files without waste.

marstaik commented 4 years ago

One of the biggest issues is that in many, many files, the InverseBindMatrices exported for a skin represent some arbitrary bind pose that is different from the pose the joints create in the scene graph!

You will see that a lot of the time the InverseBindMatrices expect a joint transformation that is different than what is actually seen by the transforms shown in the node tree.

I have seen issues where the "scene pose" of the joints is in A-Pose, and half of the skins are bound to A-Pose (and their IBM's reflect A-Pose) and yet the other half of the skins are bound to T-Pose (and their IBM's reflect T-Pose, presumably because that mesh was made for T-Pose).

These meshes need to be transformed to the space that the skeleton is represented with in the scene graph. There is way too much broken behavior and too many loopholes going around here.

You can no longer ignore the IBM's exported, as they contain encoded pose data for when the mesh was bound! There is no elegant way to work around this either.

jbherdman commented 4 years ago

Based on my own understanding of "skinning" in general, which lines up with the viewpoint that @vpenades seems to be taking, I'm actually surprised to see that there is a 'skeleton' attribute on the 'skin' at all.

@reduz After re-reading that particular implementation note for the 20th time or so, and paying very-close-attention to the term "skeleton root node" in there, I now have a very different understanding of how glTF probably intends to do the math. It doesn't surprise me at all that this is a point of confusion for all sorts of people/importers/exporters. The spec probably needs to be less subtle & implicit about what is being declared.

In my own view of the world (perhaps not shared by glTF) bind-poses are either done relative to world-space, or the mesh carries its own bind-matrix (relative to the IBM's stored for the joints).

reduz commented 4 years ago

@marstaik The inverse bind matrix usually is just the rest pose from your modelling program. You can obtain it easily in Maya, Blender, Max, etc. The only confusing part is what the matrices are relative to. This is why I think the skeleton tag must be mandatory: it simply removes the guesswork for us game engines, which all have the concept of a skeleton.

If this tag is not included, then we need to kind of guess where to put the skeleton node, and there is room for exporters to screw up. Collada has a much better concept of Skin in this regard. Still this is probably the least harmful of the points I listed (it just makes importers less bug prone) and worst case it could be left as-is. The others should definitely be mandatory changes.

reduz commented 4 years ago

@jbherdman Yes, the same happened to me. At first I didn't understand why exporters were doing such convoluted things like using multiple skeletons sharing a skin, but then it became obvious that they were trying to work around this limitation the wrong way.

This is why I insist that we need to combine forbidding the sharing of joints between multiple skeletons with a good description of how to localize the meshes to the skeleton, or else we'll continue seeing exporters that produce files that are unreadable for game engines.

jbherdman commented 4 years ago

@reduz This might be a silly question, but if all joints were to have the same bind-pose (IBMs) for all skins, does that get around your desire to forbid multiple skins sharing joints? I'm just trying to get my head around the "real problem" here.

reduz commented 4 years ago

@jbherdman The "real problem" is that there is no need in practice to duplicate skeletons; they are being duplicated as a bad workaround to a spec limitation. If this limitation in the spec did not exist, and skeletons applied in world coordinates (like in Collada), no one would be duplicating skeletons (but it would create other problems).

So all that needs to be done is to simply forbid this. I know you may be thinking "why be so strict? someone may need this..." but it's the same reasoning I mentioned before. Allowing this flexibility means it will eventually be abused in ways not intended.

It's better to force exporters to do more work, than risking importers to not be able to import a file properly. Think that exported files can be easily validated with a script and easily tested in a viewer, while for importers, all we can do is test a very large amount of files to ensure we don't find anything weird. Shifting the burden to exporters is always the right way to go.
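To illustrate how small such a validation script could be, here is a minimal sketch that flags joints referenced by more than one skin. It assumes the `.gltf` JSON has already been loaded into a dict (for example with `json.load`) and relies only on the `skins[*].joints` arrays of node indices; the function name `find_shared_joints` is made up for this example.

```python
from collections import defaultdict


def find_shared_joints(gltf):
    """Return {node_index: [skin_index, ...]} for every node that is listed
    as a joint by more than one skin in the given glTF JSON dict."""
    owners = defaultdict(list)
    for skin_index, skin in enumerate(gltf.get("skins", [])):
        # set() guards against a node being listed twice in the same skin.
        for node_index in set(skin.get("joints", [])):
            owners[node_index].append(skin_index)
    return {node: skins for node, skins in sorted(owners.items()) if len(skins) > 1}


# Tiny hand-written document: node 3 is a joint of both skins.
example = {"skins": [{"joints": [1, 2, 3]}, {"joints": [3, 4]}]}
print(find_shared_joints(example))
```

Whether a non-empty result should be an error or only a warning is exactly the policy question being debated here; the check itself is cheap either way.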

jbherdman commented 4 years ago

@reduz I sense that your previous comment is using "duplicate skeletons" where you mean "reference a joint in >1 skin"? Because if only one skinned-mesh is allowed to use each joint-node, then there are cases where the "skeleton" (set-of-joints) might need to be duplicated within the scenegraph (at export time). Either that, or I'm reading your entire argument backwards. :)

marstaik commented 4 years ago

@jbherdman A "Skinned Mesh" is defined by a node using both a "mesh" and a "skin".

Skin -> marks nodes as joints
Mesh -> contains a bones array whose values are indices into the skin's joints array

You can have multiple meshes use one skin. You can have a Mesh work with different skins (same indices).

reduz commented 4 years ago

@jbherdman Oh, no, the problem of referencing a joint by >1 skin is simply that:

It's pretty much useless. No real-life reason to do this exists.
No game engine supports this.
No 3D modelling app supports this, as far as I know, so this simply can't even be created intentionally.

When I implemented the GLTF2 parser in Godot, I didn't even expect anyone would do something like this simply because it's silly (reasons above), and when I found files doing this, 100% of the time they were trying to work around the problem I mentioned before. There should never be a reason to duplicate the skeleton on export.

This is why this behavior should simply be forbidden.

vpenades commented 4 years ago

@jbherdman Oh, no, the problem of referencing a joint by >1 skin is simply that:

It's pretty much useless. No real-life reason to do this exists.

Not true, it is extremely useful for clothing, procedural mesh generation, and complex scenes.

No game engine supports this.

Not true. Several engines support this, and I've even been able to add support for this to MonoGame, which is an incredibly restrictive engine in that respect. If MonoGame is able to handle such a skin setup, other engines should be able to.

As I've said many times and I will keep saying, this is not a glTF design problem but an engine implementation problem.

No 3D modelling app supports this, as far as I know, so this simply can't even be created intentionally.

Not only not true but the opposite; all 3d authoring packages support it and it's what gives artists freedom to arrange the animatable scene in whatever way they wish.

Furthermore, complex characters that use more than just one mesh are created exactly like this.

Actually, I did mesh rigging for a while in 3dmax. When you do the rigging, you're not required to assign all the nodes of the skeleton to the skin; you just pick whatever bones you need and leave the rest unused or just as part of the internal skeleton transformation. So what you are asking would actually force artists to create invisible vertices so that bones that are not required to be assigned to the skin are used in some way.

Precisely because all 3d authoring packages naturally allow skins to be rigged in that way, all the previously existing skinned 3d formats have been a source of headaches for artists; glTF addresses this issue by defining a skinning system that allows exporting the full skin setup defined by the artists without restrictions.

This is why this behavior should simply be forbidden.

Restricting this behaviour brings us back to the days of 3d file formats with 1 mesh 1 skin, so tell that to artists. If that's going to be the case, we better stick with FBX, or even MD5, we don't need glTF at all if we're going to do exactly the same stuff done by previously existing file formats.

glTF is a multi-mesh, multi-skin format that brings a lot of much needed freedom to artists, and makes asset creation a lot easier for thousands of artists; that's the reason why this behaviour should be allowed and encouraged.

reduz commented 4 years ago

@vpenades You completely misunderstood this and are probably thinking about multiple meshes deformed by a single skeleton, which is fine.

Please re-read; this is unrelated to meshes. It's about sharing a single bone between multiple skeletons. No 3D software or game engine supports this, because it makes no sense.

vpenades commented 4 years ago

@jbherdman Oh, no, the problem of referencing a joint by >1 skin is simply that:

@reduz I understood you perfectly. The problem lies in thinking that a skin is a skeleton, and it is not.

You don't want to forbid a node to be shared by more than one "skeleton", you want to forbid a node to be shared by more than one glTF Skin.

By saying that a skeleton is a skin, you're misleading a lot of people in this discussion.

jbherdman commented 4 years ago

@reduz Except, 3DS MAX definitely supports using the same bone/joint with different bind-poses within different skins/skeletons. All of the MAX "skinning" logic (including bind poses, etc) is localized to an object-modifier on the mesh. (That architecture also makes it possible for the output of one "skin" to be used as the base-mesh for another "skin", which is totally impractical and weird, but that doesn't mean I haven't seen customer files with such oddities.)

Maya also definitely supports a similar scenario with its 'skinCluster', but I'm starting to forget some of the details on that side of things. I'm not sure if it can get quite as crazy as the MAX skinning model (but it probably can).

I'm not saying glTF must or should support all such shenanigans, but you equally can't say that "no 3D software supports this".

reduz commented 4 years ago

@jbherdman, @vpenades Right. Still, leaving naming aside, it makes no sense to do this. I can understand where you are coming from, because Maya and Max treat bones separately and store the skin on the mesh, so we were indeed discussing different things.

In practice though, there is no need for doing this, as it makes the export format considerably more complex and inefficient for run-times. This is something that was carried over from Collada, which was an exchange format. It's much faster to process an entire skeleton (with a single bind-pose per joint) and then have all the meshes depending on it use this (they are localized to the skeleton before exporting).

Exporting individual skins instead of a skeleton also has an additional problem, which is what I mentioned before. You are tempted to remove nodes that are not used for the skin, and this makes it impossible to reconstruct the original skeleton bind pose, which is very useful in advanced animation blending. This is why Collada exporters ended up exporting the whole thing for every skin, even if the bone was not used. It's just a broken logic.

As I mentioned, it's a bad habit to use this logic for run-time exports, because you end up missing information. Instead, skeletons should be properly consolidated. All you need to do is to apply mesh transforms to the vertices/normals relative to the skeleton before exporting.

The "loophole" in the format itself is that you can assign multiple meshes to a single skin (in Collada, this was not possible, as far as I remember), and at the same time multiple skins to a single skeleton.

One of the two should be closed, and I would definitely not allow multiple skins to a single skeleton for practical reasons, as you encourage exporters to lose information.

jbherdman commented 4 years ago

@reduz I don't think that it makes sense to leave the naming out of it. A 'skeleton' and a 'skin' are very different concepts, or rather they are the bases for different types of skinning systems.

Let's take a simple case where there is a node-hierarchy consisting of a single chain of 3 nodes: 'A' is the parent of 'B', and 'B' is the parent of 'C'. Suppose we have a skinned mesh that wants to be influenced via that sub-tree of 3 nodes. However, it only has vertex-weights assigned to nodes 'A' and 'C' -- there is no direct dependence on node 'B'. ('B' only matters because it impacts the world-transform of 'C'.)

Under a "skeleton"-based system, all 3 nodes are joints/bones. They are all assigned bind-poses. This seems to be the lens from which you are viewing the world.

Under a "skin"-based system, only nodes 'A' and 'C' are joints/bones. There is no need to store a bind-pose for node 'B', as it is not involved in the skinning. If one wanted to do some clever optimizations on the scenegraph, and nothing else was directly observing/using node 'B', one could potentially get rid of it (of course, that would require recomputed/combined animation-data for 'C', so that it kept the same frame-by-frame transform relative to 'A').
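The "recomputed/combined animation-data" step can be sketched as follows: if 'B' is dropped and 'C' is re-parented directly under 'A', then each frame's new local transform for 'C' is `B_local * C_local`, which leaves the world transform of 'C' unchanged. This is only an illustration under the assumption that animation is already baked to one matrix per frame; the helper names and the tiny matrix utilities are invented for the example.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested row-major lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


# 90 degree rotation about Z, written out with exact integers.
ROT_Z_90 = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def fold_parent_into_child(parent_local_frames, child_local_frames):
    """Given per-frame local matrices of B (parent) and C (child), return the
    per-frame local matrices C must have once it is attached directly to A."""
    return [mat_mul(b, c) for b, c in zip(parent_local_frames, child_local_frames)]


a_local = translation(1, 0, 0)
b_frames = [translation(0, 2, 0), ROT_Z_90]
c_frames = [translation(0, 0, 3), translation(5, 0, 0)]

folded_c = fold_parent_into_child(b_frames, c_frames)
for b, c, folded in zip(b_frames, c_frames, folded_c):
    # World transform of C is the same with or without the intermediate node.
    assert mat_mul(a_local, folded) == mat_mul(mat_mul(a_local, b), c)
```

Real animation data is usually stored as separate translation/rotation/scale channels with their own keyframe times, so an exporter would have to resample before folding, which is part of why this optimization is not free.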

Here's the problem with your requirement for a connected skeleton -- under a "skin"-based system (which is most modellers; definitely MAX, Maya, and C4D, not to mention anybody writing to Collada/FBX files), there is no guarantee that node 'B' has a bind-pose. If the originating system/modeller didn't see a need for it, why would it store it? I mean, you might get lucky, but there is no guarantee.

So, what does the exporter do? It maybe guesses and invents something, which isn't necessarily any better than the guess/invention of whatever runtime engine wants to read that bind-pose in the first place. It doesn't know how you're going to use it, and it could throw an identity-matrix (or the frame-zero transform) into the bind-pose just to "fit the requirement" that it doesn't think matters.

The other problem, of course, is: what happens if the original "skin"-based modeller also created another mesh, and that mesh wants to use nodes 'A', 'B' & 'C' with a different set of bind-poses? That is entirely legal in a "skin"-based system. If glTF remains a skin-based system, then it is perfectly legal to represent that data in a 1:1 fashion. However, if glTF becomes a skeleton-based system, then it is up to the exporters to throw errors when they can't map onto a "single skeleton with a unique bindpose for each joint-node". Now, that may or may not be "better", depending on one's point of view.

There is no particular reason, of course, that a game engine couldn't be based on a "skin"-based skinning system. It's neither more nor less efficient than a "skeleton"-based system, it just requires a slightly different perspective on the design of the skinning system.

vpenades commented 4 years ago

In addition to what @jbherdman has said regarding removing "unused" nodes for the sake of optimization: this is also dangerous. These nodes might be leftovers of the artist's workflow, but they can also be there intentionally to signal the position of a character-relative point that is required by the game's logic. A classic use of this feature is attachment points for weapons.

Engines should try to optimize within their limits, but trying to overthink the artists' intentions is also unwise.

Regarding pre-transforming the vertices of multiple skinned meshes so we can merge everything into a single big "one-skin-one-mesh-one-skeleton" thing, keep in mind that glTF also supports morphing.

So at some point, you're bound to find a glTF model where the head is a separate mesh with facial animation made with morph target, AND skinned using just the bones of the head and neck, then you'll find another mesh with another skin, using the joints of the neck and the rest of the bones of the body. In this case, the skins of both the head and the body will share the neck joint. Notice that your requirement is forbidding the single most important case which is advanced character animation.

Finally, glTF is currently discussing adding support for subdivision surfaces, which will probably be compatible with skinning and morphing, adding more complexity to the mess.

All this complexity is certainly giving a lot of headaches to 3d engine developers, and I am having a hard time too... but hopefully this will allow artists to do really amazing things...

reduz commented 4 years ago

@jbherdman, @vpenades I understand this, but I was under the impression that Maya/Max did store bind poses internally though, if I'm wrong about this, then the OP makes no sense, unfortunately.

marstaik commented 4 years ago

@vpenades This is incorrect. The exported mesh has 2 arrays that are relevant for skinning, the Joint Indices (currently in relation to the skin) and the Skinning Weights. The only thing that needs to change is that the joint indices of the meshes should now be of the larger, exclusive skin.
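A minimal sketch of that remapping, assuming each skin is reduced to its list of joint node indices and each mesh to its per-vertex `JOINTS_0` tuples. The names `merge_skins` and `remap_joint_indices` are hypothetical, and the sketch deliberately ignores the inverse bind matrices, so it is only valid when the skins being merged agree on the bind pose of every shared joint.

```python
def merge_skins(skins):
    """skins: list of joint lists (node indices), one list per original skin.

    Returns (merged_joints, remap_tables), where remap_tables[s][i] is the
    slot in merged_joints of joint i of original skin s.
    """
    merged_joints = []
    slot_of_node = {}
    remap_tables = []
    for joints in skins:
        table = []
        for node in joints:
            if node not in slot_of_node:
                slot_of_node[node] = len(merged_joints)
                merged_joints.append(node)
            table.append(slot_of_node[node])
        remap_tables.append(table)
    return merged_joints, remap_tables


def remap_joint_indices(joints_0, table):
    """Rewrite per-vertex joint index tuples so they index the merged skin."""
    return [tuple(table[i] for i in vertex) for vertex in joints_0]


# Head skin uses nodes 10 and 11; body skin uses nodes 11 and 12.
merged, tables = merge_skins([[10, 11], [11, 12]])
body_joints_0 = [(0, 1, 0, 0), (1, 1, 0, 0)]
print(merged)                                         # merged joint list
print(remap_joint_indices(body_joints_0, tables[1]))  # body indices into it
```

The weights array is untouched by this; only the indices move.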

I've already done this with a custom importer I am running. The problem is that neither I nor any other importer should have to do this at all.

Notice that your requirement is forbidding the single most important case which is advanced character animation.

I don't see how it's breaking this requirement at all.

You are giving leniency to the exporter. This is making importers go through hell. Game engines store all these bones in one place, so that they can be mapped to a uniform/texture buffer so that we can render things at a decent framerate. There's a reason why Maya and Max can't render anything at a decent framerate in the viewport.

jbherdman commented 4 years ago

@reduz Let me preface this by saying that most of my Max/Maya skinning expertise is from roughly 2001-2008. If they have tightened up the UI restrictions/behaviour since then, I wouldn't know about it. On the API/back-end, though, they are locked in to backwards compatibility, so nothing much would have changed at that level.

Under Maya, there is technically a "jointNode.bindPose" property. It is supposed to be the inverse-matrix to the "skinCluster.bindPreMatrix[i]", but there is no programmatic constraint holding those two values together. In theory, as long as you never add a second 'skinCluster' to that 'jointNode' inside Maya, the constraint should hold. Back in the day, these attributes were notoriously fragile. I'm not sure what UI operations would mess them up, but a good fraction of our customers were seeing the "inconsistent bind poses detected" messages that we popped up when we saw any of the 4 different bind-pose-attributes as being out-of-sync with the others. (The other 2 copies of the bindpose were on the 'dagPose' node, which wasn't always present.)

The situation under MAX was similar. They added an ISkinPose interface which was supposed to store a unique bindpose for each INode, but it wouldn't always get set. Technically, only the ISkin->GetBoneInitTm() would be "required" in order for skinning to work within MAX. Again, we had to write a bunch of code in our exporter to detect and warn if/when those bind-poses got out-of-sync.

In either case, the Max/Maya developers basically retrofitted some support to try to appease "skeleton"-based skinning systems. Contrast that with Cinema4D, which has no concept of a central joint-bindpose -- it is more purely "skin"-based, in that the bindposes are only stored on their equivalent of a "skin modifier" on each mesh.

FBX follows the Max/Maya hybrid weirdness, in that there is an 'FbxPose' that ostensibly holds the bindpose data, but these matrices are also stored in the FbxCluster nodes (which live inside FbxSkin; somewhat akin to the 'joints' + 'inverseBindMatrices' of the glTF 'skin'). In theory, these FBX bindposes are "supposed to be" self-consistent with each other, but that falls down in practice at least as often as the related Max/Maya structures do. (And probably worse, since some third-party writing out an FBX file can probably miss one of those bind-pose locations pretty easily.)

We've all just been re-inventing the same wheel for like 20-30 years.

marstaik commented 4 years ago

@vpenades

So at some point, you're bound to find a glTF model where the head is a separate mesh with facial animation made with morph target, AND skinned using just the bones of the head and neck, then you'll find another mesh with another skin, using the joints of the neck and the rest of the bones of the body. In this case, the skins of both the head and the body will share the neck joint. Notice that your requirement is forbidding the single most important case which is advanced character animation.

A (Head) > B (Neck) > C (Body)

Skin 0: Joints: [0: A, 1: B, 2: C]

Mesh: 0 (Head Mesh) Bone Indices: [0, 1] Vertex skinning weights: ....

Mesh: 1 (Body Mesh) Bone Indices: [1, 2] Vertex skinning weights: ....

I think you are misinterpreting what the current skin model does, and I feel like that is because it has been thoroughly abused by so many exporters. Currently, it is supposed to provide the IBM's in relation to world position (or skeleton root). But currently it also usually contains the bind pose that the skin's "joints" were in when the mesh was bound. An example of this is when the joints of the skin get exported to the scene graph in the first frame of some animation, while the skins IBM's are actually based off of a T-Pose.

The mesh's bone indices (JOINTS_0) provide the indices for which joints to bind to, in the order defined in the skin.

The changes proposed do not prevent multiple meshes from using a skin. In fact, meshes never had to use all the joints defined in a skin. The example above uses two meshes that each use only a subset of the single skin.

In fact, you could have a second skin, Skin 1, that used completely different joints, and the Meshes above could also work with Skin 1. All that matters is the joint indices defined on the mesh match up with the skin.
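The Head/Neck/Body example can be written out as a tiny lookup, purely as an illustration: a mesh's bone indices only become concrete joints once they are resolved through a particular skin's joints array. The node names for the second skin ("X", "Y", "Z") and the function name are invented for the example.

```python
def resolve_bones(skin_joints, bone_indices):
    """Map a mesh's bone indices (JOINTS_0 values) to the joints of a skin."""
    return [skin_joints[i] for i in bone_indices]


skin_0 = ["A", "B", "C"]   # A (Head) > B (Neck) > C (Body)
skin_1 = ["X", "Y", "Z"]   # a different skeleton with the same joint order

head_mesh_bones = [0, 1]
body_mesh_bones = [1, 2]

print(resolve_bones(skin_0, head_mesh_bones))  # head mesh on Skin 0
print(resolve_bones(skin_0, body_mesh_bones))  # body mesh on Skin 0, sharing B
print(resolve_bones(skin_1, head_mesh_bones))  # same head mesh on Skin 1
```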

The localization of the mesh to the skeleton root is supportive of reusing the mesh across different skeletons, not against it. It means I could place Mesh 0 or Mesh 1 and attach it to either Skin 0 or Skin 1 and it should be able to attach with ease if the IBM's are relative to the skeleton root, and not posed arbitrarily.

If a modeling program wants to support a random locator node in between two joint nodes, that is fine. But just because the modeling program can do it does not mean everyone else can, or should. The exporter will know infinitely more about the current scene than the importer can ever hope to deduce. After all, the importer only knows what is written into the final gltf file. The exporter, however, could easily decide to export that locator node as another joint and add it to the end of the skin's joints array with little, if any, extra work.

The propositions above are not trying to take away the flexibility the system provides, only to tame it a bit so an importer can clearly look at the gltf2 file and know exactly what to do with it.

I do not know any game engine that can import the current gltf files without some sort of internal assumption/decision making, or having to compute pseudo-skeletons from combining skins that overlap together, etc, recompute the IBM's, and then try and deduce the world transforms of the skins (and I say that because I have done it). It does not feel right, and still has the potential for unexpected behavior. It doesn't feel right to try and have to guess what to do with these skins that can have one node across the node hierarchy and have another skin in between them. It just doesn't work for game engines.

And if I am wrong, and there exists a game engine that has done it, I want to know how it did it without having to follow 50 pointers to nodes and copy each node's current data into a larger buffer every frame so that it can be bound to the GPU.

@jbherdman Maybe we have been re-inventing the same wheel because we want everything to be super flexible and supportive of every possible situation. We have to at one point tell exporters "hey, what you are doing with all these tricks is great and all, but keep it to yourself, we don't want that in the exported file" and we should be okay with that.

jbherdman commented 4 years ago

Currently, it is supposed to provide the IBM's in relation to world position (or skeleton root). But currently it also usually contains the bind pose that the skin's "joints" were in when the mesh was bound. An example of this is when the joints of the skin get exported to the scene graph in the first frame of some animation, while the skins IBM's are actually based off of a T-Pose.

@marstaik Can you expand on what the problem is, in this exact scenario? I don't understand the issue, but I could be missing something.

Let me define some of my terminology, to maybe clarify the situation. (Sorry if this gets too pedantic.) In most modellers, there is a "bind pose" (or "rest pose", etc). This is a set of "world-space" matrices, one for the mesh, and one for each joint. (Technically, they don't have to be world-space, but do all need to be relative to some common/defined space.) When you set each node to have its "bind pose" transform, then you will wind up with the skeleton + skinned-mesh overlapping nicely in something like a classic T-Pose. This bind-pose (via its transforms) then forms the basis for all the skinning math. (This isn't anything new, but the glTF spec just glosses over the concept, so I figured it was worth spelling it out.)

Here's where my glTF knowledge gets a little sketchy, because I feel like the skinning portion of the spec isn't worded quite clearly enough around the math. Let's say that I have world-space bind-pose matrices, though, and I want to run with that (as an "exporter"). I know from one of the implementation notes that I either need to multiply my mesh-bind-pose matrix into my vertices (et al), or else I can inverse-multiply it to my joint-bindposes. Dealer's choice, really, I just don't have a "mesh bind-pose" slot to insert it, which is fine. Then, I go take my joint-bind-poses, invert them (because that's how they are most useful at runtime), and store them into the IBM's for the skin.
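The "dealer's choice" can be checked numerically. The sketch below assumes world-space bind matrices for the mesh (`mesh_bind`) and for each joint (`joint_binds`), and shows both options: baking the mesh bind matrix into the vertices with `IBM = inverse(jointBind)`, or leaving the vertices alone and storing `IBM = inverse(jointBind) * meshBind`. All function names are illustrative, and exact `Fraction` arithmetic is used only so the two results can be compared with `==`.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested row-major lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_inverse(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination using exact fractions.
    Raises StopIteration if the matrix is singular."""
    n = 4
    aug = [[Fraction(m[r][c]) for c in range(n)] +
           [Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def transform_point(m, p):
    v = (p[0], p[1], p[2], 1)
    return tuple(sum(m[r][k] * v[k] for k in range(4)) for r in range(3))


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def ibms_with_baked_mesh(mesh_bind, joint_binds, vertices):
    """Option 1: multiply the mesh bind matrix into the vertices."""
    baked = [transform_point(mesh_bind, v) for v in vertices]
    return baked, [mat_inverse(j) for j in joint_binds]


def ibms_with_folded_mesh(mesh_bind, joint_binds, vertices):
    """Option 2: keep the vertices, fold the mesh bind matrix into the IBMs."""
    return list(vertices), [mat_mul(mat_inverse(j), mesh_bind) for j in joint_binds]


mesh_bind = translation(0, 1, 0)
joint_binds = [translation(0, 2, 0)]
joint_world_now = translation(3, 2, 0)   # the joint after some animation
vertices = [(1, 0, 0)]

v1, ibm1 = ibms_with_baked_mesh(mesh_bind, joint_binds, vertices)
v2, ibm2 = ibms_with_folded_mesh(mesh_bind, joint_binds, vertices)
p1 = transform_point(mat_mul(joint_world_now, ibm1[0]), v1[0])
p2 = transform_point(mat_mul(joint_world_now, ibm2[0]), v2[0])
assert p1 == p2   # both options skin the vertex to the same place
```

The two options differ in what they allow afterwards: with option 1 the IBMs depend only on the joints, so several meshes can share them, while option 2 ties the IBMs to one particular mesh.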

Honestly, I'm still just a little confused about what happens if my mesh-node has a non-identity transform, but that's a problem that impacts runtime-skinning, not the bind-pose data. (If I get really paranoid, I can just make sure that my meshes are written out with identity-transforms, and rely on the joint-nodes all being in the right world-space locations.)

So, let me ask this question. How is the

the bind pose that the skin's "joints" were in when the mesh was bound

different than the modelling definition of "the bind pose"? You perhaps mean something other than what I am reading from your sentence, because what you are writing just seems to be the definition of the bind-pose (and thus IBMs).

Further:

An example of this is when the joints of the skin get exported to the scene graph in the first frame of some animation, while the skins IBM's are actually based off of a T-Pose.

I'm not sure why this is wrong, if I'm reading everything correctly? The joint's transforms in the scenegraph never need to match their bind-pose. In most modelling programs, the bind-pose is only accessible by using some special "go to bindpose" UI command, or viewing a file where the character just happens to be in bind-pose. (Such as a file that the rigger has passed to the animator, without moving the character out of bind-pose.)

marstaik commented 4 years ago

@jbherdman I think I can see now where a lot of our confusion stems from. I will try my best to describe the situation I am seeing.

You are right that I was defining a bind-pose, but that in itself is not the issue.

In the current skin-centric model, the skin describes the bind pose of a mesh relative to a set of joints, with the position they were in when bound recorded in the IBM's. This is the "bind pose" we are talking about.

The problem that I am seeing is that this allows another skin to have IBM's that describe the joints as if they were posed in a different "bind pose".

Again, in a skin-centric model, this may be fine. But that assumes that you have what you mention earlier:

Dealer's choice, really, I just don't have a "mesh bind-pose" slot to insert it, which is fine. Then, I go take my joint-bind-poses, invert them (because that's how they are most useful at runtime), and store them into the IBM's for the skin

It relies on the existence of per-mesh bind poses. And why is that?

Well, most game engines are going to put these joints in one skeleton (and for good reason), and the meshes then link to the skeleton, and therefore will use the skeleton's IBM's. They then get copied into the GPU buffer straight from the skeleton.

And this is the problem. If you have one skeleton, you can only have one bind pose for said skeleton. But above I mentioned that two skins referring to the same joints can have two different bind poses.

Say we use the first mesh's IBM's for the skeletons bind pose. Now the second mesh will bind to the skeleton incorrectly. We would have to make a custom node to map the skeletons bind pose to the bind pose that the second mesh expects.

But now, there's even more ambiguity. Say that mesh one and mesh two had two different bind-poses, but now the joints themselves were exported in an entirely different pose. What do I decide to use as the actual bind-pose?

But then, here is another issue: To even create said bind-pose mappings for the second mesh, the first and second IBM's need to have their world space matrix removed. So I need to use the data from the mesh + skin and compare it to the root joint's location in the scene to compute the world transform of the root, and extract that out from all of the IBM's. This ends up being a huge mess.

Please let me know if something needs additional explanation.

jbherdman commented 4 years ago

@marstaik Yes, thank you, I think we are both on the same page now, in terms of understanding the problem.

What we're talking about, though, are the fundamental limitations of "skeleton"-based systems vs. "skin"-based systems. At the end of the day, if the skeleton-based system can only store one IBM per joint, then "some modification" needs to happen. In the past, I've specifically written exporters that go from "skin"-based systems to "skeleton"-based systems. It is a pain. Basically, the only perfect solution is to duplicate the joint-transform-nodes into different hierarchies, just so that each joint-transform-node can have its own unique IBM. That is terribly expensive at runtime, of course, because now you are unnecessarily computing the animation/transforms for 2x (or 3x or worse) as many joint-transform-nodes.

One step shy of that, if you get really lucky on how the artist created the input-data, you can try to massage the bind-poses so that they overlap properly. That is, you can sometimes get lucky and make the IBMs 'shared' across multiple meshes/skins, if the problem is simple enough that you can solve it by multiplying the mesh-bind-pose into the mesh.

For example, let's go with that example of a skinned-character who has a separate "head" and "body" mesh. If you're lucky, both meshes were bound in the same set of joint-bind-poses, but probably have different mesh-bind-poses (as their mesh-local origins are likely different). As long as you multiply the mesh-bind-poses into each mesh (rather than into the joint-bind-poses), then you can successfully share the same joints in a "skeleton"-based system. But, if the artist did "something weirder" (such as binding the head to a bind-pose where the character's neck-joints are at a different set of relative angles vs. the body-bind-pose), then you will be left with no choice but to duplicate the joint-hierarchy when exporting to a "skeleton"-based system.

The more optimistic situation is that ideally the game-engine could shift from "skeleton-based" to "skin-based". It requires just one extra layer of indirection -- each skinned mesh would have a collection of "joint" objects which held the IBM + a pointer to the scenegraph-node from which that "joint" receives its animated transform-data. It's a relatively minor shift, and it lets you get away from the worst-case-scenario where you need the exporter (or importer) to potentially duplicate the hierarchy of joint-transform-nodes.
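That extra layer of indirection might look roughly like the sketch below. The class names (`SceneNode`, `Joint`, `SkinnedMesh`) are hypothetical and not taken from any particular engine; the point is only that the scenegraph node is animated once, while each skinned mesh carries its own IBM for that node.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested row-major lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


@dataclass
class SceneNode:
    name: str
    world: Matrix            # animated world transform, updated once per frame


@dataclass
class Joint:
    node: SceneNode          # shared scenegraph node supplying the transform
    inverse_bind: Matrix     # IBM private to the skin that owns this Joint


@dataclass
class SkinnedMesh:
    name: str
    joints: List[Joint]

    def joint_palette(self):
        """Per-mesh matrices to upload to the GPU for this frame."""
        return [mat_mul(j.node.world, j.inverse_bind) for j in self.joints]


# One neck node shared by two meshes that were bound in different poses.
neck = SceneNode("neck", translation(0, 5, 0))
head = SkinnedMesh("head", [Joint(neck, translation(0, -4, 0))])
body = SkinnedMesh("body", [Joint(neck, translation(0, -5, 0))])

neck.world = translation(0, 6, 0)        # animate the shared node once
print(head.joint_palette()[0][1][3])     # head sees its own offset from bind
print(body.joint_palette()[0][1][3])     # body sees a different one
```

The cost compared with a skeleton-based layout is one palette per skinned mesh rather than one per skeleton, which is the buffer-copy concern raised earlier in the thread.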

I personally think that glTF supporting a "skin"-based system is great, and still the way to go. If a lot of runtime engines need support transforming data from a "skin"-based system to a "skeleton"-based system, that seems like a common tool that could be written (e.g. glTF->glTF transformation). Exporters could try to be more aware of the situation & challenges, but sometimes their hands are tied by the source data being "weird" because of something the artist did.
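A first step for such a glTF->glTF tool could be to detect which joints are shared by several skins with conflicting IBMs. The sketch below assumes the IBMs have already been decoded from their accessors into flat 16-float lists; the dictionary layout (`"joints"`, `"ibms"`) and the function names are hypothetical.

```python
def matrices_close(a, b, tol=1e-6):
    """Element-wise comparison of two flat 16-float matrices."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def find_conflicting_joints(skins, tol=1e-6):
    """Return node indices used by more than one skin with differing IBMs.

    `skins` is a list of dicts with "joints" (node indices) and "ibms"
    (decoded matrices, one per joint). This is an assumed, simplified layout.
    """
    first_seen = {}
    conflicts = set()
    for skin in skins:
        for node, ibm in zip(skin["joints"], skin["ibms"]):
            if node not in first_seen:
                first_seen[node] = ibm
            elif not matrices_close(first_seen[node], ibm, tol):
                conflicts.add(node)
    return sorted(conflicts)

IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]
# Column-major, as glTF stores matrices: element 13 is the y translation.
SHIFTED = IDENTITY[:13] + [0.5] + IDENTITY[14:]

skins = [
    {"joints": [3, 4], "ibms": [IDENTITY, IDENTITY]},
    {"joints": [4, 5], "ibms": [SHIFTED, IDENTITY]},
]
conflicting = find_conflicting_joints(skins)
```

An empty result means a skeleton-based importer can keep one IBM per joint. Any node reported here is one where the tool would have to rebake vertices or duplicate the joint.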

vpenades commented 4 years ago

Having a glTF with multiple skins that share joints, but don't share IBMs is required for some artistic workflows.

One of the classic issues with skinning is extreme vertex deformation at acute joint angles. I've seen very skilled artists come with a very clever solution: Instead of modelling the character in the classic T-Pose, they model it in a fetal or relaxed pose. This way of modeling allows for a more natural vertex deformation at the joints.

[Image: Skin_Pose]

But when they want to put clothes over the naked character, and the clothes are made of separated meshes, they move the skeleton into a classic T-Pose, because it's easier to model clothes in that pose.

combination glTF clothing live demo

So, since the base body and the clothes have been bound to the skeleton at different times, the IBMs of the base body skin and the clothes skins will be different, which is good because a skilled artist can take advantage of this to reduce deformation artifacts.

This is by no means a rare case. Now, the solution that @marstaik proposes is that the exporter merges all the meshes and skins sharing joints into a single big mesh-skin, doing the reverse maths and duplicating joints if necessary.

The problems I see to it are:

This effectively changes glTF from a skinned-centric file format to a skeleton-centric file format. It is a huge paradigm change, and it is probably too late to make it, especially when some engines have been successful in taking advantage of the skinned-centric approach of glTF.
What if I don't want to merge the meshes, because in my application's logic I want to enable/disable some of the meshes visibility?
What if one of the meshes you want to merge has morph targets? How the heck are you going to merge that?
And why should we do it? The skinning-centric paradigm is superior to the skeleton-centric paradigm, because it has fewer limitations and lets artists' workflows be exported seamlessly. I don't see the point in moving to a lower standard with more limitations.

Now, being practical: for those skeleton-centric engines that have a hard time importing glTF, maybe they need to rethink how they're handling glTF, instead of pretending it's a skeleton-centric format and failing to import 30% of the glTFs around. Maybe what's needed for those cases is to wrap the glTF scene into a master glTF node within their hierarchical node system.

@marstaik keep in mind that not everybody is doing videogames. In our case, we're using glTF for biomechanical and anthropometric research and visualization, so rendering performance is not a priority, but having a consistent skinning-centric file format is absolutely critical.

Skinning with shared joints with different IBMs is not a monstrosity, it's a much needed feature, and you cannot ask for it to be removed just because you don't consider it important.

marstaik commented 4 years ago

@vpenades @jbherdman What if we can have the best of both worlds?

I think now that we have identified the multiple-bind issue, we can make a solution that benefits everyone.

The good thing is, for the IBM's at least, there is a "performant" solution for game engines to this problem. We can add a final bind transform on the mesh instance, in the same flat buffer format the GPU uses; call this the bind_pose_offset. The shader can then take final_ibm[i] = bind_pose_offset[i].inverse() * skeleton_bind_pose[i].inverse(), where i is the joint index, since final_bind_pose[i] = skeleton_bind_pose[i] * bind_pose_offset[i], to compute the actual IBM's/bind pose needed. The biggest cost of this is memory, as we need to hold the skin's bind_pose differences somewhere.
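The bind_pose_offset identity can be checked numerically. The sketch below assumes rigid transforms (rotation plus translation) so the inverse can be written by hand, and uses made-up matrices for one joint. It relies only on the standard rule that the inverse of a product reverses the order of the factors.

```python
def mat_mul(a, b):
    """Plain 4x4 matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(m):
    """Inverse of a rotation + translation matrix: [R^T | -R^T t]."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [inv_t[0]],
            r_t[1] + [inv_t[1]],
            r_t[2] + [inv_t[2]],
            [0, 0, 0, 1]]

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Hypothetical data for one joint i: 90-degree rotations about Z plus offsets.
skeleton_bind_pose = [[0, -1, 0, 1],
                      [1,  0, 0, 2],
                      [0,  0, 1, 3],
                      [0,  0, 0, 1]]
bind_pose_offset = [[0, -1, 0, 0],
                    [1,  0, 0, 1],
                    [0,  0, 1, 0],
                    [0,  0, 0, 1]]

# final_bind_pose = skeleton_bind_pose * bind_pose_offset
final_bind_pose = mat_mul(skeleton_bind_pose, bind_pose_offset)

# final_ibm = bind_pose_offset.inverse() * skeleton_bind_pose.inverse()
final_ibm = mat_mul(rigid_inverse(bind_pose_offset),
                    rigid_inverse(skeleton_bind_pose))

# Multiplying the inverses in the other order gives a different matrix.
wrong_order = mat_mul(rigid_inverse(skeleton_bind_pose),
                      rigid_inverse(bind_pose_offset))
```

Here `final_ibm` undoes `final_bind_pose` exactly, while `wrong_order` does not, so the order of the two inverses in the shader matters.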

You have to understand that there is a big downside to a skin based system, because if I have to follow the references a skin has every frame (let it be a c++ pointer) and have to copy the transforms into a buffer to get sent to the GPU, that is a lot of wasted cycles and memory copies every frame. Especially since it now goes from one skeleton per frame, to one mesh_instance per frame. Ouch. The above solution could make it very minimal, with only a larger memory footprint.

However, I still think some of the points I made above stand.

The "joints" defined in a tree need to be a strict sub-tree. This case should be disallowed: A[j] > B[j] > C[n] > D[j] (j = joint, n = non-joint), where the skin is defined as: Skin: A, B, D. The exporter should be required to either make C a joint or parent the real C to a joint Cb, so that the Skin definition is a complete sub-tree.
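One way an importer or validator might detect the A[j] > B[j] > C[n] > D[j] case is to walk up from each joint and collect any non-joint nodes met before the next joint ancestor. This is a sketch under the assumption that the hierarchy is available as a simple child-to-parent map; `find_intermediate_nodes` is a hypothetical helper name.

```python
def find_intermediate_nodes(parent, joints):
    """Return non-joint nodes that sit between two joints of the same skin.

    `parent` maps each node to its parent (root nodes are absent or None).
    `joints` is the skin's joint list.
    """
    joint_set = set(joints)
    offenders = set()
    for joint in joint_set:
        gap = []                      # non-joints seen since leaving `joint`
        node = parent.get(joint)
        while node is not None:
            if node in joint_set:     # reached the nearest joint ancestor
                offenders.update(gap)
                break
            gap.append(node)
            node = parent.get(node)
    return offenders

# The hierarchy from the example: A > B > C > D, plus a non-joint leaf on B.
parent = {"B": "A", "C": "B", "D": "C", "Hat": "B"}

broken = find_intermediate_nodes(parent, ["A", "B", "D"])       # C is skipped
fixed = find_intermediate_nodes(parent, ["A", "B", "C", "D"])   # C made a joint
```

Non-joint nodes that merely hang off a joint, like `Hat` above, are not reported, since no further joint of the skin follows them.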

Now, we can do something similar to point [6] in the OP, and either make a "skeleton" or call it a "master skin", but it would be ideal that there is a way to gracefully resolve the different IBM's.

Something like this (rough idea):

Every skin must be a complete subtree with the skeleton property defined (no longer optional).
Each skin shall either define a new tree of unused joints, or, explicitly be a subtree of a previously defined skin. Let this be known as the "master skin". If a skin is a subtree of a master skin, let it be called a "child skin".
The skeleton property points to a node that may or may not be a joint in the skin, but it must have its child be a joint in the master skin.
The child skin's skeleton property is the same as the master skin's. (This lets us group them together easily and find the relations)
Let the IBM's of both master skins and child skins be localized to the skeleton node (i.e., treat the skeleton node as the origin).
The joint indices defined in JOINTS_0 of the mesh should be the indices of the master skin.

The benefits I see here are:

No more random nodes in-between joints
You can directly parent the skinned-mesh to the skeleton node so that it inherits the transform, and the IBM's agree with this, since they are skeleton local. Transform of the skeleton node + IBM's should always create a valid bind.
You can still have multiple bind-poses for meshes. All of the skins can define the IBM's. If it is a child skin, we can easily compute the difference between the bind pose transform of the child skin compared to the master skin, and accommodate that with a solution similar to the one I posted above.
Still lets you have a skin-centric model. Multiple skins can use the same joints, we just have to be more clear about it and have them be a subset.
If an importer was written correctly from the start, and obeyed the skeleton property being defined, then this should work with all existing importers without much modification. If they supported the skin-centric approach, they wouldn't even have to deal with the master skin, except for the JOINTS_0 of the mesh being the indices of the master skin and not the child skin's (but that's a really easy map).
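The "really easy map" between child-skin and master-skin indices could look roughly like this. It assumes each skin's joints are given as a list of node indices and that JOINTS_0 has been decoded into per-vertex 4-tuples; both function names are made up for illustration.

```python
def child_to_master_index_map(master_joints, child_joints):
    """For each joint slot of the child skin, the slot of the same node
    in the master skin. Raises KeyError if the child is not a subset."""
    master_index = {node: i for i, node in enumerate(master_joints)}
    return [master_index[node] for node in child_joints]

def remap_joints_0(joints_0, index_map):
    """Rewrite per-vertex joint indices through `index_map`."""
    return [tuple(index_map[i] for i in vertex) for vertex in joints_0]

# Hypothetical skins: joints are node indices into the glTF nodes array.
master_joints = [10, 11, 12, 13, 14]
child_joints = [12, 13]

index_map = child_to_master_index_map(master_joints, child_joints)

# JOINTS_0 authored against the child skin, converted to master indices.
child_joints_0 = [(0, 1, 0, 0), (1, 0, 0, 0)]
master_joints_0 = remap_joints_0(child_joints_0, index_map)
```

Unused influence slots (weight 0) are remapped like any other slot, which is harmless because they contribute nothing to the result. The inverse table, going from master indices back to child indices, can be built the same way.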

The downsides:

Exporters need to do a bit more work.
Current importers need minor modification: While importers that could handle the current skins correctly could also handle the new skins almost perfectly, there is still a little bit of work to do.

We could keep glTF skin-centric, which I agree is more flexible with binding meshes to joints, but also make it much, much easier for importers (not limited to game engines) to parse and represent properly.

jbherdman commented 4 years ago

@marstaik It seems like you didn't really absorb the excellent use-cases that @vpenades put forward, though? In those cases, the multiple/conflicting IBMs used in different skins can be a feature, not a bug. So, I'm not really sure why you still want to move towards a glTF restriction that seems to move away from those capabilities?

Every skin must be a complete subtree with the skeleton property defined

This really doesn't matter (or even make sense) to a "skin"-based skinning system. I'm still unclear on why you consider this necessary, except that it might make life easier for your particular skeleton-based runtime engine?

Likewise, the performance issue you raised seems negligible. Under a "skin"-based system, you are still only paying a proportional cost to what you are rendering. You never even have to "duplicate" the node-transforms, if you can index into them appropriately. And you still only have 'N' node-transforms in your glTF file, even if some of them are being referenced by some 'k > 1' number of skins. You can just build an array of IBMs per-skin, and index into the appropriate IBMs-array as needed. Assuming that the engine/runtime is doing anything other than just throwing skinned meshes at the graphics card, I'm not convinced the performance difference would even be measurable.

At a certain point, if you are using a target runtime-engine that is "less capable" than glTF (e.g. skeleton-based skinning), the only real option is to control your art pipeline. I would argue that, what it sounds like you want to do is actually best served by:

Ensure that your artists aren't creating files that have multiple meshes/skeletons/skins, or disconnected-skeletons, or whatever else is causing grief for your engine
Perhaps contribute a fix to whatever glTF exporter is causing you grief. If, for example, the incompatible-IBMs are being caused in cases where you "know they shouldn't be", maybe an exporter somewhere could benefit from a change to multiply the mesh-bind-pose into the mesh-data instead of the IBMs?

Or, as I previously suggested, you could "pre-process" glTF files to more directly suit your purposes. With a good understanding of the "skin"-based system that currently exists, you could manually process the glTF files to be more compatible with your skeleton-based engine. If you add that processing into your own art-pipeline, then you can prevent "bad data" from hitting your engine (where the definition of "bad data" is purely from your engine's perspective).

Think of it this way: if your runtime engine didn't support morph-targets, you wouldn't be complaining about morph-targets being part of the glTF spec. You would be controlling your art-pipeline so that your engine didn't have to deal with morph-targets.

marstaik commented 4 years ago

@jbherdman Please re-read my post. What I posted supports @vpenades's use-cases. It still allows child-skins to define their own IBM's. They just need to be relative to the skin root.

This really doesn't matter (or even make sense) to a "skin"-based skinning system. I'm still unclear on why you consider this necessary, except that it might make life easier for your particular skeleton-based runtime engine?

Yes, it makes life much easier for skeleton based systems. Is that such a bad thing?

Likewise, the performance issue you raised seems negligible. Under a "skin"-based system, you are still only paying a proportional cost to what you are rendering. You never even have to "duplicate" the node-transforms, if you can index into them appropriately.

You are neglecting that the "joints" in the scene get posed by animations, and without a skeleton saving those in some flat buffer, you need to visit those nodes one by one and get their transforms, every frame, and put them into a buffer for the GPU. That is a lot of wasted CPU cycles.

What is the point of a format that is only good for everything other than high performance rendering? The current reasoning is "Let's just make some of the primary consumers of glTF data have to go through many hoops to get a satisfactory result." It sucks. You could be a little bit stricter on the skin definitions and make life not a living hell for them.