Stricter Skinning Requirements

KhronosGroup / glTF

glTF – Runtime 3D Asset Delivery

Stricter Skinning Requirements #1665

Open marstaik opened 5 years ago

marstaik commented 5 years ago

After messing around with the gltf spec and various engines, I feel there are quite a few cases with the current specification that make it extremely difficult for most importers to handle with their own internal rendering engines. Either this leads to a bunch of extraneous work and guessing on the importer side, or leads to duplication of data by the exporters.

Here are some ideas that I believe could tighten up the specification:

1) The skeleton root should be defined, otherwise the direct parent of the highest joint in the skin hierarchy must be used. The direct parent does not have to be a joint, to allow for multi rooted skeletons. Note that the direct parent may be the root, but it should not be the root as it was before when the skeleton property was undefined.

2) All joints in a skin must form a connected subtree with the skeleton root/direct parent (see 1). This means no non-joints (or other skins' joints) may lie between skin joints as connecting nodes. This still allows you to have other nodes attached to joints, as long as no subsequent joints of the skin follow. A lot of engines treat the skeleton/skin as a single entity, and allowing non-joints embedded in the tree of joints forces the engine to create a bunch of shadow bones to get things to work properly.

3) A mesh can only bind to a single skin. I know this is pretty implicit, but I believe it should be defined.

4) As per a previous discussion (https://github.com/KhronosGroup/glTF-Blender-IO/issues/566#issuecomment-523584953), all skinned meshes must be normalized to the local space of the skeleton.

Now I have another requirement that I am torn between two options: (5) and (6)

5) Each joint shall belong to only one skin. This is the ideal choice as it makes things extremely simple for importers: they do not need to resolve/union skinning trees to find the master skeleton/skin (see below).

6) Each skin shall either define a new tree of unused joints, or, explicitly be a subtree of a previously defined skin. This allows for multiple skins per skeleton, but the subtree definition implies that all skins for a single skeleton must have a "master" skin that holds all of the joints for a skeleton. This makes it easy for an importer to map smaller skin definitions to a master skeleton containing all the joints.
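
As a rough illustration of what proposals (2) and (5) would ask of a validator, here is a minimal Python sketch. It assumes the glTF JSON has already been parsed into a dict; the helper names (build_parent_map, joints_form_connected_subtree, joints_shared_between_skins) are hypothetical and not part of the specification or any glTF library.

```python
def build_parent_map(gltf):
    """Map each node index to its parent index (scene roots are absent)."""
    parents = {}
    for index, node in enumerate(gltf.get("nodes", [])):
        for child in node.get("children", []):
            parents[child] = index
    return parents


def joints_form_connected_subtree(gltf, skin):
    """Proposal (2): every joint's parent is another joint of the skin,
    except for the topmost joints, which must all share one direct parent."""
    parents = build_parent_map(gltf)
    joints = set(skin["joints"])
    # Topmost joints: their parent is not a joint of this skin.
    top = [j for j in joints if parents.get(j) not in joints]
    outside = {parents.get(j) for j in top}
    if None in outside:
        # A joint that is a scene root has no parent to share with siblings.
        return len(top) == 1
    return len(outside) == 1


def joints_shared_between_skins(gltf):
    """Proposal (5): return the joints that are referenced by more than one skin."""
    owner = {}
    shared = set()
    for skin_index, skin in enumerate(gltf.get("skins", [])):
        for joint in skin["joints"]:
            if owner.setdefault(joint, skin_index) != skin_index:
                shared.add(joint)
    return shared
```

A file would satisfy (2) if joints_form_connected_subtree returns True for every skin, and (5) if joints_shared_between_skins returns an empty set.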

jbherdman commented 4 years ago

You are neglecting that the "joints" in the scene get posed by animations, and without a skeleton saving those in some flat buffer, you need to visit those nodes one by one and get their transforms, every frame, and put them into a buffer for the GPU. That is a lot of wasted CPU cycles.

@marstaik The transform-nodes get updated by animation data, sure. And computing the transform-node matrices from the animation-data each frame is expensive on the CPU -- far more expensive than copying those computed matrices around. And in the bigger picture, that is all still basically "free" compared to the rest of the CPU cycles you are likely to be spending each frame.

That said, I don't see why you couldn't design a system to place all the transform-nodes matrices into a single flat-buffer (indexed by transform-node-id), and send that to the GPU the same way your "skeleton" case would. I'm just not convinced that optimization would make any measurable difference to the runtime performance. (Sure, it might technically save you CPU cycles, but would you ever be able to measure the performance difference between a system that had that optimization, and one that didn't?)
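
A minimal sketch of the flat buffer described above, assuming local transforms are already available as 4x4 row-major matrices (glTF itself stores matrices column-major, and TRS decomposition and the mesh node's own transform are left out here). All function names are illustrative only.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested row-major lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(x, y, z):
    """Build a 4x4 translation matrix (helper for the example below)."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def world_matrices(parents, local):
    """Return one world matrix per node, in a flat list indexed by node id.

    parents[i] is the parent index of node i, or None for a root.
    """
    world = [None] * len(local)

    def resolve(i):
        if world[i] is None:
            p = parents[i]
            world[i] = local[i] if p is None else matmul(resolve(p), local[i])
        return world[i]

    for i in range(len(local)):
        resolve(i)
    return world


def joint_matrices(world, joints, inverse_bind):
    """Gather a skin's matrix palette: world[joint] * inverse bind matrix."""
    return [matmul(world[j], ibm) for j, ibm in zip(joints, inverse_bind)]
```

With this layout the flat list is computed once per frame for all nodes, and each skin only gathers the entries its joints array points at.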

vpenades commented 4 years ago

Regarding optimization and performance, it is possible to create a very inefficient glTF file, in the same way you can create a JPEG with a very bad compression algorithm that takes a lot of space while the quality of the image is bad.

For example, you can create a glTF model with 100 meshes where each mesh has its own vertex/index buffer. So it's 100 buffer bindings and 100 render calls.

A glTF optimization pipeline can take that glTF file, analyze it and squeeze every bit from it so it produces a new glTF with a single mesh, or multiple meshes if the meshes are not compatible, but maybe with a single vertex/index buffer, so it will render faster.

But this doesn't mean that an engine, any engine, should only be able to display the optimized mesh, and complain about the unoptimized mesh. A glTF compatible engine should try to replay the contents of a glTF model with as much precision as possible, optimized or not.

If a glTF model comes with 17 mesh-instances and an engine needs to do 17 rendering calls, so be it, it's what's needed to display that particular glTF model. If performance is an issue, we can develop tools to try to merge what's mergeable and optimize the vertex/index buffers, and convert the textures from JPEG to DDS or whatever.

But ultimately, any engine should try to render the contents of a glTF as it comes, and not try to overthink how it should have been rearranged.

One solution I do like is what the Windows 10 3D View app does: it tells you the number of render calls, along with the number of polygons, so you can get an idea of how expensive it is to render a particular model, and then an artist or a developer can try to improve it.

@marstaik, @jbherdman, and guys, I feel like we're running in circles. All the arguments have been laid out politely, and I don't think I have much more to say, so I'll leave this open so people from Khronos can read this thread and leave their opinion or verdict on this.

On my side, I'll probably release my monogame glTF code soon, so it might serve as a use case.

Peace! 😄

Selmar commented 4 years ago

Just to give you another idea of what happens out there in the wild, I figured I could contribute my experience, for what it's worth. I am by no means a skinning expert, though I believe I understand the individual parts by now.

Shortly after I started at my current company, I used our engine's (skeleton-centered) skinning system to implement glTF skinned meshes. There are still a number of issues outside of the mentioned limitations below.

Specification issues

I've had my share of issues with ambiguities and guesses, which came in part from a lack of understanding and in part from a lack of clarity on the specification. I've made an attempt to summarize the problems encountered:

Why are inverse-bind matrices necessary, when I can derive the t-pose matrices from the scene graph?

I currently believe this is only really necessary for advanced skinning techniques.

What are inverse-bind matrices relative to?

I still do not know for sure. I was assuming inverse bind matrix * mesh's node transform (as defined by the initial node hierarchy) * vertex transform == local joint space vertex transform, so relative to the mesh node transform, though we do not use the IBM's, so I haven't dug into it.

Is the joints array the skeleton?

No, a skeleton does not really exist in glTF. The skeleton is implicitly represented by the node hierarchy.

What does the skeleton property mean?

It is the root of the implicit skeleton, but the exporter we worked with did not point to the correct node, leading me to calculate the skeleton root myself.

What spaces are the different matrices/objects in that are described in the glTF skinning explanation (this image)? Most sentences say either what they transform from or to, not both, or their starting point is ambiguous (i.e. "transforms the mesh into local space of the joint" does not explicitly mention in what space the mesh should be).

I thought several times that I knew, but I'm still not sure whether our implementation (and by extension the implementation of the exporter we use) on this topic is correct, although things are working.

Engine implementation

Our engine has a 1-1 pairing of skin-skeleton and treats nodes and bones as separate entities. Thus, with the limited knowledge I had back then (and time constraints), the result is something rather inefficient, but mostly functional:

for every skin, we have a unique skin/skeleton pair
the explicit skeleton hierarchy required by our engine is built from the skin's joint list, finding the actual skeleton root and recreating the hierarchy
nodes used by a skin exist as both a bone and a node (duplicate transforms)
nodes used by multiple skins exist as a separate bone in every skin-skeleton pair (duplicate transforms)
supplied inverse bind matrices are ignored (our engine didn't have a way to use them directly anyway)
we do not support non-uniform scaling for nodes, but we do for bones, leading to potential mismatches if non-uniform scaling is used for bones (we currently enforce uniform scales)
not really related, but we use material names as unique identifiers for the materials of the skinned object, meaning we have to enforce unique material names when importing a glTF file.
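
The step of "finding the actual skeleton root and recreating the hierarchy" from a skin's joint list can be sketched as below. This is only one possible approach, not the engine code described above; it assumes a parents dict mapping each child node index to its parent index, and all function names are made up for illustration.

```python
def ancestors(parents, node):
    """Return [node, parent, grandparent, ...] up to the scene root."""
    chain = [node]
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


def find_skeleton_root(parents, joints):
    """Lowest node that is an ancestor of (or equal to) every joint.

    Returns None when the joints do not share any ancestor.
    """
    chains = [ancestors(parents, j) for j in joints]
    common = set(chains[0]).intersection(*chains[1:])
    # Walking up from the first joint, the first common node is the lowest one.
    return next((n for n in chains[0] if n in common), None)


def bone_parents(parents, joints):
    """Parent of each joint inside the skeleton: its nearest joint ancestor."""
    joint_set = set(joints)
    return {
        j: next((a for a in ancestors(parents, j)[1:] if a in joint_set), None)
        for j in joints
    }
```

Note that bone_parents silently skips non-joint nodes between joints, so their transforms would have to be folded into the child bone or represented some other way.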

Conclusion

To me, the above results are purely engine limitations; the engine is less flexible than the glTF specification. This can be annoying, but I like the glTF approach more, on a conceptual level. It can do everything a skeleton-centered approach can do, and more. When done right, I believe an implementation does not need to have a larger complexity or runtime cost than a skeleton-centered approach, either. But, of course, we usually don't take the time to rewrite our skinning code.

marstaik commented 4 years ago

I'm coming back to this issue after spending some more time with Maya, Blender, and a few different Importers and Exporters.

I have accepted that the joints array in the skin definition does not need to be a strict hierarchy. In terms of exporting a closer rendition of the scenes defined in 3d modeling programs, this is now reasonable to me.

From Maya and Blender, I was able to bind to joints/bones (not any random node) not part of the same hierarchy:

[image: 2c3]

Note that in Blender, I had to bind to two separate armatures to mimic this behavior, but yes, it is possible:

[image: 2c2]

However, I was not able to get a mesh to skin itself to another mesh (or other non-joint object).

I was able to handle importing these non-strict-tree skin definitions in Godot's glTF importer by performing union of disjoint sets and creating fake joints where non-joints lie in between joints.
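
One possible reading of that approach, as a hedged sketch rather than Godot's actual importer code: skins that share joints are merged with a union-find, and non-joint nodes found between a joint and its nearest joint ancestor are reported as candidates for fake joints. The names group_skins and fake_joints are hypothetical.

```python
def group_skins(skins):
    """Union-find over skin indices: skins sharing any joint join one group."""
    leader = list(range(len(skins)))

    def find(i):
        while leader[i] != i:
            leader[i] = leader[leader[i]]  # path halving
            i = leader[i]
        return i

    first_user = {}
    for s, skin in enumerate(skins):
        for joint in skin["joints"]:
            if joint in first_user:
                leader[find(s)] = find(first_user[joint])
            else:
                first_user[joint] = s

    groups = {}
    for s in range(len(skins)):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


def fake_joints(parents, joints):
    """Non-joint nodes lying between a joint and its nearest joint ancestor."""
    joint_set = set(joints)
    fakes = set()
    for joint in joints:
        path = []
        node = parents.get(joint)
        while node is not None and node not in joint_set:
            path.append(node)
            node = parents.get(node)
        if node is not None:  # reached a joint, so the path is a gap to fill
            fakes.update(path)
    return fakes
```

Here parents is again a dict from child node index to parent node index; nodes above the topmost joint are not reported, since they do not sit between two joints.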

But, now having used/dealt with various importers and exporters, I have come to the conclusion that the real issue that creates ambiguity in exported files is this:

Implementation Note: A node definition does not specify whether the node should be treated as a joint. Client implementations may wish to traverse the skins array first, marking each joint node.
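
The traversal suggested by that note can be sketched in a few lines, assuming the glTF JSON is already parsed into a dict (the function name is illustrative):

```python
def mark_joint_nodes(gltf):
    """Return the set of node indices referenced by any skin's joints array."""
    marked = set()
    for skin in gltf.get("skins", []):
        marked.update(skin.get("joints", []))
    return marked
```

The result depends entirely on which skins happen to be in the file: a node that is a bone in the authoring tool but is not referenced by any exported skin will not be marked.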

Joints need to become explicit in the glTF specification, and I'll show you why:

[image: joints]

The current specification implies that skins define what the joints are in the scene. This too however is incorrect. It's the modeling program that defines the joints, not the glTF file.

If you export from whatever modeling program and re-import the exported file, you will not get back the same result most of the time.

Imagine trying to export separate meshes bound to joints and bring them into a single scene later. Imagine trying to export animations separately from meshes and bring them into a different scene later. Imagine trying to export just a skeleton tree (connected joints) and you can't (without an empty skin).

Now imagine trying to do all of the above while having to insert fake bones and create a skeleton definition for a game engine. However, because each scene in the modeling program has skins that mark different nodes as joints, the logic required to interpolate a skeleton may never produce the same skeleton for different scenes.

Now try to deal with assets from Maya and Blender, with exporters written by different people. Some strip out zero skin weights - and why shouldn't they? They are unused by the skin. See: https://github.com/WonderMediaProductions/Maya2glTF/issues/93

So should exporters get around this by exporting a skin with no IBM's just to mark joints? This seems extremely stupid.

I believe that glTF needs to treat joints as first class citizens. They need to be marked on the nodes, the same way that meshes and skins are marked, even if it's just a boolean flag.

Further, since I cannot get Blender or Maya to bind to anything other than joints/bones, I would propose that any "joints" in the "skin" must actually be marked "joints" in the node hierarchy.

Finally, the modification to the specification should look something like this:

Joints must be marked clearly in the "nodes" array, and thus be promoted to first-class citizens.
Joints must not be a mesh, camera, or anything other than a transform.
Skin joints array must point to nodes which are marked as being "joints".
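
To make the three proposed rules concrete, here is a small validation sketch. The "joint" boolean on nodes is hypothetical: it is the flag proposed in this comment and does not exist in glTF 2.0. The function name and error messages are likewise invented, and extension-defined attachments such as lights are not checked.

```python
def check_joint_flags(gltf, flag="joint"):
    """Check the proposed rules against a parsed glTF dict.

    `flag` names a HYPOTHETICAL boolean node property, not a real glTF field.
    """
    errors = []
    nodes = gltf.get("nodes", [])

    # Rule 2: a flagged joint must be nothing other than a transform.
    for index, node in enumerate(nodes):
        if node.get(flag):
            for forbidden in ("mesh", "camera", "skin"):
                if forbidden in node:
                    errors.append(
                        f"node {index}: flagged joint also has '{forbidden}'")

    # Rule 3: a skin's joints array may only reference flagged nodes.
    for skin_index, skin in enumerate(gltf.get("skins", [])):
        for joint in skin.get("joints", []):
            if not nodes[joint].get(flag):
                errors.append(
                    f"skin {skin_index}: joint {joint} is not flagged as a joint")
    return errors
```

An importer that only supports real joints could run such a check and reject the file with a clear message instead of guessing.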

So, could an exporter still not mark the original joints in the modeling software as joints? Yes. But it would be completely stupid to do so.

I believe that these simple changes (in addition to perhaps some clarification of the IBM's) could greatly improve the consistency of scene exports/imports across multiple applications.

jbherdman commented 4 years ago

I'm glad to see that you are starting to come around.

I believe that glTF needs to treat joints as first class citizens. They need to be marked on the nodes, the same way that meshes and skins are marked, even if it's just a boolean flag.

Further, since I cannot get Blender or Maya to bind to anything other than joints/bones, I would propose that any "joints" in the "skin" must actually be marked "joints" in the node hierarchy.

The thing is, I would claim that Blender and Maya fail to treat their joints as first-class citizens. I had to go remind myself, but Maya more or less limits skinning to use its special "joint" nodes. (You may or may not be able to get around that at a lower API level, but it would probably give the UI a headache.) I haven't dealt with Blender much, but the "armature" system brings to mind Lightwave-style bone-skeletons (and other "bones are different than transform nodes" systems from the 90's).

A system like 3DS MAX doesn't have those restrictions. You can happily fire up MAX and skin a mesh using 2 camera-nodes as its "joints". That is because a "joint" isn't a special/distinct node type; you can use anything that has a transform-node as a joint, assuming that you store the appropriate bind-pose data somewhere.

So, in my mind, glTF is already treating its joints as "first class citizens" by simply allowing any transform-node to be referenced as a joint, and not requiring joints to be specially marked via some separate mechanism.

If you export from whatever modeling program and re-import the exported file, you will not get back the same result most of the time.

That is generally true for all non-trivial data conversion. It is much like running a sentence through Google Translate into a different language, and back again. Best case scenario, you will get something "functional", but the process will strip off a lot of nuance and artistic style from the original.

marstaik commented 4 years ago

So, in my mind, glTF is already treating its joints as "first class citizens" by simply allowing any transform-node to be referenced as a joint, and not requiring joints to be specially marked via some separate mechanism.

This seems extremely contradictory. If you want to say that skins just use node transforms, that is fine - by your logic skins don't define joints, they just use nodes. Then keep it that way in the definition. But 3D modeling applications and many importers expect to be able to easily tell what a joint is.

A joint is a named entity in almost all modeling software. I see absolutely no reason to ignore its existence and shove it under the rug.

That is generally true for all non-trivial data conversion. It is much like running a sentence through Google Translate into a different language, and back again. Best case scenario, you will get something "functional", but the process will strip off a lot of nuance and artistic style from the original.

If you are a bad translator, sure. But I would expect a proper open source specification to allow, let's say, a Blender document to be exported via glTF to Maya/MotionBuilder to do some proper motion capture handling, and then be able to be brought back into Blender. Or maybe go from MAX to Blender and back. A lot of 3D pipelines require consistent transfer between applications.

What is the harm in representing actual joint nodes in the specification? If the importing application doesn't care, then it doesn't care. But most of them do.

If you wanted to be less strict you could:

Rename the "joints" array in skin to "nodes"
Add in the "joint" flag on actual nodes, and force them to only be a transform

Maybe it's better that way. At least the specification isn't trying to lie to itself. You can let any skin bind to whatever nodes it wants. But if we still mark actual joint nodes, importers can easily tell the user "Hey, we don't support skins on non-joint nodes" and call it a day. The current specification makes it difficult to even do that, because the skin defines "joints".

Sadly glTF doesn't have the weight that Autodesk has with FBX, since Autodesk has a complete modeling/animation pipeline and age to back it. And at this rate it never will if you don't allow game engines to make better use of this format. I and many others may as well go back and use FBX. It may be broken/inconsistent but at least anyone that uses the provided SDK's can generally make an importer that doesn't explode.

lexaknyazev commented 4 years ago

@julienduroure Could you please provide Blender-IO perspective on https://github.com/KhronosGroup/glTF/issues/1665#issuecomment-538216319?

Selmar commented 4 years ago

Imagine trying to export separate meshes bound to joints and bring them into a single scene later.

This is already problematic because joints are indices into the nodes array. To match indices, the entire hierarchy would have to be exported in every scene. I don't think partial exports are in any exporter's mind, currently. We're starting an implementation ourselves, where the most difficult challenge is animation target identification across glTF files. But that's a different topic.

Imagine trying to export animations separately from meshes and bring them into a different scene later.

I don't see why this is problematic specifically with skins and joints. Exporting animations separately is already difficult, because, as described above, the only thing you have to identify nodes across different glTF files is the name, which isn't necessarily unique. Animations work just like the skins; they can reference any node arbitrarily. Since joints are nodes, I don't see a joint-related problem here.
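
The name-matching problem described here can be illustrated with a short sketch. It assumes two parsed glTF dicts and matches nodes only when a name is unique on both sides; the function names are hypothetical.

```python
from collections import defaultdict


def index_nodes_by_name(gltf):
    """Map each node name to the list of node indices that carry it."""
    by_name = defaultdict(list)
    for index, node in enumerate(gltf.get("nodes", [])):
        if "name" in node:
            by_name[node["name"]].append(index)
    return dict(by_name)


def match_nodes(source, target):
    """Map source node indices to target node indices by unique name.

    Returns (mapping, ambiguous_names); unnamed nodes are never matched.
    """
    src = index_nodes_by_name(source)
    dst = index_nodes_by_name(target)
    mapping, ambiguous = {}, []
    for name, indices in src.items():
        if len(indices) == 1 and len(dst.get(name, [])) == 1:
            mapping[indices[0]] = dst[name][0]
        elif name in dst:
            ambiguous.append(name)
    return mapping, sorted(ambiguous)
```

Any name reported as ambiguous would need some other identification scheme, which is the "different topic" mentioned above.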

Imagine trying to export just a skeleton tree (connected joints) and you can't (without an empty skin).

There would be no way to mark it as a skeleton tree, indeed, but you should still be able to export the hierarchy as usual.

Now imagine trying to do all of the above while having to insert fake bones and create a skeleton definition for a game engine. However, because each scene in the modeling program has skins that mark different nodes as joints, the logic required to interpolate a skeleton may never produce the same skeleton for different scenes.

Assuming the above problems are solvable, it may not be the same skeleton, but it should still have consistent results, correct?

Add in the "joint" flag on actual nodes, and force them to only be a transform

It seems to me this is the centerpiece of this discussion right now.

In an optimal world, it wouldn't be necessary to force a joint to be "just" a transform, for the same reason that a node can have both a camera and a mesh. Perhaps even a light on top of this, if you use this extension. This is currently possible, in the specification. Our own engine doesn't support this, so to this end I have made camera nodes be children of the nodes they are attached to in the glTF.

If "joint" were to be an exclusive property of a node, then from my perspective cameras, meshes and lights should also be exclusive. In 3DSMax, this is already the case, though I don't know about Blender and Maya. Whether that's a good idea or not I don't know, but it seems unnecessary to enforce this in the specification.

Personally, I don't see a compelling reason to treat joint nodes any differently from regular nodes, other than implementation details which differ across engines. Our engine's implementation would not gain anything from this, currently.

If you are a bad translator, sure.

The JPEG analogy may work better for his argument, I think.

vpenades commented 4 years ago

@marstaik @Selmar If I understand correctly, you're trying to import glTF by taking its internal building blocks and trying to convert them into their respective engine specific counterparts.

If that's the case, then I understand why you're having so much trouble trying to import glTFs; if the glTF components and relationships don't have a perfect match with the engine's component counterparts, then some glTF configurations cannot be imported correctly.

I believe a good alternative approach to import glTF is with Sandboxing. So whenever you import a glTF model, all their internal structures are preserved within the sandbox. The engine interacts with the glTF through the sandbox, instead of trying to import all the components.

In this way, you don't have conflicting issues between glTFs, and you protect your engine from future changes in the glTF specification.

If the issue is about sharing resources across multiple glTFs, then I believe the right approach is to use a glTF toolchain to merge the scenes of multiple glTF files into a single big glTF with all the scenes contained inside. So the engine only needs to import the master glTF to access all the scenes through the sandbox.

BTW, a while ago I published the showcase of loading and rendering glTF files in monoGame, you can find the example here.

The monogame loader example loads every glTF model into a "sandbox". The interaction with monogame's graphics engine is minimal, since only glTF meshes and materials are converted to monogame's counterparts. But nodes, animations, hierarchy, etc, is preserved within the glTF sandbox.

marstaik commented 4 years ago

I believe a good alternative approach to import glTF is with Sandboxing. So whenever you import a glTF model, all their internal structures are preserved within the sandbox. The engine interacts with the glTF through the sandbox, instead of trying to import all the components.

What? You want to render a glTF file's JSON straight to the renderer every time? Why wouldn't you want to match scene entities to the engine's version and construct a scene? This seems absolutely stupid.

If the issue is about sharing resources across multiple glTFs, then I believe the right approach is to use a glTF toolchain to merge the scenes of multiple glTF files into a single big glTF with all the scenes contained inside. So the engine only needs to import the master glTF to access all the scenes through the sandbox.

Why on Earth would this be a correct solution? If an RPG had 1000 armors, you want me to have to import for an entire day every time a mesh gets added?

If "joint" were to be an exclusive property of a node, then from my perspective cameras, meshes and lights should also be exclusive.

To be honest, I don't know any engine that could support a camera-mesh-light node anyways. Seems like stupid design.

At this point we may as well take away cameras. Oh, and you know what, maybe lights should go too, they are not that special. The importer can figure it out. Hmm, now why should I bother exporting a mesh? The importer can also figure out that...

Aha, let's just only export nodes with no attributes, that will definitely make the format much more useful.

And again, no one seems to care about consistency of export.

By most of the logic presented here, we should just leave glTF to be a showgirl format and abandon it for something more practical. There goes ~~collada~~, ~~gltf~~, ~~gltf2~~, maybe collada2 will finally solve it.

Or maybe USD.

donmccurdy commented 4 years ago

In the interest of keeping this discussion productive, let's constrain the scope a bit — nearly all engines and DCC tools have some pre-existing concept of skins, skeletons, bones, and meshes, or at least some of those things. What would a strict skinning specification look like, maximizing portability of a glTF file across existing tools, with the assumption that the glTF file will be loaded into the tool's native object representations?

For my own opinion, while the current skinning specification does a sufficient job of defining a skinning representation that offers flexible technical features, it is perhaps not specific enough about the structure and best practices that allow a skin to actually be broadly portable across tools.

Unfortunately, I'm not at all confident that I know what a broadly portable skinning specification would look like. I would certainly be curious to get more feedback on @marstaik's suggestions in https://github.com/KhronosGroup/glTF/pull/1669.

If there is consensus on useful restrictions, clarifications or best practice – here is how I would imagine the process could proceed. We can't simply add the restrictions listed here to the glTF 2.X specification; doing so would invalidate many existing models, and is not compatible with our versioning process. Because glTF 3.X is likely to be some ways off, a near-term alternative could be to provide an extension (KHR_skinning_strict?) that imposes additional requirements without modifying the schema. Implementations can begin using that extension, and if things go well, the extension could become part of the glTF 3.0 specification later on.

Ideally, the extension would add new restrictions to the existing specification, rather than introducing a new representation that loses backward-compatibility with tools that support the existing spec.

donmccurdy commented 4 years ago

Here is an attempt to define a stricter skinning subset, for greater portability across engines at the cost of some flexibility: https://github.com/KhronosGroup/glTF/pull/1747.

WyattKimble commented 4 years ago

It's pretty much useless. No real-life reason to do this exists.

No game engine supports this.

No 3D modelling app supports this, as far as I know, so this simply can't even be created intentionally.

@reduz, you keep making broad, unsubstantiated claims like this because that's all you know from your limited experience. Just because Godot doesn't do something doesn't mean that nobody else wants it or would ever do it. You're being presumptuous, arrogant, and unprofessional, right in line with your reputation. You are often wrong about these things and you disrespectfully defend your misinformed opinions to the death against people who know what they're talking about. Please stop.

donmccurdy commented 4 years ago

@WyattKimble I've marked your comment as off-topic. You may disagree with @reduz's claims, but please refrain from personal criticism and review the Khronos Group Code of Conduct on respecting differing experiences. Constructive disagreement is welcome, but this is already a complex and challenging thread, so please be conscious of that.