Multiple Animation Support for USD

PixarAnimationStudios / OpenUSD-proposals

Share and collaborate on proposals for the advancement of USD


Multiple Animation Support for USD #11

Closed dgovil closed 1 year ago

dgovil commented 1 year ago

Multiple Animation Support for USDSkel

This proposal suggests an addition to USDSkel that would allow for multiple animations to be bound to a skeleton.

I also recognize that there would be follow up desires to have a more generic solution, which I make light mention of here as I prepare a follow up proposal.

I think the UsdSkel route is the fastest and should not be blocked on a more general solution, which I will address in a secondary proposal. However, I do think the general solution would take considerably more time and effort.

Contributing

[x] I agree to and accept the Supplemental Terms.

cameronwhite commented 1 year ago

This is great to see and would be very useful for crowd assets!

I think a separate relationship, like option 2 for the schema, sounds much better, perhaps even with a different name to make it more distinct from skel:animationSource. IMO there's a use case for both, and I don't think this should replace skel:animationSource.

The skel:animationSource relationship defines the pose used when the skeleton or skinned meshes are displayed, so changing this into a list relationship (as in option 1) seems confusing to me, e.g. one might expect that the list of animations would be blended together in some manner when rendering the asset. It seems the intention is more that this is a "library" of clips that the DCC can choose to do something with, but doesn't have any effect on rendering?

I think this distinction is also important for the case of having many instanced characters for a crowd (see https://openusd.org/dev/api/_usd_skel__instancing.html). The skel:animationSource can be inherited down to override the animation that should be displayed for a particular instance. So for a crowd that has been exported for rendering, this might have the unique poses produced by the crowd simulation for each agent, with things like IK, ragdolls, etc. applied. Having a separate relationship, e.g. skel:animationLibrary, which does not have inheritance and can only be applied to the skeleton, would let you separately describe the original clips that are available for any instance of the agent.
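To make the distinction concrete, here is a minimal sketch in plain Python (toy data structures, not the USD API). It assumes the inherited binding behaviour described above for skel:animationSource, and it treats skel:animationLibrary as the hypothetical, non-inherited relationship suggested in this comment. All prim and animation names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prim:
    """Toy stand-in for a USD prim; this is not the real pxr API."""
    name: str
    parent: Optional["Prim"] = None
    # Single target: the pose that is actually displayed.
    animation_source: Optional[str] = None
    # Hypothetical library of available clips; no effect on what is displayed.
    animation_library: list = field(default_factory=list)


def resolve_animation_source(prim):
    """Walk up the namespace; the nearest authored binding wins."""
    current = prim
    while current is not None:
        if current.animation_source is not None:
            return current.animation_source
        current = current.parent
    return None


def available_clips(prim):
    """The library is read from the given prim only; it is never inherited."""
    return list(prim.animation_library)


# A crowd with a default binding, and one agent overridden by simulation output.
crowd = Prim("Crowd", animation_source="/Anims/idle")
agent_a = Prim("AgentA", parent=crowd)
agent_b = Prim("AgentB", parent=crowd, animation_source="/Anims/AgentB_sim")
skeleton = Prim("Skel", animation_library=["/Anims/idle", "/Anims/walk", "/Anims/run"])
```

In this sketch, AgentA displays the inherited idle pose and AgentB displays its own simulated pose, while the list of original clips is only ever read from the skeleton.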

dgovil commented 1 year ago

Yeah, animationLibrary sounds less likely to get confused, and is clearer as a name. I'll update the PR with that suggestion.

hybridherbst commented 1 year ago

+1 to solving multiple animations in general and not just for skeletons. It's unclear to me why rigged characters would have multiple animations but non-rigged (e.g. mechanical) characters that don't use UsdSkel, or generic files, wouldn't.

Treating "animation clips" differently between skel and non-skel assets is a bit strange anyways (I think done at some point for performance?) and this PR would splinter this further, or do I understand the proposal wrong?

dgovil commented 1 year ago

To be clear, I’m not saying one over the other. I’m saying let’s do skeletons first and general case later because the general case will take significantly longer.

Skeletons are relatively easy because they already separate the animation from the hierarchy, and encapsulate them within a single prim. Therefore adding support for more animations is as simple as pointing to multiples of those prims. They also have the advantage that the skeleton hierarchy itself is defined by a single prim as well so it’s just a 1 to 1 mapping changing to a 1 to many mapping. If we agree on a proposal, it’s a change we could theoretically implement within a week.

On the other hand, other prims have their animation inline, and the object hierarchy is defined over multiple prims. To have many animations would require quite a bit of work to add to USD. Each prim includes its own animation, so choosing between alternate animations would require either introducing a shadow hierarchy of animations to prims, or a new type of variant selection per prim that is runtime friendly. Then you’d have to sync that selection across the hierarchy.

It’s not insurmountable, but I think it’s the kind of thing that would require a lot of discussion, and implementation would also be lengthy. It would be a massive undertaking that might take a year or two to complete given the other work that exists.

Hence why I want to split the two. Skeletons are something I can fathom developing in a quick time, and would likely encompass the majority of use cases. Many games and renderers treat animated objects as single joint skinned objects anyway to similarly simplify their pipeline.

I’ll work on the general case proposal, but I think it’ll be a lot more work to do correctly. Since skeletons already do their own thing, I don’t think this split approach would necessarily be incongruent.

hybridherbst commented 1 year ago

Prioritizing the quick solution over the right solution is not necessarily a good thing for a format that wants to be and/or become a standard.

Aren't clipSets already "some" mechanism for multiple animations?

dgovil commented 1 year ago

I would say that "right solution" and "general solutions" aren’t synonyms, and I am certainly not trying to do something just for the sake of expediency. After all, I am in an unfortunate position where I'll face the technical debt of choices worse than many.

UsdSkel already does things differently when it comes to storing the Skeleton definition and SkelAnimation separately, so I'm trying to address a solution that works well within the confines of that. There are also several advantages to using SkelAnimation, like sparse joint declaration, which can be very beneficial for runtime use.

Value Clip Sets are a good suggestion and something I was going to be mentioning in the general proposal I'll put up separately. However there's some complexity there with regards to defining the variable extents that arise and management of what would become a shadow hierarchy as I mentioned in my last post.

I'll work next week to put up the general proposal sooner so you can comment on that as well, but just like UsdSkel hierarchies can co-exist with Prim hierarchies, I don't think this has to be a mutually exclusive situation.

spiffmon commented 1 year ago

I have not read the proposal yet (but have discussed with @dgovil in the past), but wanted to respond to @hybridherbst . UsdSkel is by design a very special case of "rigged model" that was designed specifically for extreme scalability, rather than expressiveness and generality. Value Clips can be used in conjunction with SkelAnimation, just as they can be applied to anything else in USD, but they are not necessarily the ideal choice for general animation, either.

In our pipeline, "animation libraries" (alibs) are part of a broader toolkit that can apply/adapt animation between different characters, and sit "outside" of your scene, being applied destructively to it (generally in a layer unique to the animator working on a shot). I think we'll want to be informed by OpenExec and potential animation blending features before we try to tackle general animation assets, so I concur with @dgovil that we should address UsdSkel concerns in a way that is consistent with that schema domain and its (intentional) quirks, provide value there, and come back to the general problem later. I know we want to solve the general problem for some needs/workflows, I just don't think we want to rush it prematurely in USD.

pkanyuk commented 1 year ago

Very cool, thanks for the thoughtful proposal Dhruv! I'm going to run this by some other stakeholders in the crowds department who use UsdSkel quite a bit, but I have a few thoughts offhand:

1) If you're under any kind of time pressure, I actually think Value Clips might be your best bet to move things forward. I'm picturing that you could define your default SkelAnimation prim in the asset (t-pose or a-pose maybe), and each alternative animation would be defined as overs in a separate clip usd file. Then you can use value clips to stitch together as many of these as you want. As an additional benefit, usdview could visualize this just fine without any code changes.

2) When we originally created UsdSkel, there were thoughts about supporting basic blending, but in practice, crowd systems often have requirements far beyond simple pose blending. Instead we ended up creating internal schemas called PxAgent, PxAgentClip, and PxClipGraph for use in our proprietary animation system, which adds quite a bit of extra data. That said, there's some commonality between needs of crowd simulators, game engines, etc, and our hope was to ultimately work with the community to use this as a basis for an open standard. Naturally this is a much bigger project, but for a fuller set of features, this seems like the logical place to go.

3) Our colleagues at WDAS recently proposed a new scheme called "UsdSkelPoints" that essentially works as a UsdGeomPointInstancer, but for UsdSkel assets. The idea would be to instance combos of Skeleton and SkelAnimations to have UsdSkel scale to the hundreds of thousands or millions. Right now, even with scene graph instancing, UsdSkel gets pretty heavy in the tens of thousands range. However, even though the use case for UsdSkelPoints is extremely large crowds, it could also describe a single character with a menu of many possible animations.

None of these ideas necessarily contradict your proposal, but I just wanted to get that info out there in case this helps with the direction of your project. Potentially (2) and (3) could even build on this proposal and make use of either animationSource as a list of relationship targets or a separate animationLibrary relationship.

Again though, my gut is to go with value clips since it's a system that's already working. I think it's possible to mock this using the open source UsdSkel example at https://openusd.org/release/dl_usdskel_examples.html . The character, HumanFemale, has HumanFemale.keepAlive.usd and HumanFemale.walk.usd, which conceptually could be used as value clips. Unfortunately, they actually define the SkelAnimation prim rather than just specify overs, so there needs to be some restructuring to get those defs into the asset. In fact, it may be a good idea to update the open source UsdSkel example to use value clips in order to make sure all the DCCs are supporting that feature correctly. Spiff, do you think that's a good idea in any case?
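As a rough illustration of what "stitching" clips together involves, the following plain-Python sketch (not the USD API) mimics, in simplified form, how the value clips `active` and `times` metadata are understood to map a stage time to a clip and a time within that clip. The two file names come from the example above; the frame ranges are invented, and out-of-range times are simply clamped here, which is an assumption of this sketch rather than a statement about USD's exact behaviour.

```python
from bisect import bisect_right


def active_clip_index(active, stage_time):
    """Return the clip index in effect at stage_time.

    `active` is a list of (stage_time, clip_index) pairs sorted by stage time;
    the entry with the greatest stage time <= the query applies.
    """
    starts = [start for start, _ in active]
    position = bisect_right(starts, stage_time) - 1
    # Before the first entry, fall back to the first listed clip.
    return int(active[max(position, 0)][1])


def clip_time(times, stage_time):
    """Map stage time to clip time by linear interpolation between knots.

    Segments are searched last to first so that, at a jump (two knots
    sharing one stage time), the later knot wins.
    """
    if stage_time < times[0][0]:
        return times[0][1]
    segments = list(zip(times, times[1:]))
    for (s0, c0), (s1, c1) in reversed(segments):
        if s1 != s0 and s0 <= stage_time <= s1:
            return c0 + (stage_time - s0) * (c1 - c0) / (s1 - s0)
    return times[-1][1]


# Illustrative values only: play keepAlive for 48 frames, then walk from its start.
asset_paths = ["HumanFemale.keepAlive.usd", "HumanFemale.walk.usd"]
active = [(0, 0), (48, 1)]
times = [(0, 0), (48, 48), (48, 0), (96, 48)]
```

For example, stage time 72 falls in the second clip (walk) at clip time 24, because the jump at stage time 48 resets the clip time to 0.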

Thanks! -- Paul

dgovil commented 1 year ago

Hey Paul,

Thanks for the response.

So we’re not really under a time pressure so much as it’s something that’s often requested and I’m hoping to find a good solution in the ecosystem so that we can build upon it with confidence.

Value clips, like Felix and yourself suggested, do make sense but here are the issues I was hitting when working through various ideas:

1) They require a separate file per clip (or at least the documentation seems to very strongly push for that). That is alright of course in a studio context, but I worry about requiring multi-file composition to enable it, as many users and DCCs have a single-file mindset which will be untenable to break, I feel. I’d like to see something that could exist in a single file if need be, to simplify DCC workflows where most users would just want to export a singular file.

2) It’s of course doable for SkelAnimation, but I worry that for regular xforms there isn’t a good way to express a copy of the hierarchy per animation that shouldn’t be brought in to the stage view. Perhaps, though, they just get described as a set of overs, encapsulated under a scope. Though I would almost want to introduce a scope-like encapsulation specifically for it, much like materials act for shaders. That way a DCC/engine can more intuitively surface them.

3) I felt that value clips didn’t offer a way to separately define the extents as they would be after application. Though thinking about it now, that could be done with an intermediary over that defines both the animation binding and the extents. I think it would therefore go back to the issue of having a place to put them within a single-layer-backed stage.

4) There doesn’t seem to be a way to easily query the set of the parameters+values affected by each clip. That was one of the appeals of sticking to multiple SkelAnimations, because it’s nicely bounded, and therefore a game-like runtime has an easier time seeing what varies when constructing its internal representation of an animation clip.

With USD clips, it becomes harder to determine what has been overridden, especially since it allows for multiple files to be stitched together.

There is the manifest option but it only seems to tell you what properties will change, but I don't see any API to extract the actual final calculated values from each clip. Also, it seems to require a separate file (or computed at runtime). It would be nice (similar to 1) to have a way to contain it within the same layer.

Right now a game engine (which wouldn’t use USD at runtime), for example, would have to apply each clip and pull in the data and do the comparison, unless I’ve missed some APIs.

Perhaps there could be some APIs within USD to extract and present the isolated view of each clip's data, to help consumers of USD isolate the properties without needing to replicate USD's clip logic as well.

———

Anyway, those were my concerns with clips. Thinking about it more, I think they’re likely more easily solvable than I thought (though not easy), especially if we could solve those two key points:

1) Have a convention to structure all the various clip animations/information within a single-layer usd, without mucking up the stage hierarchy too much. I'm a fan of composition but I'm wary of forcing expression of data behind it.

2) Have an easy API to isolate each clip as a pre-stitched view of the data (properties and their values).

I was concerned that the complexity of solving those might be a very long process, but perhaps they don’t have to be. It would be good to get your thoughts on those.
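To sketch what the second point could look like from a consumer's side, here is a purely hypothetical example in plain Python. No such API exists in USD as far as this thread establishes; the function names, property paths, and sample values are all invented. It assumes the manifest lists the properties a clip may animate, and that anything a clip contains outside the manifest is ignored.

```python
def isolated_clip_view(clip_samples, manifest):
    """Return {property_path: {time: value}} for manifest properties only."""
    return {
        path: dict(samples)
        for path, samples in clip_samples.items()
        if path in manifest
    }


def clip_deltas(rest_values, clip_view):
    """Keep only the samples that differ from the un-animated rest value."""
    deltas = {}
    for path, samples in clip_view.items():
        changed = {t: v for t, v in samples.items() if v != rest_values.get(path)}
        if changed:
            deltas[path] = changed
    return deltas


# Invented data: one clip, two declared properties, one stray property.
manifest = {"/Anim.rotations", "/Anim.translations"}
rest = {"/Anim.rotations": "r_rest", "/Anim.translations": "t_rest"}
walk_clip = {
    "/Anim.rotations": {0: "r0", 12: "r1"},
    "/Anim.translations": {0: "t_rest", 12: "t_rest"},
    "/Anim.debugNote": {0: "not in the manifest"},
}

view = isolated_clip_view(walk_clip, manifest)
deltas = clip_deltas(rest, view)
```

A game-like runtime could build its own clip representation from `deltas` alone, without re-implementing the clip stitching logic.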

I’ll spend some time next week and work up an example with clips instead for this PR and what the API changes would need to be.

Thanks again everyone for the valuable feedback.

pkanyuk commented 1 year ago

Hi Dhruv,

Those are all great points about value clips. I double checked and I don't see any examples where value clips aren't separate files, so that may very well be a requirement, which to your point could get very messy. I'm not sure if USDZ has the capacity to archive them all together, though that does seem like a bandaid. Your suggestions are all good areas for improvement for value clips.

Regarding a potential standardization of what could go into a value clip used for SkelAnimation, that was the idea behind Pixar's internal PxAgentClip schema. Were we to turn that into something public, that could help formalize which properties can vary, as well as add important bits of data needed for DCCs.

Talking this through, I see why you're proposing just a modest extension of UsdSkel. I'll run this by the other crowd folk, thanks for explaining your thought process behind what you're going for!

-- Paul

dgovil commented 1 year ago

Thanks, Paul. And if the changes to value clips are something you think are reasonable, I’m also happy to work with y’all on that instead if it could provide a standard form instead of my proposal for skel animation.

lchai commented 1 year ago

Hi Dhruv. We had a quick discussion about your proposal at WDAS. I think the one concern that we had with regards to using value clips was, as you mentioned, that it becomes harder to determine what has been overwritten. We thought this might affect our ability to instance large numbers of crowd elements, because if we can't determine what's been overwritten, it would limit reuse. We thought that it might be more straightforward for the render delegate to determine whether a particular character had been posed and skinned and could therefore be instanced using the originally proposed workflow. We're just starting to look into this however, so maybe take this with a grain of salt. Thx!

spiffmon commented 1 year ago

@lchai, are you saying that at WDAS you do not instance your SkelRoot'd models? If you are, then I'm not sure how the use of ValueClips would impact instancing, as the clips would (could) only contain overrides for the SkelAnimation prims and the SkelRoot prim(s) themselves. Apologies if I'm misunderstanding the concern!

lchai commented 1 year ago

So I worked on our previous crowds pipeline, and haven't looked into our current Usd implementation, and really only have a relatively elementary understanding of how Usd works in general, so I could easily have many things wrong. I'm under the impression that we're currently not instancing our UsdSkelRoot models. In our previous render pipeline, we had a skeleton, and the animation data for each instance was all baked out as point attribute data and would have a precomputed hash value associated with it. If we came across a point with a hash value that we had seen before, we could just instance the previously deformed mesh. Just reading the docs, it looks like with value clips, animation can be layered on, so it seems like it might be more difficult for the render delegate to know if it has run across the exact pose before, and it might therefore end up having to redeform the mesh. Also, it looks like the value clips are stored as separate files, which seems more cumbersome to deal with from a pipeline perspective. Anyway, being able to represent the crowd data more compactly and to do more render-time instancing is something we want to investigate, so I thought I would just mention something that we were wondering about, but I could have a misunderstanding of how things actually work in Usd. Thx!
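The hash-based reuse described above can be sketched roughly as follows. This is a plain-Python illustration of the general idea, not the actual WDAS pipeline: the hashing scheme, the class name, and the stand-in `deform` callback are all assumptions made for the example.

```python
import hashlib
import struct


def pose_hash(joint_values):
    """Hash a flat sequence of floats describing one pose."""
    payload = struct.pack(f"<{len(joint_values)}d", *joint_values)
    return hashlib.sha1(payload).hexdigest()


class DeformCache:
    """Deform a mesh once per distinct pose and reuse the result afterwards."""

    def __init__(self, deform):
        self._deform = deform      # callable: pose -> deformed mesh
        self._meshes = {}          # pose hash -> deformed mesh
        self.deform_calls = 0

    def mesh_for(self, joint_values):
        key = pose_hash(joint_values)
        if key not in self._meshes:
            # First time this exact pose is seen: pay for the deformation.
            self.deform_calls += 1
            self._meshes[key] = self._deform(joint_values)
        return self._meshes[key]


# Stand-in deformer: a real one would skin the mesh with the pose.
cache = DeformCache(lambda pose: ("mesh", tuple(pose)))
first = cache.mesh_for([0.0, 1.0, 2.0])
again = cache.mesh_for([0.0, 1.0, 2.0])   # same pose, reused
other = cache.mesh_for([0.0, 1.0, 2.5])   # new pose, deformed again
```

The concern raised here is that this only works when the final pose data is cheaply identifiable; if animation is layered from several sources, computing an equivalent key may require resolving the values first.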

dgovil commented 1 year ago

Sorry for the delays in updating this proposal. It's been a busy week. I've updated the proposal to represent the various solutions I think exist for a general purpose solution and what their pros and cons may be.

So far I'm leaning towards upgrading value clips to meet the issues I highlighted previously, for the general solution. I do think there's some value in a UsdSkel approach too if the value clips solution is not feasible, but hopefully I did an okay job with recapping things.

dgovil commented 1 year ago

@lchai I wanted to revisit this again to maybe start the necessary work and was wondering if WDAS had any more thoughts based on your previous comments?

dgovil commented 1 year ago

Apologies. I think I accidentally rolled back a commit, which removed general.md. I've added it back now.

meshula commented 1 year ago

Hi folks, I've approved the review so we can land it for further iteration, as we did for spline animation. Is everyone cool with that?

pkanyuk commented 1 year ago

Sounds good!

dgovil commented 1 year ago

Thanks, Nick and Paul.

spiffmon commented 1 year ago

Had some thoughts about this recently that I wanted to share. Firstly, I've also come around to believe we would regret a UsdSkelAnimation-specific solution here, or even a UsdSkel-specific solution. In addition to OpenExec and the (further) future need to package animations for arbitrarily rigged models (and even just transform-hierarchy models today, as mentioned by @hybridherbst above), two things have happened that I think will make a SkelAnimation solution insufficient even for UsdSkel:

1) UsdAnim, and the known, strong desire by game studios and others to want to interchange spline animation for Skels. Given that splines will be limited to scalar-valued attributes, a "packed" SkelAnimation cannot be encoded using splines. So I think we'll want to go back to an idea we considered, but eventually put aside, during the design of UsdSkel: encoding joints as an actual transform hierarchy of (e.g. SkelJoint) Xformable prims. So there would be two possible ways to encode animations for a Skel - either the existing relationship to a SkelAnimation, or "direct" animation of the Joint hierarchy, which, if it is located inside a SkelRoot, gives us a nice encapsulation as @dgovil discussed above.

2) The "springbone" issue raised by Alan Kent on the AOUSD forum points to the more general desire to eventually combine UsdPhysics with UsdSkel, and those needs will also greatly benefit from having actual SkelJoint prim hierarchies on which to associate physics concepts and behaviors.

So my conclusion is that we do want to pursue clips for encoding animations. I'll address some of the key concerns Dhruv raised, but first note that the major limitation currently is that there isn't a way to nicely package up a collection of alternative animations using clips, which was the whole point of this original proposal. However, I think there's a conceptually simple enhancement we can make to ClipSets that should provide that feature and give us a way to handle switching animations quite efficiently.

We would add an enabled field to each clipSet definition, which if not present is true for backwards compatibility. When enabled is false then the clipSet contributes nothing to value resolution, but is still "warm" and known to the Stage. Given that any prim can host an unlimited number of clipSets already, we now have a way to encode many alternate animations, and can enable subsets of them with very simple authoring operations. One extra nice thing that this could provide is that UsdStage change processing can handle changes to the enabled fields specially, and produce a changelist that just merges the lists of properties contained in the manifests of all changed clipsets, so there is no prim recomposition, and an ObjectsChanged notice that names just the animated properties. It'll be worth carefully thinking about how this should interact with instancing and instanced UsdSkel workflows for crowds, but it also seems like there is a very distinct use-case for UsdSkel in games and some other pipelines where richness of representation and interchange is much more important than ultra-scalability.
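A toy model of the proposed behaviour, in plain Python rather than the USD API, might look like the sketch below. The `enabled` field is the addition proposed in this comment and does not exist in USD today; the dictionary layout, function names, and values are assumptions for illustration, and time sampling is left out to keep the example short.

```python
def resolve_value(clip_sets, prop, fallback=None):
    """Return the value from the strongest enabled clip set that lists prop.

    `clip_sets` is ordered strongest first. A missing "enabled" key counts
    as True, mirroring the backwards-compatible default proposed above.
    """
    for clip_set in clip_sets:
        if not clip_set.get("enabled", True):
            continue  # disabled: contributes nothing to value resolution
        if prop in clip_set["manifest"]:
            return clip_set["values"][prop]
    return fallback


def changed_properties(clip_sets, toggled):
    """Merge the manifests of the clip sets whose enabled state was toggled."""
    changed = set()
    for clip_set in clip_sets:
        if clip_set["name"] in toggled:
            changed.update(clip_set["manifest"])
    return sorted(changed)


clip_sets = [
    {
        "name": "walk",
        "enabled": False,
        "manifest": ["rotations", "translations"],
        "values": {"rotations": "walk_rot", "translations": "walk_trans"},
    },
    {
        "name": "idle",  # no "enabled" key, so it is treated as enabled
        "manifest": ["rotations"],
        "values": {"rotations": "idle_rot"},
    },
]
```

Switching animations is then a single small edit (flipping `enabled` on "walk"), and the set of properties to notify about is just the merged manifests of the toggled clip sets.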

To some of the concerns @dgovil raised:

There is nothing I've been able to discover that fundamentally prevents multiple clips from coexisting in a single layer, and I believe you should be able to provide both geometry and multiple animation clips in a single layer, though it would be a little awkward due to the need for the geometric description to be strictly weaker than the site that applies the clips. I think three layers would be ideal (one "normal" geometry layer, one "all clips" layer, and one root layer that applies clips and subLayers the geometry layer), while a two-layer setup where the second two layers are combined into one would be next-best. To do it in one layer means putting the geometry "off to the side" and then referencing it all into place, so that it is weaker than the clips applied as local opinions in the layer, over the references.
I think the API enhancements discussed for interrogating clips are all plausible. The one note of caution is that in general you won't know whether any particular ValueClip actually is the value source for any particular attribute, because it could be overridden by a stronger opinion, which may not even be animated.

Sorry this basically contains a mini-proposal within a proposal - just wanted to keep the discussion going efficiently. We might be able to tackle the enabled addition sometime in 2024 if we agree it's a good direction to take ValueClips. Very curious to hear @pkanyuk and @lchai 's thoughts from a crowds perspective.

pkanyuk commented 1 year ago

Hi Spiff,

Thanks for the great thoughts and info, overall it sounds good to me! I was cautious about my initial suggestion about using value clips since they can sometimes seem a bit cumbersome/restrictive. If we're down to add the necessary features to make them work with @dgovil's use case, I'm all for that.

That's good to know about the plan to support a joint-primitive based UsdSkel representation. I haven't really seen much of a need for it on my end from a crowds perspective, but it does make sense that any per-joint data would benefit from that representation. I would also support interop with strange rigging conventions like having meshes or pivots interleaved with joints. It would be great if the UsdSkel query APIs still work with per-joint animation so that we can still get at the pose matrices/quaternions/etc. and not have to deal with the scalar transform attrs used by UsdAnim.

From a scalability perspective, I still think we need something like WDAS' UsdSkelPoints proposal for getting beyond the well known UsdSkel scale limitations, but that's a separate discussion from this proposal.

Thanks! -- Paul

meshula commented 1 year ago

strange rigging conventions like having meshes or pivots interleaved with joints

That's what the glTF folk have suggested they need in order to have one to one inter-op with USD

dgovil commented 1 year ago

Thanks for the detailed writeup, Spiff. We were mulling over the proposed direction. I think it sounds reasonable in so far as it would enable the creation of clips/takes like FBX.

I especially do like the idea of introducing joint hierarchies.

With regards to the two points around encapsulation:

1) I have been wondering if we should introduce an explicit "Do not treat as part of the scene" container prim. I'm not sure if a typeless prim is inherently the same, but I imagine having an explicit prim type would be great organizationally. I can imagine in the future that other features might want the same kind of non-scene hierarchy that they can reference from.

2) I think not knowing if it's the value source is probably fine as long as you can tell what the value is when that value clip set is enabled, such that an engine/runtime could then pull that out into some representation of what deltas there are.

For the container prim type, I can put up a separate proposal if you think that should be isolated out, but I'd like to bring that up here first. One area where I think it could help is running flatten on the layer, but you want to keep the various clips in there, or if a DCC is trying to export things into a given file. It would be nice to have as an organization area that is still part of scene description, but not imaged or part of default traversal.