Thanks for the positive feedback!
Our approach takes 3D joint positions as input for ground reaction force estimation and foot contact detection; footskate cleaning additionally requires joint angles (which can be obtained from 3D joint positions through retargeting to any skeleton present in our database).
To handle video or images, our method must be adapted, e.g. by first obtaining 3D joint positions from the video or images (e.g. with AlphaPose).
Thanks for your prompt reply.
For ground reaction force estimation and foot contact detection, must the 3D joint positions be retargeted to a skeleton present in your database?
Actually, I obtain 3D joint positions with an SMPL regressor. The difference between the standard SMPL skeleton and the database's skeleton is as follows.
Fig. 1 is the standard SMPL skeleton.
Fig. 2 is your database skeleton.
You said "that can be obtained from 3D joint positions through retargeting to any skeleton present in our database".
In other words, should I retarget the SMPL skeleton (Fig. 1) to the database skeleton (Fig. 2)? If yes, do you have any suggestions on how to retarget?
Yes, retargeting to a skeleton of our database will yield better results with the pretrained network, since during training it has only seen the skeleton depicted in your Fig. 2. I actually made a few attempts on motions from AMASS to see how our method generalises, and the contact detection results were visually pretty good (see below).
Retargeting in that very specific case is not very complicated since in the end the network only uses joint positions. I did a quick numerical optimization of joint angles (e.g. parametrized with quaternions) such that those joint angles, applied to a skeleton from our database, yield joint positions (computed via forward kinematics) close to the input joint positions (e.g. from the SMPL skeleton).
Here are a few key points on how I was doing it (see the sketch below):
- joint angles are parametrized with quaternions and optimized numerically;
- only distances between matching pairs of joints are minimized;
- the number of iterations is kept small to stop the optimization before retargeting artifacts appear.
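As an illustration, a minimal self-contained sketch of such a position-matching optimization could look like the following. This is not the actual 'retarget_to_underpressure' code: it uses a toy planar chain skeleton and scalar angles instead of quaternions to stay short.

import torch

# Toy forward kinematics for a planar chain: each bone extends from its parent,
# rotated by the accumulated joint angles.
def forward_kinematics(angles, bone_lengths):
    # angles: (T, J) radians; bone_lengths: (J,)
    cum = torch.cumsum(angles, dim=-1)  # accumulated rotation per joint
    offsets = torch.stack((torch.cos(cum), torch.sin(cum)), dim=-1) * bone_lengths[:, None]
    return torch.cumsum(offsets, dim=-2)  # (T, J, 2) joint positions

def naive_retarget(target_positions, bone_lengths, niters=150, lr=0.05):
    # Optimize joint angles so that FK on the target skeleton matches the input
    # positions; the limited iteration count stops before artifacts appear.
    angles = torch.zeros(target_positions.shape[:2], requires_grad=True)
    optimizer = torch.optim.Adam([angles], lr=lr)
    for _ in range(niters):
        optimizer.zero_grad()
        loss = (forward_kinematics(angles, bone_lengths) - target_positions).pow(2).mean()
        loss.backward()
        optimizer.step()
    return angles.detach()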
Here are a few examples of foot contact detection on samples from AMASS (the original skeleton is drawn, but the retargeted motion is used for detection):
Thanks for your detailed and patient guidance. I would like to use your approach, whose generalization is quite good, but I lack the basic retargeting skills. Would you mind releasing the code that retargets the SMPL skeleton to your database's skeleton? If that is inconvenient, could you send it by email: qinbr@outlook.com
Sure, I added my prototype code to run contact detection on examples from AMASS in 'demo.py'. Look at the functions 'retarget_to_underpressure' and 'contacts_detection_from_amass', as well as the 'if __name__ == "__main__"' part at the end of the file.
You will have to provide the 3D joint positions as a single tensor of shape ... x T x J x 3 where T is the number of frames and J the number of joints, the framerate of the input joint positions, and a skeleton for the retargeting (the first of our database by default).
Another important thing is the correspondence of joints between the input and the UnderPressure skeleton topology, which is defined at line 83 ('AMASS_JOINT_NAMES = ...'). 'retarget_to_underpressure' will operate according to the joint names present both in 'AMASS_JOINT_NAMES' and in the UnderPressure joint names (see 'TOPOLOGY' in 'data.py'): distances between matching joints only will be minimized. You will probably have to edit 'AMASS_JOINT_NAMES' according to your skeleton topology.
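For instance, preparing the input file could look like this minimal sketch (the 24 joints, 30 fps framerate and file name are placeholder assumptions, not values required by demo.py):

import torch

# Hypothetical input preparation: T = 120 frames of J = 24 estimated joints.
joint_positions = torch.randn(120, 24, 3)  # ... x T x J x 3 (extra leading batch dims allowed)
torch.save(joint_positions, "my_motion.pt")  # demo.py reads this back with torch.load
framerate = 30.0  # framerate of the input joint positions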
Sorry for the late reply; I didn't get an email notification. I will try it. Thanks again for your patience and detailed guidance.
@mourotlucas I have the same question and am now trying it. I built the dataset with the default dataset, but when I run 'python demo.py contacts_from_amass', it shows an error because the path is '*'. I also looked at the code; it loads the joints. Can you give me a demo command?
@qinb were you successful with SMPL? When I run 'python demo.py contacts_from_amass', do I need to set args.path to a local points file path containing the SMPL 24 points?
@qinb @mourotlucas Are the steps: 1) get the SMPL 24 3D points from the video; 2) get the angles and trajectory via 'contacts_from_amass' in demo.py; 3) get the result via 'cleanup' in demo.py with the args from step 2? Is that right?
@qinb were you successful with SMPL? Any help would be appreciated!
The argument '-path' shouldn't have the default value '*'; I removed it. The command should be 'python demo.py contacts_from_amass -path some_path' where 'some_path' points to a file containing 3D joint positions (saved with pytorch, i.e. torch.save, since demo.py uses torch.load at line 179).
As explained above, the 'contacts_detection_from_amass' method in 'demo.py' is prototype code to run contact detection on examples from AMASS, and 'AMASS_JOINT_NAMES' might have to be edited according to the joints provided.
A possible workflow to estimate joint positions from video and clean footskate as a post-processing step following our method could be the following:
1. estimate 3D joint positions from the video (e.g. with AlphaPose);
2. retarget the estimated 3D joint positions from the input skeleton topology (e.g. AMASS/SMPL) to the UnderPressure skeleton topology;
3. detect foot contacts with our pretrained network and clean footskate with our optimization-based cleanup.
@mourotlucas
In the function 'retarget_to_underpressure':
Q1: about the arguments: should we get the default 23-joint skeleton via TOPOLOGY from our AMASS_JOINT_NAMES (where len(AMASS_JOINT_NAMES) may be > 23), so that len(joint_positions) == 23, and then run the retargeting?
Q2: if I want to retarget from the default 23-joint skeleton to my own skeleton, do I need to change TOPOLOGY and joint_positions?
Q3: for niters, is more always better?
Can you share the test joint data file used for amass.mp4?
Q1: Since our deep neural network has been trained on our dataset (containing foot contact ground truth), it requires the UnderPressure skeleton topology as input. To adapt to other skeleton topologies (e.g. the AMASS skeleton topology), motion data must first be retargeted from the input skeleton topology to the UnderPressure skeleton topology, which is what the method 'retarget_to_underpressure' attempts to do. Given input joint positions 'joint_positions', this method optimizes joint angles such that the joint positions computed by forward kinematics with the given skeleton 'skeleton' best match the input joint positions. To do so, differences between matching pairs of joints are minimized. The argument 'joint_names' parametrizes (demo.py, lines 43 to 49) which input joints must be paired with which UnderPressure skeleton joints (which are defined in data.py, lines 19 to 43).
Q2: As explained in a previous comment (step 2), you need to retarget the estimated 3D joint positions from the AMASS skeleton topology to the UnderPressure skeleton topology so that foot contacts can then be detected using our deep network (see above). Depending on which joints are estimated, you will have to edit the 'joint_names' argument (set by default to AMASS_JOINT_NAMES in demo.py) such that 1) the order of the joint names matches the order of the joint positions in the argument 'joint_positions' and 2) estimated joints corresponding to UnderPressure skeleton joints are named accordingly (see above and data.py lines 19 to 43). Only joints with matching names in 'joint_names' and the UnderPressure skeleton joints will be used in the optimization loop of 'retarget_to_underpressure'.
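To illustrate the pairing rule with a small standalone snippet (the joint names below are placeholders, not the actual names from 'TOPOLOGY' in data.py):

# Only joints whose names appear in both lists contribute to the retargeting loss.
input_names = ["pelvis", "left_knee", "right_knee", "head", "left_hand"]  # estimator order
underpressure_names = ["pelvis", "left_knee", "right_knee", "head"]  # stand-in for TOPOLOGY names
pairs = [(input_names.index(n), underpressure_names.index(n))
         for n in input_names if n in underpressure_names]
# distances between these matched joint pairs only are minimized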
Q3: As observed empirically, no. Retargeting is a difficult task consisting of transferring motion from a source skeleton to a different target skeleton, and it is not well defined. The optimization done in 'retarget_to_underpressure', which minimizes joint position differences, is a naive implementation of retargeting and hence introduces artifacts if the joint position differences get too low. A limited number of iterations allows stopping the optimization before artifacts appear. There is no good value for this parameter in the general case, so you might have to tune it. We empirically found the default value set in demo.py (150 iterations) to be good for a few samples from AMASS.
Which amass.mp4? If you mean data samples from AMASS, unfortunately no. Please see #5
@mourotlucas I have gone through many steps in the code. When I run the 'cleanup' function in demo.py and prepare to show the result in Blender, I have a few questions about the result of: angles, skeleton, trajectory = cleaner(item["angles"], item["skeleton"], item["trajectory"])
Q1: are the angles in world space or in skeleton space? If in world space, how do I translate them to skeleton space with the parent-child hierarchy?
Q2: is the trajectory in world space? I notice that when showing the result, some calculation is done with the ground value, and the result is then shown from the trajectory and the ground value.
Q2.1: is the trajectory value the pelvis position, or the whole-body root position? I find it too large in the Z direction.
Q3: is the order of the angles the same as the default 23 skeleton joints?
Q4: in MocapAndvGRFsApp(), I notice it needs global_position and global_trajectory, and gets a new angle from the angles of two frames in Skeleton.update(). How is this done in detail? I can't find where the args come from in def update(self, frame: int, prev_frame: int, next_frame: int, dframe: float).
Q1: both the angles and the skeleton are expressed w.r.t. the global frame, which seems to be what you call world space. If I understand well, what you call skeleton space is expressing joint angles w.r.t. their parent joints. To translate, you must express each global joint orientation with respect to the global orientation of its parent (like expressing a position w.r.t. another, but with 3D rotations, which are not commutative, so be careful).
Q2: yes.
Q2.1: not exactly; in our implementation the trajectory is simply added to all joint positions after FK. The reason is that it allows having a single (static) skeleton for an entire sequence, stored separately from a temporally changing global position (= trajectory).
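For Q1, a minimal sketch of that global-to-local conversion, assuming unit quaternions in (w, x, y, z) convention and a hypothetical 'parents' array describing the hierarchy (root has parent -1), could be:

import torch

def quat_conjugate(q):
    # for unit quaternions, the conjugate is the inverse; (w, x, y, z) convention
    return torch.cat([q[..., :1], -q[..., 1:]], dim=-1)

def quat_mul(a, b):
    # Hamilton product of quaternion tensors of shape (..., 4)
    aw, av = a[..., :1], a[..., 1:]
    bw, bv = b[..., :1], b[..., 1:]
    w = aw * bw - (av * bv).sum(dim=-1, keepdim=True)
    v = aw * bv + bw * av + torch.cross(av, bv, dim=-1)
    return torch.cat([w, v], dim=-1)

def globals_to_locals(global_quats, parents):
    # global_quats: (T, J, 4); parents[j] is the parent index of joint j
    local_quats = global_quats.clone()
    for j, p in enumerate(parents):
        if p >= 0:
            # local_j = inverse(global_parent) * global_j
            local_quats[:, j] = quat_mul(quat_conjugate(global_quats[:, p]), global_quats[:, j])
    return local_quats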
ps: pose representations in character animation are a bit involved to work with, especially when going down into the details. If you are not already familiar with them, I suggest you take a look at section 2.1 of our survey paper https://arxiv.org/pdf/2110.06901.pdf, and if that is still too involved you might want to follow a course or textbook on character animation and/or representations of 3D rotations.
@mourotlucas https://github.com/InterDigitalInc/UnderPressure/issues/3#issuecomment-1372000266 About Q1, do you have any advice? Translating the result from global space to skeleton space would make it more useful in later steps.
Another thing to try: can we use 3D keypoints in camera coordinate space to do the cleanup?
If you are not comfortable with implementing it, check out existing animation frameworks, e.g. pynimation.
Our network has been trained in global world coordinate space, so there is no reason foot contact detection would work well in camera coordinate space. The same goes for our proposed cleanup approach, since it relies on our network at each optimization iteration.
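If the camera extrinsics happen to be known, the keypoints can first be mapped back to world coordinates before running the network; a minimal sketch, assuming a known rotation R and translation t such that p_cam = R @ p_world + t (these are not provided by our code):

import torch

def camera_to_world(points_cam, R, t):
    # points_cam: (..., 3); R: (3, 3); t: (3,)
    # p_world = R^T @ (p_cam - t), applied row-wise
    return (points_cam - t) @ R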
I have read the pynimation docs for several hours; the human.fbx base pose is a T-pose. This is my code to translate the angles from world space to skeleton space:
from pynimation.anim.animation import Animation
from pynimation.common import data, vec

angles, skeleton, trajectory = cleaner(item["angles"], skeleton, item["trajectory"])
index = [0, 9, 10, 11, 12, 21, 22, 17, 18, 19, 20, 13, 14, 15, 16, 1, 2, 3, 4, 5, 6, 7, 8]
fbx_file = data.getDataPath(r'data/animations/human.fbx')
animation = Animation.load(fbx_file)  # the default joint order after loading is changed; use index to fix it
# global quaternions
gs = animation.globals[:, index]
for angle, g in zip(angles, gs):  # iterate over frames
    # only change the body direction
    g.root.rotation.quat = vec.Vec4(angle[0])
animation.save('rootQuat.fbx')
However, when I open rootQuat.fbx, nothing has changed. Can you give me some advice?
Are there other animation frameworks?
@mourotlucas Maybe the steps are: 1) get the resulting angles, skeleton, trajectory from cleaner(item["angles"], skeleton, item["trajectory"]); 2) get the skeleton keypoints' global positions with anim.FK(angles, skeleton, trajectory, TOPOLOGY); 3) set the global positions in pynimation, then read back the local pose. Is that right?
Sorry but I don't know much about pynimation, I can't help here.
Thanks for sharing your excellent research. I wonder whether this method can run inference on custom 2D videos or images. If yes, please offer some instructions. @salvetjx @damienreyt @mourotlucas