Garfield-kh / PoseTriplet

[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)
MIT License
305 stars 25 forks

how to visualize on 3d model? #2

Closed lucasjinreal closed 2 years ago

lucasjinreal commented 2 years ago

Hi, how to convert the generated 3d pose to bvh or fbx?

Garfield-kh commented 2 years ago

Hi, thank you for your interest! I built my IK part on top of video2bvh, with a different joint setting. The pose estimator provides the 3D positions of the joints, while the RL requires an axis-angle for each joint, so this IK converts the estimation result into usable reference motion for the RL.

Hope this helps, :)

lucasjinreal commented 2 years ago

@Garfield-kh thanks for your reply. I have run the code and it connects with video2bvh almost seamlessly, except for one tiny problem:

[image]

If I understand correctly, video2bvh's simplest demo uses the H36M skeleton with 17 keypoints, and it provides a clear skeleton file for it.

However, this repo uses one keypoint fewer:

[image]

I dug around a little but didn't find anything that indicates the topological definition of each keypoint.

Can you provide one, or, like video2bvh does, provide a skeleton file so users can further experiment on real 3D models?

It would be great if we could also have a file usable like this: h36m_skel = h36m_skeleton.H36mSkeleton()

Garfield-kh commented 2 years ago

Hi, I have uploaded the pose2bvh IK part here: estimator-pose2bvh. I also include an XML file for MuJoCo which matches the BVH file used as RL reference motion. The imitator is built on top of RFC; you may check it for more detail about the RL visualization part.

lucasjinreal commented 2 years ago

@Garfield-kh thanks, that was a very quick response. However, when I ran it I found the upper body in the BVH is not quite right:

[image]

Is there anything I did not set up correctly?

I am simply using the kunkun example prediction pred3D_pose/bilibili-clip/kunkun_clip_pred3D.pkl generated by the example.

And then:

Converter = humanoid_1205_skeleton.SkeletonConverter()
prediction3dpoint = Converter.convert_to_21joint(pose3d_world)

human36m_skeleton = humanoid_1205_skeleton.H36mSkeleton()
_ = human36m_skeleton.poses2bvh(prediction3dpoint, output_file=bvh_file)

pose3d_world is the generated 3D pose.

Garfield-kh commented 2 years ago

How about 0626_take_01_h36m.bvh? Does that show up properly in Blender?

Garfield-kh commented 2 years ago

Here is the result I tried in Blender: [image] Maybe you can also try bvhacker; in my experience, Blender modifies some things when importing a BVH...

lucasjinreal commented 2 years ago

@Garfield-kh 0626_take_01_h36m.bvh in test_data shows nothing when opened in my Blender.

That's weird. What's your Blender version? Mine is 3.0.

[image]

I noticed the BVH headers are not exactly the same between your test data and mine:

[image]

Here is my generated BVH: a.txt

lucasjinreal commented 2 years ago

@Garfield-kh Could you kindly also provide this file if possible? I really want to visualize the humanoid in the RL environment.

[image]

Garfield-kh commented 2 years ago

My Blender version is blender-3.0.0-windows-x64. I re-ran pose2bvh_debug.py from the zip file; it generates 0626_take_01_h36m.bvh from np.load('bvh_skeleton/test_data/h36m_take_599_predicted_3d_wpos.npy', allow_pickle=True)[690:710], which looks fine on my side... I also tried loading the estimation pose:

    # prediction = np.load('bvh_skeleton/test_data/h36m_take_599_predicted_3d_wpos.npy', allow_pickle=True)[690:710]
    prediction = np.load('../../PoseTriplet-test/estimator_inference/wild_eval/pred3D_pose/bilibili-clip/kunkun_clip_pred3D.pkl', allow_pickle=True)['result']

The result is fine, as shown above.

The different header possibly comes from rotation_order='xyz' in get_bvh_header(). Did you change anything from the zip file? Or is it due to the environment differing from my requirements file?

The common folder is for your reference. The code is still being cleaned up...

lucasjinreal commented 2 years ago

@Garfield-kh thank you for your patience, let me try again tomorrow.

lucasjinreal commented 2 years ago

@Garfield-kh I finally found where the skeleton was:

[image] [image]

Why is the BVH so far from the origin, and why does the orientation look upside down?

Garfield-kh commented 2 years ago

Nice to see the problem solved! The position is decided by the root position; if you read the root position in the npy file, you will see it is far from the zero position. The orientation is also decided by the npy; it looks upside down here because the npy is in camera coordinate space. In my experiments, I rotate it to a Z-up world coordinate frame.

lucasjinreal commented 2 years ago

@Garfield-kh thanks, I indeed load directly from the npy, which is the raw generated output. How could I convert the root position to the right one (say, to the origin, so it can be parsed correctly by Blender or other 3D software)?

Meanwhile, can the whole sequence also be adjusted?

Garfield-kh commented 2 years ago

There is a technique called rigid alignment, which scales, rotates, and translates the whole t x 16 x 3 sequence. So if you have a standard standing pose (placed at the origin, and upside down), you can try using it as an alignment reference: calculate the relative scale, rotation, and translation from the raw generated pose (t=1), and transform the whole raw generated sequence to the origin.
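
As a concrete reference, here is a minimal similarity (Procrustes) alignment sketch in plain NumPy. It is not taken from the repo; the function name rigid_align and the single-frame (J, 3) inputs are assumptions. It returns the a, R, t that can then be applied to the whole sequence with a * np.matmul(poses, R) + t.

    import numpy as np

    def rigid_align(predicted, target):
        """Similarity (Procrustes) alignment between two (J, 3) poses.
        Finds scale a, rotation R, and translation t so that
        a * predicted @ R + t best matches target (least squares)."""
        mu_p = predicted.mean(axis=0)
        mu_t = target.mean(axis=0)
        X = predicted - mu_p
        Y = target - mu_t

        # Kabsch / SVD solution for the optimal rotation.
        U, S, Vt = np.linalg.svd(X.T @ Y)
        R = U @ Vt
        if np.linalg.det(R) < 0:      # avoid a reflection
            Vt[-1, :] *= -1
            S[-1] *= -1
            R = U @ Vt

        a = S.sum() / (X ** 2).sum()  # optimal isotropic scale
        t = mu_t - a * mu_p @ R
        return a, R, t

    # Align the first frame of the raw sequence (t x 16 x 3) to a reference pose,
    # then apply the same transform to every frame:
    # a, R, t = rigid_align(raw_poses[0], reference_pose)
    # aligned = a * np.matmul(raw_poses, R) + t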

lucasjinreal commented 2 years ago

@Garfield-kh thanks, that's helpful. I am pretty new to pose transforms. It looks like we can get the aligned pose with:

    predicted_aligned = a * np.matmul(predicted, R) + t

But how can I get the target, which is the T-pose here?

Garfield-kh commented 2 years ago

If you want a T-pose [a standard standing pose (placed at the origin, and upside down)], you can try manually setting a 1x16x3 pose where the bone segment lengths are the same as in your test data. Or, if the assumption that the camera is horizontal holds, you can try a rotation matrix R which directly rotates Y-up to Z-up, and reset the hip of the starting pose to the origin.
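
For the second option, a minimal sketch in plain NumPy (not the repo's code; hip_idx=0 and the sign of the rotation are assumptions, so flip the rotation if the result comes out mirrored):

    import numpy as np

    # 90-degree rotation about the X axis: maps +Y to +Z (and +Z to -Y).
    R_x90 = np.array([[1.0, 0.0, 0.0],
                      [0.0, 0.0, -1.0],
                      [0.0, 1.0, 0.0]])

    def to_zup_origin(poses, hip_idx=0):
        """poses: (T, J, 3) joint positions. Rotates Y-up to Z-up and moves
        the first-frame hip to the origin."""
        poses_zup = poses @ R_x90.T               # rotate every joint of every frame
        return poses_zup - poses_zup[0, hip_idx]  # reset the starting hip to the origin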

lucasjinreal commented 2 years ago

@Garfield-kh I have 2 unclear questions I'd like to discuss:

  1. What values should I fill in the 1x16x3 matrix?
  2. Is each element in the 16x3 a 3-dim rotation for the bone? What reference frame is it relative to?

Garfield-kh commented 2 years ago

what values should I fill in the 1x16x3 matrix?

3D xyz positions for 16 joints. You can check VideoPose3D (17-joint setting) to see how a joint-position based pose is converted from world space to camera space, and you can find that its dataset contains a T-pose at the beginning of some clips. Or you can try plotting this pose: `array([-0., 0., 0.92, -0.1, -0.02, 0.92, -0.11, -0.02, 0.49, -0.14, 0.08, 0.07, 0.11, -0.01, 0.92, 0.13, 0.01, 0.49, 0.14, 0.1, 0.07, -0., 0., 1.24, -0., -0.03, 1.49, 1., -0.05, 1.61, 0.17, 0.01, 1.44, 0.49, 0.07, 1.42, 0.74, 0.09, 1.45, -0.18, -0., 1.44, -0.5, 0.04, 1.44, -0.75, 0.01, 1.48])`
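
If it helps, the 48 values above can be reshaped into 16 joints x 3 coordinates and scattered with matplotlib (a quick sketch; the values are pasted as-is from the array above):

    import numpy as np
    import matplotlib.pyplot as plt

    tpose = np.array([-0., 0., 0.92, -0.1, -0.02, 0.92, -0.11, -0.02, 0.49,
                      -0.14, 0.08, 0.07, 0.11, -0.01, 0.92, 0.13, 0.01, 0.49,
                      0.14, 0.1, 0.07, -0., 0., 1.24, -0., -0.03, 1.49,
                      1., -0.05, 1.61, 0.17, 0.01, 1.44, 0.49, 0.07, 1.42,
                      0.74, 0.09, 1.45, -0.18, -0., 1.44, -0.5, 0.04, 1.44,
                      -0.75, 0.01, 1.48]).reshape(16, 3)

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.scatter(tpose[:, 0], tpose[:, 1], tpose[:, 2])
    for i, (x, y, z) in enumerate(tpose):
        ax.text(x, y, z, str(i))   # label joint indices to identify the topology
    ax.set_xlabel('x'); ax.set_ylabel('y'); ax.set_zlabel('z')
    plt.show()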

Is each element in the 16x3 a 3-dim rotation for the bone?

No, they are joint positions in xyz.

About axis-angle vs. joint position:

axis-angle = InverseKinematics(joint position)
joint position = ForwardKinematics(axis-angle)

joint position: usually used in 2D-3D lifting networks (e.g., SimpleBL, ST-GCN, and VPose3D).
axis-angle: usually used in BVH files, the SMPL representation, and RL humanoids.

If you are interested in the axis-angle format, say the SMPL representation, you can refer to this SMPL tutorial. For the RL humanoid, you may refer to the SFV paper; it contains a paragraph detailing the relationship between joint position and axis-angle and how it is used in RL.
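
To make the forward direction concrete, here is a minimal forward-kinematics sketch (not the repo's implementation; the toy 4-joint chain, its offsets, and the use of SciPy's Rotation are assumptions for illustration only):

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    # Each joint has a parent, a fixed offset in the parent's frame, and an axis-angle rotation.
    parents = [-1, 0, 1, 2]                  # a toy 4-joint chain (root -> ... -> end effector)
    offsets = np.array([[0.0, 0.0, 0.0],     # root offset (its world position is given separately)
                        [0.0, 0.0, 0.3],
                        [0.0, 0.0, 0.3],
                        [0.0, 0.0, 0.25]])

    def forward_kinematics(axis_angles, root_pos):
        """axis_angles: (J, 3) per-joint axis-angle; root_pos: (3,).
        Returns (J, 3) joint positions."""
        n = len(parents)
        global_rot = [None] * n
        positions = np.zeros((n, 3))
        for j in range(n):
            local_rot = R.from_rotvec(axis_angles[j])
            if parents[j] == -1:
                global_rot[j] = local_rot
                positions[j] = root_pos
            else:
                p = parents[j]
                # A child's offset is expressed in the parent frame, so rotate it
                # by the parent's accumulated global rotation.
                global_rot[j] = global_rot[p] * local_rot
                positions[j] = positions[p] + global_rot[p].apply(offsets[j])
        return positions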

lucasjinreal commented 2 years ago

@Garfield-kh thanks for your detailed info! So the BVH file actually stores axis-angle as its final format, which looks reasonable. However, from pose2bvh_debug.py the steps are:

    Converter = humanoid_1205_skeleton.SkeletonConverter()
    prediction3dpoint = Converter.convert_to_21joint(prediction3dpoint)

    human36m_skeleton = humanoid_1205_skeleton.H36mSkeleton()
    human36m_skeleton.poses2bvh(prediction3dpoint, output_file=bvhfileName)

which takes the prediction as input, so it should be in a format like joint positions in xyz.

and inside it there is a function:

def poses2bvh(self, poses_3d, header=None, output_file=None):
        if not header:
            header = self.get_bvh_header(poses_3d)

        channels = []
        for frame, pose in enumerate(poses_3d):
            channels.append(self.pose2euler(pose, header))

the main function is:

def pose2euler(self, pose, header):
        channel = []
        quats = {}
        # quatsV1 = {}
        eulers = {}
        stack = [header.root]

        # check is hand in singularity.
        index = self.keypoint2index
        LeftForeArm_angle = math3d.anglefrom3points(pose[index['LeftArm']], pose[index['LeftForeArm']],
                                                    pose[index['LeftHand']])
        LeftForeArm_straight = np.abs(LeftForeArm_angle - 180) < 10
        RightForeArm_angle = math3d.anglefrom3points(pose[index['RightArm']], pose[index['RightForeArm']],
                                                     pose[index['RightHand']])
        RightForeArm_straight = np.abs(RightForeArm_angle - 180) < 10

        while stack:
            node = stack.pop()
            joint = node.name
            joint_idx = self.keypoint2index[joint]

            if node.is_root:
                channel.extend(pose[joint_idx])

            index = self.keypoint2index
            order = None
            if joint == 'Hips':

So I suppose this is what InverseKinematics actually does? Then how is ForwardKinematics defined if we want to convert axis-angle back to joints?

It seems the VideoPose3D-generated pose automatically ends up centered in Blender after BVH conversion, but I cannot see any extra steps inside the code; where does it do this?

Garfield-kh commented 2 years ago

So I suppose this is what InverseKinematics actually does?

Yes, right.

Then how is ForwardKinematics defined if we want to convert axis-angle back to joints?

@bingbing-li Can you help provide a weblink describing forward kinematics for a humanoid?

It seems the VideoPose3D-generated pose automatically ends up centered in Blender after BVH conversion, but I cannot see any extra steps inside the code; where does it do this?

I think what you see here comes from the prediction being (root-relative pose, root trajectory); if the root trajectory (hip position) is not predicted (say it is always 0), then the character will stay at the center in Blender.
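
In other words, a minimal sketch of dropping the predicted root trajectory (plain NumPy; hip_idx=0 is an assumption about the joint order):

    import numpy as np

    def drop_trajectory(poses, hip_idx=0):
        """poses: (T, J, 3) predicted joint positions. Subtract the hip from every
        frame so only the root-relative pose remains and the character stays at
        the origin/center after BVH conversion."""
        return poses - poses[:, hip_idx:hip_idx + 1, :]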

lucasjinreal commented 2 years ago

@Garfield-kh I am using the raw prediction from PoseTriplet, with trajectory set to True by default. Predicting without the trajectory doesn't make sense in real-world applications, I think, because without the center position of the current person there are no global positions, and the action would look very strange IMO. Correct me if I'm wrong.

However, what I currently want to achieve is to move the predicted trajectory's starting point back to 0. I am using the T-pose as you suggested, and I have a prediction like this:

[[[-1.9068520e-01 -3.9178967e-02  4.5949988e+00]
  [-2.5305325e-01 -2.9162843e-02  4.4994464e+00]
  [-1.8069524e-01  4.3677863e-01  4.4576030e+00]
  ...
  [-2.8469834e-01 -4.5449731e-01  4.6212463e+00]
  [-2.3282693e-01 -2.9664233e-01  4.3486352e+00]
  [-1.2895647e-01 -4.1007891e-01  4.4302406e+00]]

 [[-1.9192021e-01 -3.9367903e-02  4.6015420e+00]
  [-2.5498700e-01 -2.9555742e-02  4.5096269e+00]
  [-1.8597291e-01  4.3562496e-01  4.4672227e+00]
  ...
  [-2.9103380e-01 -4.5551348e-01  4.6341572e+00]
  [-2.4870943e-01 -2.8300324e-01  4.3651376e+00]
  [-1.3434877e-01 -3.8451532e-01  4.4105206e+00]]

 [[-1.9411772e-01 -4.0184531e-02  4.6097612e+00]
  [-2.5815964e-01 -3.0703645e-02  4.5215812e+00]
  [-1.9152476e-01  4.3327647e-01  4.4791899e+00]
  ...
  [-2.9778859e-01 -4.5574141e-01  4.6444201e+00]
  [-2.6997945e-01 -2.6711115e-01  4.3868499e+00]
  [-1.4192513e-01 -3.5707697e-01  4.3994899e+00]]

 ...

 [[ 7.8537606e-02 -1.9512240e-02  4.7203770e+00]
  [ 6.7087561e-03 -3.5564065e-02  4.6529732e+00]
  [-1.2862757e-01  3.8110998e-01  4.6106095e+00]
  ...
  [ 2.5051706e-02 -5.1567096e-01  4.7451878e+00]
  [-1.8966964e-01 -5.3678060e-01  4.4888573e+00]
  [ 1.9376837e-02 -5.9469616e-01  4.4149585e+00]]

 [[ 8.0162838e-02 -1.7930716e-02  4.7151556e+00]
  [ 6.4739212e-03 -3.4628361e-02  4.6456432e+00]
  [-1.3561368e-01  3.8102677e-01  4.5988793e+00]
  ...
  [ 2.2833768e-02 -5.1289767e-01  4.7348070e+00]
  [-1.9386707e-01 -5.2635908e-01  4.4787607e+00]
  [ 1.1895433e-02 -5.8982205e-01  4.4090815e+00]]

 [[ 8.0274649e-02 -1.3761428e-02  4.7100964e+00]
  [ 1.9843355e-03 -3.0324388e-02  4.6375494e+00]
  [-1.4260200e-01  3.8468662e-01  4.5899444e+00]
  ...
  [ 2.1430891e-02 -5.0615418e-01  4.7255592e+00]
  [-2.0255995e-01 -5.1654124e-01  4.4720707e+00]
  [ 3.2196417e-03 -5.8053029e-01  4.4047399e+00]]]

What should I do next here? Should I just subtract the T-pose from the poses of all frames?

lucasjinreal commented 2 years ago

[image]

I applied a direct subtraction and it now appears at the center, but the motion doesn't look right:

# make first pose to be zero
delta = tpose - pose3d_world[1, :]
pose3d_world += delta

Garfield-kh commented 2 years ago

Can you try a simple case? Say, only center the position, without rotation:


    Converter = humanoid_1205_skeleton.SkeletonConverter()

    # add here: subtract the first-frame joint positions so the sequence starts centered at the origin
    starting_point = prediction3dpoint[:1,:, :] * 1.0
    prediction3dpoint = prediction3dpoint - starting_point

    prediction3dpoint = Converter.convert_to_21joint(prediction3dpoint)

    human36m_skeleton = humanoid_1205_skeleton.H36mSkeleton()
    human36m_skeleton.poses2bvh(prediction3dpoint, output_file=bvhfileName)

atodniAr commented 2 years ago

Let me join the discussion. I tried your inference code on a dance video, and the 3D keypoint results LGTM.

I then tried the code you mentioned above to convert the 3D joint locations to a BVH file; the result quality, however, is not as good:

  1. The bone length doesn't seem correct, in my case the character's shoulder is much wider than it should be.
  2. The torso movement is rigid and unnatural, which I can tell is due to your implementation in convert_to_21joint.
  3. Details of the animation are not identical between the 3D joint location results and the BVH file.

the RL requires axis-angle for each joint

In your RL setup, did you use 21-joint axis-angles or 16? I assume it's the latter; otherwise it seems to me that your spine, spine1, spine3 would all have no rotation information, since you hard-coded a collinear relationship between spine, spine1, spine2 and between spine2, spine3, neck.

I also checked the implementation in pose2euler; it seems no IK over the kinematic chain is used, and the rotation is calculated joint by joint. I have the feeling that replacing your hard-coding in convert_to_21joint and inferring the spine, spine1, spine3, leftshoulder, and rightshoulder rotations with chain IK would probably produce better BVH output, and I will maybe try to implement that later.

Garfield-kh commented 2 years ago

Hi, @atodniAr, welcome~

  1. The bone length doesn't seem correct, in my case the character's shoulder is much wider than it should be.

Yes, in the BVH file, RightShoulder-Spine3-LeftShoulder are placed in a horizontal plane, which makes the shoulders wider than they should be. I corrected it in my RL/XML file (which is also provided in the link) by placing RightShoulder/LeftShoulder back at the Neck position in the XML file. If you want the BVH to look better, you can modify LeftShoulder in humanoid_1205_skeleton.py/self.initial_directions to be [1, 6, 0]; this should result in a normal shoulder width.

  2. The torso movement is rigid and unnatural, which I can tell is due to your implementation in convert_to_21joint.

The reason I add more joints here is to match the joint count across the estimator, imitator, and hallucinator. It's a good point to make them the same as the 2D detector's as well, so that there is no information mismatch during joint training.

  3. Details of the animation are not identical between the 3D joint location results and the BVH file.

As long as the reference motion (e.g., bvh) can drive the imitator to generate plausible motions, the projection pairs can help improve the estimator in return.

I also checked the implementation in pose2euler; it seems no IK over the kinematic chain is used, and the rotation is calculated joint by joint. I have the feeling that replacing your hard-coding in convert_to_21joint and inferring the spine, spine1, spine3, leftshoulder, and rightshoulder rotations with chain IK would probably produce better BVH output, and I will maybe try to implement that later.

I hard-coded the spine and shoulder for simplicity, and the RL runs fine with 21 joints. You can try it here. I think it might be better to try (Point 2) making the joint definitions the same in the 2D detector, 3D estimator, imitator, and hallucinator. (The joint definitions are slightly different across datasets and deep models.)

atodniAr commented 2 years ago

I corrected it in my RL/XML file (which is also provided in the link) by placing RightShoulder/LeftShoulder back at the Neck position in the XML file.

I'm a bit confused about the usage of these XML files. Are they used in some RL framework, or is there any documentation on the protocol? Thanks 😃

to be [1, 6, 0], this should result in a normal shoulder width

Yes, but then your shoulder bone would be far behind your torso.

The reason I add more joints here is to match the joint count across the estimator, imitator, and hallucinator. It's a good point to make them the same as the 2D detector's as well, so that there is no information mismatch during joint training.

I hard-coded the spine and shoulder for simplicity, and the RL runs fine with 21 joints. You can try it here.

I see. Since the code for the RL part is not yet committed, I'm interested in how it works. I.e., after you map the 16-joint result to 21 joints, are the internal forces and rotations in spine1, spine2 also modeled in the RL model? Then I assume we should further use your imitator to refine the BVH file? Do you have any schedule to release it?

Garfield-kh commented 2 years ago

Do you have any schedule to release it?

I am currently doing some code optimization (e.g., the joint definition mentioned above) and have reached a better result than the one reported in the paper. I am double-checking it, and I think I will release it after checking and data cleaning. I am also trying a real-time implementation (i.e., camera-2D-3D-IK-RL) and multi-person as an extension. I want to make them work and release them together, so it might be this month or next month. Here is an example: [image]

I'm a bit confused about the usage of these XML files. Are they used in some RL framework, or is there any documentation on the protocol?

The RL environment is the open-source MuJoCo, and the code for imitation is a mixture of EgoPose and RFC. As mentioned in the paper's limitations, the CPU-based RL is not time efficient; the GPU-based RL isaacgym-nvidia is now available and might help.

Yes, but then your shoulder bone would be far behind your torso.

Umm, it can be corrected to look like this with the proper init_direction: [image]

atodniAr commented 2 years ago

Thanks for the detailed explanation. Looking forward to the imitator release.

I am also trying a real-time implementation (i.e., camera-2D-3D-IK-RL) and multi-person as an extension. I want to make them work and release them together, so it might be this month or next month.

We have a small team working on exactly the same real-time pipeline (including algorithm engineers and deep learning DevOps), so I'm wondering if we can help with this. I'll DM you if you are interested.

The RL environment is the open-source MuJoCo, and the code for imitation is a mixture of EgoPose and RFC. As mentioned in the paper's limitations, the CPU-based RL is not time efficient; the GPU-based RL isaacgym-nvidia is now available and might help.

I'll look into these. Thanks!

Garfield-kh commented 2 years ago

@atodniAr Yes, I am interested. :P

bingbing-li commented 2 years ago

Then how is ForwardKinematics defined if we want to convert axis-angle back to joints?

Hi, I don't quite understand what you want; could you describe it more clearly?

bingbing-li commented 2 years ago

I also checked the implementation in pose2euler; it seems no IK over the kinematic chain is used, and the rotation is calculated joint by joint. I have the feeling that replacing your hard-coding in convert_to_21joint and inferring the spine, spine1, spine3, leftshoulder, and rightshoulder rotations with chain IK would probably produce better BVH output, and I will maybe try to implement that later.

Hi @atodniAr, there is no straightforward way to find an analytical solution for a high-DoF system; we are still trying to find one. As far as we can tell right now, even for a 6-DoF robot arm the joint angles are found one by one, as the last three are highly coupled with the first three.

atodniAr commented 2 years ago

@atodniAr Yes, I am interested. :P

Nice. I just realized there is no DM feature on GitHub :D

Do you use WeChat? Add me @plpopk please.

atodniAr commented 2 years ago

the joint angles are found one by one

That's true. In computer animation software, you can drag and drop the left hand joint location without taking care of the arm location; they have a HumanIK function embedded, which will try to find a "least work" path to achieve the end-site movement. That's what we mean by IK over a kinematic chain. Check out FABRIK.

I mentioned this because pose estimation only gives locations for 16 joints; I guess chain IK would make a better guess for the spine and shoulder joints. Not sure how much it can help though, as it seems your imitator can already do a lot of work correcting poses.
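
For anyone curious, here is a minimal FABRIK sketch for a single open chain (plain NumPy, written from the published algorithm, not this repo's code; the tolerance and iteration count are arbitrary):

    import numpy as np

    def fabrik(joints, target, tol=1e-3, max_iter=50):
        """joints: (N, 3) initial joint positions, joints[0] is the fixed root.
        target: (3,) desired end-effector position.
        Returns updated (N, 3) joint positions with bone lengths preserved."""
        joints = joints.astype(float).copy()
        lengths = np.linalg.norm(np.diff(joints, axis=0), axis=1)  # bone lengths
        root = joints[0].copy()

        # Target unreachable: stretch the chain straight toward it.
        if np.linalg.norm(target - root) > lengths.sum():
            direction = (target - root) / np.linalg.norm(target - root)
            for i in range(1, len(joints)):
                joints[i] = joints[i - 1] + direction * lengths[i - 1]
            return joints

        for _ in range(max_iter):
            # Backward pass: pin the end effector to the target.
            joints[-1] = target
            for i in range(len(joints) - 2, -1, -1):
                d = joints[i] - joints[i + 1]
                joints[i] = joints[i + 1] + d / np.linalg.norm(d) * lengths[i]
            # Forward pass: pin the root back in place.
            joints[0] = root
            for i in range(1, len(joints)):
                d = joints[i] - joints[i - 1]
                joints[i] = joints[i - 1] + d / np.linalg.norm(d) * lengths[i - 1]
            if np.linalg.norm(joints[-1] - target) < tol:
                break
        return joints

This only handles a single unconstrained chain; a full humanoid would additionally need joint limits and sub-chains for the spine and shoulders.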

bingbing-li commented 2 years ago

That's true. In computer animation software, you can drag and drop the left hand joint location without taking care of the arm location; they have a HumanIK function embedded, which will try to find a "least work" path to achieve the end-site movement. That's what we mean by IK over a kinematic chain. Check out FABRIK.

I mentioned this because pose estimation only gives locations for 16 joints; I guess chain IK would make a better guess for the spine and shoulder joints. Not sure how much it can help though, as it seems your imitator can already do a lot of work correcting poses.

Got it. We didn't implement that in this work; we would like to try it if we have time. Thanks for the suggestion.

atodniAr commented 2 years ago

we would like to try it if we have time

Looking forward to your future results :D

opentld commented 2 years ago

[image] Do you mean this? :) @atodniAr