ai4r / Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

Mapping keypoints from Video Pose 3D #46

Closed JabuMlDev closed 1 year ago

JabuMlDev commented 1 year ago

Hi! Congratulations on your work, it's really great!

I am trying to apply this to my own data. However, while preprocessing it I am having difficulty understanding the correspondence between the indices of the 17 keypoints generated by VideoPose3D and those of the 10 used in this repo.

Would it be possible to share the mapping?

youngwoo-yoon commented 1 year ago

Hello, sorry for the late reply. I excluded the lower-body parts from the 17 keypoints in VideoPose3D. This is the list of 10 joints that I used:

0: mid_section
1: neck
2: nose
3: head top
4: left_shoulder
5: left_elbow
6: left_wrist
7: right_shoulder
8: right_elbow
9: right_wrist

I could be confused about left and right for 4-9. I will check that later and update you in this thread.
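For readers following along, here is a minimal sketch of how the 10-joint subset could be sliced out of a 17-joint array. It assumes the Human3.6M-style joint order commonly used with VideoPose3D (0 hip, 1-3 right leg, 4-6 left leg, 7 spine, 8 thorax, 9 neck/nose, 10 head, 11-13 left arm, 14-16 right arm). That order, the index list, and the names `UPPER_BODY_IDX` and `select_upper_body` are assumptions for illustration and were not confirmed in this thread, so check the left/right sides against your own data.

```python
import numpy as np

# ASSUMPTION (not confirmed in this thread): Human3.6M-style 17-joint order,
# where indices 7..16 are the upper body and 0..6 are the hip and legs.
UPPER_BODY_IDX = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

# Names of the 10 joints as listed in this thread, in output order.
JOINT_NAMES_10 = [
    "mid_section", "neck", "nose", "head top",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
]


def select_upper_body(poses_17):
    """Drop the lower-body joints from an array of shape (..., 17, 3)."""
    poses_17 = np.asarray(poses_17)
    if poses_17.shape[-2:] != (17, 3):
        raise ValueError("expected an array of shape (..., 17, 3)")
    # Index along the joint axis only; frame and xyz axes are untouched.
    return poses_17[..., UPPER_BODY_IDX, :]


# Dummy data standing in for real pose output: 2 frames, 17 joints, xyz.
poses = np.arange(2 * 17 * 3, dtype=float).reshape(2, 17, 3)
upper = select_upper_body(poses)
print(upper.shape)
```

If the left and right arms turn out to be swapped in your data, exchanging the 11-13 and 14-16 entries in the index list would be the only change needed.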

JabuMlDev commented 1 year ago

Thanks for your reply, you have been very helpful to me!

However, I still have some doubts regarding the generation of the 3D poses. I noticed that in previous issues you shared the '3d_poses_normalized.pkl' file. Are these directly the output of VideoPose3D, remapped with the indices you listed? Or are they the result of some other operation?

Furthermore, are the mean_dir_vec and mean_pose parameters simply the flattened 2D arrays of shape (10,3) containing the averages of the directional vectors and joints in the dataset?

youngwoo-yoon commented 1 year ago

We normalized the VideoPose3D results. The overall body orientation is normalized so that the body faces front, and the scale is normalized as well.

"the mean_dir_vec and mean_pose parameters are simply the flattened 2D arrays of shape (10,3) containing the averages of the directional vectors and joints in the dataset?"

Correct.
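As an illustration of the averaging confirmed above, here is a minimal sketch. It assumes the poses are stored as an array of shape (n_frames, 10, 3); the helper name `flattened_mean` and the random stand-in data are hypothetical and not taken from the repository.

```python
import numpy as np


def flattened_mean(samples):
    """Average over the first (frame) axis, then flatten to a 1-D vector."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0).reshape(-1)


# Hypothetical stand-in for a dataset: 100 frames, 10 joints, xyz.
rng = np.random.default_rng(0)
poses = rng.normal(size=(100, 10, 3))

mean_pose = flattened_mean(poses)
print(mean_pose.shape)
```

The same helper would apply unchanged to an array of per-frame directional vectors to obtain a value for mean_dir_vec.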

JabuMlDev commented 1 year ago

Thanks for your support!

Would you have the code of these normalizations? Or could you give me input on how to replicate them? Especially for the normalization of body orientation it would be very useful to me.

youngwoo-yoon commented 1 year ago

I did this to keep the angle of the shoulder vector between -20 and 20 degrees. It is not the entire code, but I believe you can get the idea.

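        # rotate (runs once per frame i; kps[i] holds the joints of that frame)
        shoulder_vec = kps[i, 7] - kps[i, 4]  # vector between the two shoulder joints
        angle = np.pi - np.arctan2(shoulder_vec[2], shoulder_vec[0])  # angles on XZ plane
        if 180 > np.rad2deg(angle) > 20:
            # turned too far one way: rotate by the excess over 20 degrees
            angle = angle - np.deg2rad(20)
            rotate = True
        elif 180 < np.rad2deg(angle) < 340:
            # turned too far the other way: rotate back to 340 (i.e. -20) degrees
            angle = angle - np.deg2rad(340)
            rotate = True
        else:
            # already within the -20 to 20 degree range
            rotate = False

        if rotate:
            rot = rotation_matrix([0, 1, 0], angle)  # rotation about the vertical (Y) axis
            kps[i] = np.matmul(kps[i], rot)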

rotation_matrix is defined below:

    import math

    import numpy as np


    def rotation_matrix(axis, theta):
        """
        Return the rotation matrix associated with counterclockwise rotation about
        the given axis by theta radians.
        """
        axis = np.asarray(axis)
        axis = axis / math.sqrt(np.dot(axis, axis))  # normalize to a unit vector
        # Euler-Rodrigues parameters
        a = math.cos(theta / 2.0)
        b, c, d = -axis * math.sin(theta / 2.0)
        aa, bb, cc, dd = a * a, b * b, c * c, d * d
        bc, ad, ac, ab, bd, cd = b * c, a * d, a * c, a * b, b * d, c * d
        return np.array([[aa + bb - cc - dd, 2 * (bc + ad), 2 * (bd - ac)],
                         [2 * (bc - ad), aa + cc - bb - dd, 2 * (cd + ab)],
                         [2 * (bd + ac), 2 * (cd - ab), aa + dd - bb - cc]])
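Putting the two snippets together, the following is a minimal self-contained sketch of the orientation clamp, not the repository's actual preprocessing code. It assumes `kps` has shape (n_frames, 10, 3) with the shoulder joints at indices 4 and 7 as listed earlier in this thread. The names `rot_y`, `shoulder_angle`, and `clamp_shoulder_yaw` are hypothetical; `rot_y(theta)` is written out directly and should coincide with `rotation_matrix([0, 1, 0], theta)`.

```python
import numpy as np

SHOULDER_A, SHOULDER_B = 4, 7  # shoulder indices from the 10-joint list


def rot_y(theta):
    """Rotation matrix about the Y axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def shoulder_angle(frame):
    """Angle of the shoulder vector on the XZ plane, in [0, 2*pi)."""
    vec = frame[SHOULDER_B] - frame[SHOULDER_A]
    return (np.pi - np.arctan2(vec[2], vec[0])) % (2 * np.pi)


def clamp_shoulder_yaw(kps, limit_deg=20.0):
    """Rotate each frame so its shoulder angle lies within +/- limit_deg."""
    kps = np.array(kps, dtype=float)  # work on a copy
    limit = np.deg2rad(limit_deg)
    for i in range(len(kps)):
        angle = shoulder_angle(kps[i])
        if limit < angle < np.pi:
            # Excess on the positive side: bring the angle down to +limit.
            kps[i] = kps[i] @ rot_y(angle - limit)
        elif np.pi < angle < 2 * np.pi - limit:
            # Excess on the negative side: bring the angle up to -limit.
            kps[i] = kps[i] @ rot_y(angle - (2 * np.pi - limit))
    return kps


def make_frame(angle_deg):
    """Dummy frame whose shoulder vector has the requested XZ angle."""
    frame = np.zeros((10, 3))
    phi = np.pi - np.deg2rad(angle_deg)
    frame[SHOULDER_B] = [np.cos(phi), 0.0, np.sin(phi)]
    return frame


frames = np.stack([make_frame(60.0), make_frame(300.0), make_frame(10.0)])
clamped = clamp_shoulder_yaw(frames)
angles_after = [np.rad2deg(shoulder_angle(f)) for f in clamped]
print(angles_after)
```

Frames that start outside the range end up exactly on the boundary (20 or 340 degrees) and frames already inside it are left untouched, which matches the behaviour of the snippet above.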

JabuMlDev commented 1 year ago

This is perfect for me! Thank you so much!