akanazawa / hmr

Project page for End-to-end Recovery of Human Shape and Pose

Is the ground truth mosh data of h36m incorrect? #50

Closed zycliao closed 5 years ago

zycliao commented 5 years ago

I tried the following code, and the visualization results show that the global rotation of the ground truth mosh data doesn't correspond to the image. Did I get something wrong, or is the data incorrect?

import cv2
import numpy as np
import tensorflow as tf

# Imports below assume the hmr repo layout; adjust the module paths to your checkout.
from src.util import data_utils
from src.util.renderer import SMPLRenderer
from src.tf_smpl.batch_smpl import SMPL

flength = 1000.
renderer = SMPLRenderer(img_size=224, flength=flength)
smpl_model = SMPL('neutral_smpl_with_cocoplus_reg.pkl')

# Read examples from the H3.6M training tfrecords.
fqueue = tf.train.string_input_producer(
    ['/data/tf_datasets/tf_records_human36m_wjoints/train/h36m_train_mixed_0000.tfrecord'])
reader = tf.TFRecordReader()
_, example_serialized = reader.read(fqueue)
image_, image_size_, label_, center_, fname_, pose_, shape_, gt3d_, has_smpl3d_ = data_utils.parse_example_proto(
    example_serialized, has_3d=True)

# Pose the SMPL model with the ground-truth mosh parameters from the tfrecord.
pose_ph = tf.placeholder(tf.float32, [None, 72])
shape_ph = tf.placeholder(tf.float32, [None, 10])
verts_, joints_, Rs_ = smpl_model(shape_ph, pose_ph, True)
init = tf.global_variables_initializer()
sess = tf.train.MonitoredTrainingSession()
sess.run(init)

while True:
    image, image_size, label, center, fname, pose, shape, gt3d, has_smpl3d = sess.run(
        [image_, image_size_, label_, center_, fname_, pose_, shape_, gt3d_, has_smpl3d_])
    verts = sess.run(verts_, feed_dict={pose_ph: np.expand_dims(pose, 0), shape_ph: np.expand_dims(shape, 0)})
    vert = verts[0]
    # Shift the mesh along z so it sits in front of the camera at a depth matching flength.
    vert_shift = np.array([[0., 0., flength / 112.]])
    vert = vert + vert_shift
    rendered_img = renderer(vert, do_alpha=False)
    cv2.imshow('a', rendered_img)
    cv2.imshow('b', cv2.cvtColor((image * 255).astype(np.uint8), cv2.COLOR_RGB2BGR))
    cv2.waitKey()

[Two screenshots from 2018-11-06: the rendered ground-truth mesh and the corresponding input frame; their orientations do not match.]

zycliao commented 5 years ago

Sorry, I didn't notice the rotate_base option in def batch_global_rigid_transformation.
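
For anyone else hitting this: as far as I can tell, rotate_base composes the root rotation with a fixed 180° rotation about the x-axis before building the kinematic chain. A rough numpy equivalent (a sketch of my understanding only, not the actual code in batch_lbs.py):

import cv2
import numpy as np

# pose[:3] is the SMPL root orientation in axis-angle form.
R_root = cv2.Rodrigues(pose[:3])[0]
# 180 degree rotation about the x-axis (what rotate_base=True effectively applies).
rot_x = np.array([[1., 0., 0.],
                  [0., -1., 0.],
                  [0., 0., -1.]])
root_rotation = R_root.dot(rot_x)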

MatthewD1993 commented 5 years ago

@zycliao But the released model seems to have been trained with the rotate_base flag OFF, and obviously the ground truth can only be used when the flag is ON. So I wonder whether they really used the H3.6M SMPL parameter ground truth or not. Could you share more info if you know how they trained the model?

MatthewD1993 commented 5 years ago

@akanazawa Did you set the rotate_base flag on when you trained the model with all the datasets? But your trained model only predicts 3D joints with the correct orientation when rotate_base is False. Thanks so much for resolving my puzzle.

akanazawa commented 5 years ago

Hi,

Sorry for the confusion: the global rotation of the Mosh data is not aligned to the image; it's in whatever coordinate frame the raw mocap markers were in (not synchronized with the image coordinate frame). It could be some simple transformation, but you'd have to use the camera to adjust for each viewpoint. Therefore, in this project I do not use the global rotation of Mosh at all. This is also because I didn't want to put an adversarial prior on the viewpoint / global rotation of the person.
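
For illustration, that per-viewpoint adjustment would look roughly like this (a sketch only; R_cam is a hypothetical world-to-camera rotation for the chosen H3.6M viewpoint, which this repo does not provide):

import cv2
import numpy as np

def align_root_to_camera(pose, R_cam):
    """Compose a world-to-camera rotation into the global (root) orientation.

    pose:  (72,) SMPL pose in axis-angle; pose[:3] is the root.
    R_cam: (3, 3) world-to-camera rotation of the chosen viewpoint (assumed known).
    """
    pose = pose.copy()
    R_root = cv2.Rodrigues(pose[:3])[0]
    pose[:3] = cv2.Rodrigues(R_cam.dot(R_root))[0].reshape(3)
    return pose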

The rotate_base flag should be off in all experiments; it's an artifact from something else I did a long time ago :p

Hope this clears things up.

Best,

Angjoo

akanazawa commented 5 years ago

Sorry! I thought you were referring to the moshed files that I use for the prior; for those, I don't think the global rotation is aligned to the image.

But I get now that you're talking about the H36M tfrecords. For these, I computed the global rotation using the ground truth 3D joints, so they should align. But there is something wrong in the data: as you noticed, it is upside down. You can use the function below to correct the pose. Sorry this is very confusing, but this was actually the data I trained my models on. Only after all the experiments and making the code public did I realize that the gt3d is upside down (due to the way we were processing the data previously, hence that rotate_base flag, etc.).

I have retried training with this corrected ground truth pose, but there was only a trivial change in performance, so I haven't updated the dataset yet... I should. I'd welcome any help documenting this and adding it to the README.

Anyway sorry for your troubles. Here's the code to rectify them:

import cv2
import numpy as np

def rectify_pose(pose):
    """
    Rectify "upside down" people in global coord.

    Args:
        pose (72,): Pose in axis-angle format; pose[:3] is the global root orientation.

    Returns:
        Rotated pose.
    """
    pose = pose.copy()
    # Compose a 180 degree rotation about the x-axis into the root orientation.
    R_mod = cv2.Rodrigues(np.array([np.pi, 0, 0]))[0]
    R_root = cv2.Rodrigues(pose[:3])[0]
    new_root = R_root.dot(R_mod)
    pose[:3] = cv2.Rodrigues(new_root)[0].reshape(3)
    return pose
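
For example, in the visualization loop from the first post, this would be applied to the pose read from the tfrecord before it is fed to the SMPL model:

    pose = rectify_pose(pose)
    verts = sess.run(verts_, feed_dict={pose_ph: np.expand_dims(pose, 0),
                                        shape_ph: np.expand_dims(shape, 0)})
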
MatthewD1993 commented 5 years ago

Thank you very much! I am impressed by your quick reply! I will see what I can do for the README file. Wish you a good day!


akanazawa commented 5 years ago

Np :)! Thanks for your interest!!

NewCoderQ commented 2 years ago

Hi @akanazawa, I have access to the Human3.6M moshed dataset. What is the difference in processing between the .pkl and the _camx_aligned.pkl files, or could you tell me where I can find the processing scripts? I want to get the continuous SMPL mesh.

akanazawa commented 2 years ago

I'm not exactly sure, but probably the aligned one is in the camera coordinate space of the said camera. You probably just want to use the *.pkl. There should be a dict with something like pose, from which you can pose the SMPL model to get the mesh.
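
A minimal sketch of that, reusing the SMPL wrapper from the first post; the file name and the key names 'poses' and 'betas' are assumptions, so inspect the dict to find the real ones:

import pickle
import numpy as np
import tensorflow as tf

# Hypothetical file name and key names; check your moshed .pkl for the actual ones.
with open('S1_Walking.pkl', 'rb') as f:
    data = pickle.load(f)  # on Python 3, pickle.load(f, encoding='latin1') may be needed
poses = np.asarray(data['poses'])   # assumed (N, 72) per-frame SMPL pose
betas = np.asarray(data['betas'])   # assumed (10,) shape parameters

pose_ph = tf.placeholder(tf.float32, [None, 72])
shape_ph = tf.placeholder(tf.float32, [None, 10])
verts_, joints_, Rs_ = smpl_model(shape_ph, pose_ph, True)  # smpl_model as in the first post

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    verts = sess.run(verts_, feed_dict={
        pose_ph: poses,
        shape_ph: np.tile(betas[None, :10], [poses.shape[0], 1])})
    # verts: (N, 6890, 3), one SMPL mesh per mocap frame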

NewCoderQ commented 2 years ago

@akanazawa Thanks for your quick reply! I checked the moshed data: the _camx_aligned.pkl is downsampled by a factor of 5, while the .pkl is continuous. The only difference between _camx_aligned.pkl and .pkl is the root orientation. I want to align the .pkl data to the _camx_aligned.pkl, but I'm not sure how to do the transformation. Could you please give me some advice?