Walter0807 / MotionBERT

[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"
Apache License 2.0

How is the performance computed without 2D pose inputs? #62

Closed · joonjeon closed this issue 1 year ago

joonjeon commented 1 year ago

I was able to successfully measure the quality-related metrics after setting things up as described here: https://github.com/Walter0807/MotionBERT/blob/main/docs/pose3d.md#running

However, this raises one question: if MotionBERT requires off-the-shelf 2D pose estimation results before DSTFormer can perform 2D-to-3D lifting, how can the evaluation procedure compute these quality-related metrics when I never explicitly feed it any 2D pose inputs?
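
For reference, by quality-related metrics I mean the joint-error numbers the evaluation reports, i.e. MPJPE and its Procrustes-aligned variant (usually called Protocol 1 and Protocol 2 on Human3.6M). Below is a minimal NumPy sketch of how I understand these metrics; the helper names are mine, not the repo's actual code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over all frames and joints (typically in mm).
    pred, gt: (frames, joints, 3) root-relative 3D joint positions."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def p_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: similarity-align each predicted frame to GT first."""
    errs = []
    for p, g in zip(pred, gt):
        p = p - p.mean(axis=0)                 # center the joints of this frame
        g = g - g.mean(axis=0)
        u, s, vt = np.linalg.svd(p.T @ g)      # SVD of the cross-covariance
        if np.linalg.det(u @ vt) < 0:          # avoid an improper rotation (reflection)
            u[:, -1] *= -1
            s[-1] *= -1
        r = u @ vt                             # optimal rotation
        scale = s.sum() / (p ** 2).sum()       # optimal isotropic scale
        errs.append(np.linalg.norm(scale * p @ r - g, axis=-1).mean())
    return float(np.mean(errs))
```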

Walter0807 commented 1 year ago

Hi, we (like many other works) use 2D pose estimation results as the input for 3D pose estimation. If you want to estimate 3D poses from images directly, you can check the following works:
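
To make the first point concrete: the evaluation does consume 2D keypoints, they are just loaded from the preprocessed dataset (which stores detections from an off-the-shelf 2D estimator) rather than being passed on the command line. Below is a rough sketch of that flow; the file name, dictionary keys, and function name are hypothetical, not the actual dataloader code.

```python
import pickle
import torch

def evaluate_lifting(model, data_path="data/h36m_test_2d3d.pkl", device="cuda"):
    """Illustrative 2D-to-3D lifting evaluation; names and keys are assumptions."""
    with open(data_path, "rb") as f:
        clips = pickle.load(f)   # preprocessed clips with precomputed 2D detections + 3D GT

    model = model.to(device).eval()
    errors = []
    with torch.no_grad():
        for clip in clips:
            # The 2D inputs come from an off-the-shelf detector and were stored at
            # preprocessing time: shape (frames, joints, 3) = (x, y, detection confidence).
            kpts_2d = torch.as_tensor(clip["keypoints_2d"], dtype=torch.float32, device=device)
            gt_3d = torch.as_tensor(clip["gt_3d"], dtype=torch.float32, device=device)

            pred_3d = model(kpts_2d.unsqueeze(0)).squeeze(0)      # lift the 2D sequence to 3D
            errors.append((pred_3d - gt_3d).norm(dim=-1).mean())  # per-clip MPJPE

    return torch.stack(errors).mean().item()
```

So the 2D pose inputs are still there; they simply come from the dataset preprocessing step instead of an explicit argument at evaluation time.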

RammusLeo commented 1 year ago

Hi @Walter0807! Thanks for your great work and the detailed explanation above! I'm still confused about the evaluation results of the hybrid approaches in Table 3, such as "MotionBERT" + "SPIN", "MAED", or "HybrIK". The input for these settings is video, since shape information is required, yet ground-truth 2D keypoints from the test set are still needed for the pose estimation part, aren't they?