Shimingyi / MotioNet

A deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video [ToG 2020]
https://rubbly.cn/publications/motioNet/
BSD 2-Clause "Simplified" License
554 stars, 82 forks

Fair comparison of bone length estimation as shown in Fig. 9 #17

Closed: longbowzhang closed this issue 3 years ago

longbowzhang commented 3 years ago

Hi @Shimingyi ,

I have a question about the bone length estimation comparison shown in Fig. 9. I assume you use the GT scale to rescale the estimated bone lengths. However, if I understand correctly, methods such as Pavllo [CVPR19] do not use any GT information to calculate the bone lengths, so I am wondering whether this comparison is fair. Additionally, what is the unit of the y-axis in Fig. 9?

Thanks a lot in advance. Best.

Shimingyi commented 3 years ago

Hi @longbowzhang ,

Thanks for your question. We assume it is not possible to estimate the absolute bone lengths in a 2D-to-3D task, so our method predicts only the skeleton proportions.

In the bone length reconstruction comparison, we always apply the same operation to every method and to the ground truth. Once we have a length_set, we normalize it with bone_lengths /= bone_lengths[:, [1]], which converts the lengths to values relative to the leg length. We then compare these normalized lengths (proportions). This step does not use any ground truth to rescale anything.
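For readers following along, here is a minimal NumPy sketch of that normalization step. It is not the repository's code: the function name, the sample lengths, and the assumption that column 1 holds the leg bone are all hypothetical and only illustrate the idea.

```python
import numpy as np

def normalize_bone_lengths(bone_lengths, ref_index=1):
    """Divide each row of bone lengths by that row's reference bone.

    bone_lengths: array of shape (N, B), one row per skeleton.
    ref_index: column of the reference bone (assumed here to be the leg).
    """
    bone_lengths = np.asarray(bone_lengths, dtype=float)
    # Indexing with [ref_index] keeps a column of shape (N, 1), so the
    # division broadcasts across every bone in the same row.
    return bone_lengths / bone_lengths[:, [ref_index]]

# Hypothetical lengths: same proportions, different absolute scale.
predicted = np.array([[0.20, 0.40, 0.30]])
ground_truth = np.array([[0.50, 1.00, 0.75]])

# The same operation is applied to the prediction and to the ground truth.
pred_prop = normalize_bone_lengths(predicted)
gt_prop = normalize_bone_lengths(ground_truth)

# Mean absolute difference between the two sets of proportions.
proportion_error = np.abs(pred_prop - gt_prop).mean()
```

Because both inputs are divided by their own reference bone, the absolute scale cancels out and only the proportions are compared.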

Best, Mingyi

wbhu commented 3 years ago

Hi @Shimingyi,

Another question regarding the relative bone length issue. I strongly agree that it's not possible to estimate the absolute bone lengths in a 2D-to-3D task. But since the network E_S only predicts relative bone lengths, how do you get the absolute joint positions (xyz) from the estimated joint rotations and the predicted relative bone lengths? How do you compute the results in the last rows of Tab. 1?

Best, Wenbo

Shimingyi commented 3 years ago

@wbhu We use another factor called alpha, which is the average bone length, to represent the 'scaling'.

At evaluation time, we first scale pose_3d_gt and compute the error in the scaled space, then convert the error back to the original scale with the scaling factor so that it is comparable with other methods.

error = alpha*(gt_3d/alpha - pre_3d)
# which is equal to
error = gt_3d - alpha*pre_3d

Related code: Link
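As a quick sanity check of the identity above, here is a small NumPy sketch. It is only an illustration, not the linked evaluation code: the mpjpe helper, the random poses, and the value of alpha are hypothetical, and it assumes alpha is a positive scalar.

```python
import numpy as np

def mpjpe(a, b):
    """Mean Euclidean distance between corresponding joints."""
    return np.linalg.norm(a - b, axis=-1).mean()

rng = np.random.default_rng(0)
gt_3d = rng.normal(size=(4, 17, 3))  # hypothetical: 4 frames, 17 joints
alpha = 0.25                         # hypothetical scale (mean GT bone length)

# A prediction that lives in the scaled space, with a little noise added.
pre_3d = gt_3d / alpha + 0.01 * rng.normal(size=gt_3d.shape)

# Error measured in the scaled space, then recovered to the original scale.
error_scaled = mpjpe(gt_3d / alpha, pre_3d)
error_recovered = alpha * error_scaled

# Error measured directly after scaling the prediction back up.
error_direct = mpjpe(gt_3d, alpha * pre_3d)
```

The two values agree because a positive scalar can be pulled out of the Euclidean norm, so multiplying the scaled-space error by alpha gives the same number as scaling the prediction first.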

Best, Mingyi

wbhu commented 3 years ago

@Shimingyi Thanks for your quick response. The alpha is the scale factor to rescale the average GT bone length to be 1, right?

Shimingyi commented 3 years ago

Yes, we got it here.

wbhu commented 3 years ago

Got it, I have no more questions. You may help to close the issue now. Thanks very much.

Shimingyi commented 3 years ago

That's ok : ) @longbowzhang, do you have any suggestions on this evaluation? Because we use gt_scaling here, I am wondering whether it is confusing. Another, absolutely fair option would be to scale all the results to the same scale space and then compare them, but that requires more work because we would need to run all the other methods again.
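To make the "same scale space" idea concrete, here is one possible sketch, assuming every method outputs joint positions for the same skeleton. The 5-joint chain, the PARENTS array, and both helper functions are hypothetical and are not taken from this repository or from the other methods.

```python
import numpy as np

# Hypothetical 5-joint chain: PARENTS[j] is the parent of joint j, -1 is the root.
PARENTS = np.array([-1, 0, 1, 2, 3])

def mean_bone_length(pose):
    """Average length of all bones in a (J, 3) pose."""
    child = np.arange(len(PARENTS))[PARENTS >= 0]
    bones = pose[child] - pose[PARENTS[child]]
    return np.linalg.norm(bones, axis=-1).mean()

def to_unit_scale(pose):
    """Root-center the pose and rescale it so its mean bone length is 1."""
    centered = pose - pose[0]
    return centered / mean_bone_length(centered)

rng = np.random.default_rng(0)
gt = rng.normal(size=(5, 3))
pred = 3.0 * gt + 2.0  # same pose at a different scale and offset

# Both poses are brought into the shared unit-scale space before comparing.
error = np.linalg.norm(to_unit_scale(pred) - to_unit_scale(gt), axis=-1).mean()
```

With this kind of normalization each result is rescaled by its own mean bone length, so no ground-truth scale is needed for the predictions, but the reported error is then in units of mean bone length rather than millimetres.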

longbowzhang commented 3 years ago

Hi @Shimingyi, thanks a lot for clarifying. If I understand correctly, when you evaluate in terms of MPJPE and P-MPJPE (Tables 1 & 2), you use the so-called alpha scaling factor, which is extracted from the GT test dataset.

Shimingyi commented 3 years ago

@longbowzhang Yes, because we can only get the error in a scaled space, we need a factor to recover it to the original space so that it is comparable with other methods.