PKU-EPIC / HOTrack

[AAAI 2023] Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

Reproduce the Hand-pose Tracking Result #2

Closed: dongho-Han closed this issue 11 months ago

dongho-Han commented 11 months ago

Hello.

I'm trying to reproduce the evaluation results in Table 1 of the paper to see the effect of PFO with PST optimization. In the paper, you mention three metrics for hand-pose accuracy: MPJPE, PD, and DD.

When I try to reproduce the result, I run this command:

`CUDA_VISIBLE_DEVICES=0 python network/test.py --config handopt_test_HO3D.yml --num_worker 0`

and I think this function is used to calculate the error: https://github.com/PKU-EPIC/HOTrack/blob/0b0120122c2b1ff549db9a7ae22c1ba9a5cb64ef/network/models/hand_network.py#L159

However, this function returns the following 10 values, not the exact three metrics:

hand_pred_kp_loss, hand_pred_kp_diff, hand_init_kp_diff, hand_pred_r_loss, hand_pred_t_loss, hand_pred_r_diff, hand_pred_t_diff, hand_canon_r_diff, hand_canon_t_diff, MANO_theta_diff

Can you share how to calculate the three metrics to reproduce the evaluation part?

Thanks in advance :)

dongho-Han commented 11 months ago

For MPJPE, I think it is closely related to `loss_dict['hand_pred_kp_diff']`:

https://github.com/PKU-EPIC/HOTrack/blob/0b0120122c2b1ff549db9a7ae22c1ba9a5cb64ef/network/models/hand_network.py#L188C10-L188C10

But when I run the code, (1) the scale is quite different from Table 1's MPJPE score (w/o hand-obj opt: 0.0121, w/ hand-obj opt: 0.013). My guess is that the unit is the problem (Table 1: cm, code: ?), but that is only a guess. (2) Also, using optimization increases the error.

Currently, I can't find any calculation related to PD (Penetration Depth) or DD (Disjointedness Distance). Is there anything I missed?

JYChen18 commented 11 months ago

You are right that MPJPE is essentially `hand_pred_kp_diff`. The unit in the code is meters, so the number you got (around 1.2 cm) is correct. It differs from the paper because there we report the average over different categories, while the default config only tests on "bottle".
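For reference, a minimal sketch of that relation (assuming `hand_pred_kp_diff` is the mean per-joint L2 error in meters; the function and variable names below are illustrative, not from this repo):

```python
import numpy as np

def mpjpe_cm(pred_kp: np.ndarray, gt_kp: np.ndarray) -> float:
    """Mean per-joint position error in cm for keypoints given in meters.

    pred_kp, gt_kp: (num_frames, num_joints, 3) arrays.
    """
    per_joint_err_m = np.linalg.norm(pred_kp - gt_kp, axis=-1)  # (frames, joints)
    return float(per_joint_err_m.mean() * 100.0)

# Table 1 averages this number over categories, e.g.
# table1_mpjpe = np.mean([mpjpe_cm(p, g) for p, g in per_category_results])
```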

To test all categories, you need to add the other categories to the list in this line. You can find the names of the remaining categories in Table 2 of our paper.
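Schematically, the edit is just extending that list (the names below are placeholders; the actual list lives in the linked line and the category names are in Table 2):

```python
# hedged sketch: wherever the category list is defined (see the linked line),
# extend it with the remaining categories; names below are placeholders
category_list = [
    "bottle",          # tested by the default config
    # "<category_2>",  # add the remaining categories from Table 2 here
    # "<category_3>",
]
```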

Using optimization does often increase the keypoint error, but it often helps with PD and DD. It requires careful hyper-parameter tuning to keep both metrics low, and the behavior also differs across categories. For the calculation of PD and DD, we use code similar to previous work, so it is not included in this repository.
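If it helps, here is a rough sketch of the common definitions (not the CPF code; it assumes a watertight object mesh and the `trimesh` library, and the exact definitions and aggregation used in CPF may differ):

```python
import numpy as np
import trimesh

def penetration_depth(obj_mesh: trimesh.Trimesh, hand_verts: np.ndarray) -> float:
    """Deepest intrusion (in meters) of hand vertices into the object mesh.

    trimesh's signed_distance is positive for points inside a watertight mesh.
    """
    sd = trimesh.proximity.signed_distance(obj_mesh, hand_verts)
    return float(max(sd.max(), 0.0))

def disjointedness_distance(obj_mesh: trimesh.Trimesh, hand_verts: np.ndarray) -> float:
    """Smallest hand-to-object gap (in meters); 0 if any hand vertex touches
    or penetrates the object."""
    sd = trimesh.proximity.signed_distance(obj_mesh, hand_verts)
    if (sd >= 0.0).any():
        return 0.0
    _, dist, _ = trimesh.proximity.closest_point(obj_mesh, hand_verts)
    return float(dist.min())
```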

dongho-Han commented 11 months ago

Thanks for the answer. Can you share which previous work you used? Since only your models' PD and DD results are reported in the paper, I would like to evaluate with the same code so I can reproduce them for further work. Thanks!

JYChen18 commented 11 months ago

These metrics come from CPF. You can try their evaluation code. The results may differ slightly from our reported numbers due to implementation details, but as long as the gap is small, I think you can use them for further work, because a comparison under the same code is fair.

dongho-Han commented 11 months ago

Thanks for the information. I will try evaluating with their code.