Another question about the difference between the paper and the results I got

NVlabs / DREAM

DREAM: Deep Robot-to-Camera Extrinsics for Articulated Manipulators (ICRA 2020)



Issue #20 (closed by Jerrrrry-Zhu 2 years ago)

Jerrrrry-Zhu commented 2 years ago

Hello, I have another question and look forward to your answer. When I run the pre-trained model on the "azure" dataset, I get results that match the paper:

pck_auc = 0.74406, add_auc = 0.69802

But when I run the same model on the "test dr" dataset, the pck_auc looks normal while the add_auc is very low. Why is that?

pck_auc = 0.84631, add_auc = 0.00021


tabula-rosa commented 2 years ago

Hello, and thanks for your continued interest in DREAM!

This issue arises because the datasets use different units: the synthetic datasets store their data in centimeters, whereas the real camera datasets use meters.

To fix this, replace this line:

pnp_results = pnp_metrics(pnp_add, all_n_inframe_projs_gt)

with this code:

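# Convert the ADD values of the synthetic data from cm to m
pnp_add_divided = np.array(pnp_add) / 100.
# The sentinel value that marks failed PnP frames must be scaled by the same factor
pnp_results = pnp_metrics(pnp_add_divided, all_n_inframe_projs_gt, pnp_magic_number=-9.99)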
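A minimal sketch of why the ADD AUC collapses and why the failure marker has to be rescaled as well. It assumes the AUC is computed by sweeping an ADD threshold from 0 to 0.1 m and integrating the fraction of frames below it, and that failed PnP frames carry a sentinel of -999.0 (consistent with the -9.99 passed above after dividing by 100). `add_auc`, `PNP_FAILED`, and their parameters are hypothetical names for illustration, not DREAM's pnp_metrics.

```python
import numpy as np

# Hypothetical sentinel for frames where PnP failed (assumed value, in the data's units).
PNP_FAILED = -999.0


def add_auc(add_values, max_threshold=0.1, num_steps=1000, failed_marker=PNP_FAILED):
    """Normalized area under the ADD accuracy curve.

    add_values: per-frame ADD errors; the thresholds below are in meters.
    Frames equal to failed_marker are counted as failures (never under a threshold).
    """
    add_values = np.asarray(add_values, dtype=float)
    # Failed frames become +inf so no threshold ever counts them as correct.
    errors = np.where(np.isclose(add_values, failed_marker), np.inf, add_values)
    thresholds = np.linspace(0.0, max_threshold, num_steps)
    # Fraction of frames whose error is below each threshold.
    accuracy = np.array([np.mean(errors < t) for t in thresholds])
    # Trapezoidal integration, normalized by the threshold range.
    area = np.sum((accuracy[1:] + accuracy[:-1]) / 2.0 * np.diff(thresholds))
    return area / max_threshold


# Made-up ADD errors of 1-3 cm plus one failed frame, stored in cm.
add_cm = np.array([1.0, 2.0, 3.0, PNP_FAILED])

print(add_auc(add_cm))  # -> 0.0 (cm values compared against m thresholds)
print(add_auc(add_cm / 100.0, failed_marker=PNP_FAILED / 100.0))  # roughly 0.6
print(add_auc(add_cm / 100.0))  # sentinel not rescaled: the failure counts as a perfect frame
```

With the values left in centimeters, even a 1 cm error exceeds every threshold in the 0 to 0.1 m sweep, which is consistent with the near-zero add_auc reported above. Dividing by 100 without also scaling the sentinel would make failed frames look like perfect ones (-9.99 is below every threshold), which is presumably why the fix passes pnp_magic_number=-9.99.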

Then, you can run this on one of the synthetic datasets:

python scripts/network_inference_dataset.py -i trained_models/panda_dream_vgg_q.pth -d data/synthetic/panda_synth_test_dr/ -o temp/panda_dream_vgg_q_synth_panda_test_dr_snc -p shrink-and-crop

In this example, I obtain a PCK AUC of 0.77669 and an ADD AUC of 0.78796, both of which match the paper.

I hope that helps!

Jerrrrry-Zhu commented 2 years ago

Thank you for your answer. I am also confused about the add_from_pose function: kp_pos_gt_homog is stacked from keypoint_positions_wrt_cam_gt.

The transform matrix is solved by PnP, and it transforms keypoint_positions_wrt_world into keypoint_positions_wrt_camera.

Why is kp_pos_aligned the result of multiplying the transform matrix with keypoint_positions_wrt_cam_gt rather than with keypoint_positions_wrt_world?

Additionally, if kp_pos_aligned is the result of multiplying the transform matrix with keypoint_positions_wrt_cam_gt, why is the error computed as kp_pos_aligned - keypoint_positions_wrt_cam_gt? In theory, an error computed this way is zero only if the transformation matrix is the identity matrix.

I look forward to your answer!

tabula-rosa commented 2 years ago

This is because the variables translation and quaternion come from the output of solve_pnp (for example, here). The code is structured so that if ground-truth values are provided to solve_pnp, translation and quaternion represent the identity transform. The output of solve_pnp is the transformation that aligns these reference frames: it returns the pose difference with respect to the camera. Consequently, kp_pos_aligned - keypoint_positions_wrt_cam_gt is zero exactly when the estimated pose matches the ground truth, which is what ADD should measure. The variable names translation and quaternion in add_from_pose are therefore slightly misleading; they would be better named delta_translation and delta_quaternion.
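The reply can be checked with a little algebra. Let T_gt be the ground-truth world-to-camera transform and T_est the estimate, so p_cam_gt = T_gt · p_world. When PnP is given the camera-frame ground-truth keypoints as its 3D points (as the reply describes), the transform it returns is equivalently the delta ΔT = T_est · T_gt^-1, and ΔT · p_cam_gt - p_cam_gt = T_est · p_world - T_gt · p_world, which is exactly the per-keypoint ADD error. The numpy sketch below verifies this numerically with made-up poses and keypoints. `add_from_delta_pose` is a hypothetical stand-in for add_from_pose, the delta is chosen first and T_est built from it (rather than running PnP), and the (x, y, z, w) quaternion order is an assumption, not necessarily DREAM's convention.

```python
import numpy as np


def quat_to_matrix(q):
    """3x3 rotation matrix from a quaternion in (x, y, z, w) order (assumed convention)."""
    x, y, z, w = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def make_transform(rotation, translation):
    """4x4 homogeneous transform from a 3x3 rotation and a translation 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def apply_transform(T, points):
    """Apply a 4x4 transform to an N x 3 array of points via homogeneous coordinates."""
    homog = np.hstack([points, np.ones((points.shape[0], 1))])  # N x 4
    return (T @ homog.T).T[:, :3]


def add_from_delta_pose(delta_translation, delta_quaternion, kp_pos_wrt_cam_gt):
    """Hypothetical stand-in for add_from_pose: mean distance between the GT
    camera-frame keypoints and the same keypoints moved by the delta pose."""
    T_delta = make_transform(quat_to_matrix(delta_quaternion), delta_translation)
    kp_pos_aligned = apply_transform(T_delta, kp_pos_wrt_cam_gt)
    return np.mean(np.linalg.norm(kp_pos_aligned - kp_pos_wrt_cam_gt, axis=1))


rng = np.random.default_rng(0)
kp_world = rng.uniform(-0.5, 0.5, size=(7, 3))  # 7 made-up keypoints in the world frame (m)

# Made-up ground-truth world-to-camera pose.
T_gt = make_transform(quat_to_matrix(rng.normal(size=4)), rng.uniform(-1.0, 1.0, size=3))

# A slightly wrong estimate, built as T_est = T_delta @ T_gt so the delta is known.
delta_t = np.array([0.01, 0.0, -0.02])
delta_q = np.array([0.01, -0.02, 0.015, 1.0])
T_est = make_transform(quat_to_matrix(delta_q), delta_t) @ T_gt

kp_cam_gt = apply_transform(T_gt, kp_world)

# Standard ADD: estimated vs. GT pose, both applied to the world-frame keypoints.
add_standard = np.mean(np.linalg.norm(apply_transform(T_est, kp_world) - kp_cam_gt, axis=1))

# Delta-pose ADD: only the GT camera-frame keypoints and the delta pose are needed.
add_delta = add_from_delta_pose(delta_t, delta_q, kp_cam_gt)

print(add_standard, add_delta)  # the two values agree up to floating-point error
```

Passing the identity delta (zero translation, quaternion (0, 0, 0, 1)) gives an ADD of zero. That is the case raised in the question: the error vanishes when the estimated pose equals the ground truth, not when the world-to-camera transform itself is the identity.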

Additionally, the discussion in Issue #11 may provide more insight into our implementation of the ADD metric.

I hope that helps!

Jerrrrry-Zhu commented 2 years ago

Got it, that is a really clever approach. Thanks for your reply!