ethnhe / PVN3D

Code for "PVN3D: A Deep Point-wise 3D Keypoints Hough Voting Network for 6DoF Pose Estimation", CVPR 2020
MIT License

Is comparison with other methods on LineMod fair? #43

Closed imankgoyal closed 3 years ago

imankgoyal commented 4 years ago

Hi, I really like the paper, and thanks for open-sourcing the code. As explained in #3, a separate model is trained for each object category on LineMod. However, I am unsure whether the comparison with other methods (specifically DenseFusion and DeepIM) on LineMod is fair, because those methods seem to use a single model at test time, so the model has to infer the object category itself. I would very much appreciate it if you could correct my understanding. Thanks!

ethnhe commented 3 years ago

The LineMod setting follows previous work such as PVNet. In fact, DenseFusion used pre-trained semantic segmentation models to obtain the target object region and category label on both the LineMod and YCB datasets; these are not inferred by the DenseFusion model itself. On the YCB dataset, by contrast, our single model predicts instance semantic segmentation and object poses for the whole scene.

imankgoyal commented 3 years ago

Hi,

Thanks for the reply. I agree that the comparison with PVNet is fair. However, I am still unsure how the comparison with DenseFusion on LineMod is fair. As you mentioned, DenseFusion uses pre-trained segmentation models on LineMod. Hence, at test time, the segmentation stage can make mistakes (wrong object class, ill-fitting bounding box, etc.) that would affect the overall performance. In PVN3D this cannot happen, because the model knows the ground-truth category at test time. I feel this would make the comparison unfair. What are your thoughts on this?

ethnhe commented 3 years ago

In fact, the segmentation result, and especially the category label, that DenseFusion used on the LineMod dataset is good enough. You can also try replacing the category label with the ground-truth label and running DenseFusion inference to see the result; the difference is small. I mentioned YCB because I think YCB is a more informative reference for cases that require a single model to handle multiple objects in a scene.
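
A rough sketch of how such a check could be scored, for anyone who wants to try it. Nothing here comes from the DenseFusion or PVN3D code: the `Frame` record, the `accuracy` helper, and the toy frames are all hypothetical, and the sketch makes the simplifying assumption that a wrong category label always counts as a failed pose.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical per-frame record; field names are illustrative only.
    gt_label: str            # ground-truth object category
    pred_label: str          # category produced by the segmentation stage
    pose_ok_given_gt: bool   # pose within threshold when the correct category is supplied


def accuracy(frames, use_gt_label):
    """Fraction of frames counted as correct under one labeling setting."""
    hits = 0
    for f in frames:
        # Assumption: a wrong predicted label always counts as a failure.
        label_ok = use_gt_label or f.pred_label == f.gt_label
        hits += int(label_ok and f.pose_ok_given_gt)
    return hits / len(frames)


# Made-up toy data, not real LineMod results.
frames = [
    Frame("ape", "ape", True),
    Frame("ape", "ape", False),
    Frame("cat", "cat", True),
    Frame("cat", "duck", True),   # segmentation assigns the wrong category
]

acc_gt = accuracy(frames, use_gt_label=True)
acc_pred = accuracy(frames, use_gt_label=False)
gap = acc_gt - acc_pred
print(f"GT labels: {acc_gt:.2f}  predicted labels: {acc_pred:.2f}  gap: {gap:.2f}")
```

With real per-frame results in place of the toy list, a small `gap` would support the point that the category labels have little effect on the reported numbers.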

imankgoyal commented 3 years ago

Thank you so much for the information.