XuyangBai / D3Feat

[TensorFlow] Official implementation of CVPR'20 oral paper - D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features https://arxiv.org/abs/2003.03164
MIT License

USIP results in Figure 3 #4

Closed: rui2016 closed this issue 3 years ago

rui2016 commented 4 years ago

Hi, can you explain the performance difference of USIP on KITTI in Fig 3 of your paper (15%-20%) vs. that in Fig 4 of the USIP paper (30%-60%)? The only hint I can find in your paper is:

since USIP and D3Feat use different processing and splitting strategies and USIP requires surface normal and curvature as input, the results are not directly comparable.

However, to me this doesn't fully answer the question. Also, as far as I remember (not 100% sure), USIP doesn't need normals and curvature.

Besides, probably as a drawback of their approach, the performance of USIP is sensitive to the two hyperparameters M (number of samples) and K (number of nearest neighbors). I assume this is especially the case if you take their pre-trained model and apply it to another dataset like 3DMatch. Did you try to select proper M and K values for a fair comparison in the first plot of Fig 3?

XuyangBai commented 4 years ago

Hi @rui2016 Thanks for your interest. I think USIP did use the surface normal and curvature for KITTI (you can see here or here). As you say, the two hyperparameters M and K are critical for the performance of USIP, but I used the pre-trained weights on Oxford released by the authors (with which I reproduced similar USIP results using their code) and tested under my setting, so the values of M and K should not be the reason in this case.

Let me explain the details of "USIP and D3Feat use different processing and splitting strategies". During our experiments, we found the ground truth poses of KITTI to be noisy, so we first used ICP to refine them. After this refinement, the testing pairs and their poses differ from those used by USIP. Also, we use sequences 0-5 for training, 7-8 for validation, and 9-11 for testing, while USIP is tested on all 11 sequences. The USIP result in my paper was obtained by running the USIP model on our dataset, which is why I said the results are not directly comparable. I also find the USIP result on KITTI quite strange in my experiments: the repeatability with only 4 keypoints is already 15%, but it does not increase as the number of keypoints grows. I have also tried different NMS radii, so it is hard for me to explain this for now.
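For reference, the relative repeatability curves in Fig 3 measure, for a pair of fragments with a known relative pose, the fraction of keypoints in one cloud that land within a distance threshold of some keypoint in the other cloud after alignment. A minimal numpy sketch of that metric (this is illustrative, not the authors' evaluation code; the threshold `tau` and the helper name are assumptions):

```python
import numpy as np

def repeatability(kpts_a, kpts_b, T_ab, tau=0.5):
    """Fraction of keypoints in cloud A with a keypoint of cloud B
    within distance tau, after mapping B into A's frame.

    kpts_a: (N, 3) keypoints detected in cloud A
    kpts_b: (M, 3) keypoints detected in cloud B
    T_ab:   (4, 4) ground-truth (or ICP-refined) pose taking B into A
    tau:    inlier distance threshold (illustrative value)
    """
    # Transform B's keypoints into A's coordinate frame.
    kpts_b_in_a = kpts_b @ T_ab[:3, :3].T + T_ab[:3, 3]
    # Distance from each keypoint in A to its nearest neighbor in B.
    dists = np.linalg.norm(kpts_a[:, None, :] - kpts_b_in_a[None, :, :], axis=-1)
    nn_dist = dists.min(axis=1)
    return float((nn_dist < tau).mean())

# Sanity check: identical keypoints under the identity pose are fully repeatable.
pts = np.random.rand(32, 3)
assert repeatability(pts, pts, np.eye(4)) == 1.0
```

Under this definition, noisy ground-truth poses directly inflate the nearest-neighbor distances, which is why refining the KITTI poses with ICP changes the measured repeatability.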

rui2016 commented 4 years ago

Hi @XuyangBai, thanks for the explanation.

XuyangBai commented 4 years ago

Hi @rui2016

Thank you again for your valuable discussion.

Best, Xuyang

rui2016 commented 4 years ago

Hi Xuyang,

thanks for the detailed reply. It answers most of my questions. The remaining one is still the inconsistent USIP results on KITTI.

So, for now, I ascribe this result to the different data the USIP evaluations were run on (changed by the ICP refinement process).

Please let me clarify that the confusion is not about the comparison between USIP and D3Feat, but rather about the inconsistency between the USIP results on KITTI in your paper and those in the USIP paper. Although your test data is further aligned using ICP, I would assume such better-aligned test data (together with the fact that different numbers of sequences are used for testing) should not degrade USIP's repeatability so significantly (by more than 50%). This is only my assumption, though.