zhyever closed this issue 2 years ago
Great question! We tried supervising the similarity matrix directly before, but it didn't perform well. The main reason is that the Sinkhorn layer extracts global information from the similarities of all keypoints. If you use E to supervise the similarity matrix directly, each keypoint has no knowledge of its similarity to the other keypoints.
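To make the point above concrete, here is a minimal NumPy sketch. It is not the DWGM code from the repo; the `sinkhorn` helper, the 4x4 random `scores` matrix, and the iteration count are all assumptions for illustration. It perturbs a single similarity score and checks how an unrelated entry reacts under Sinkhorn normalization versus an independent per-entry sigmoid.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=1e-8):
    """Hypothetical helper: alternate row/column normalization of exp(scores)."""
    # Subtract the max before exponentiating for numerical stability.
    p = np.exp(scores - scores.max())
    for _ in range(n_iters):
        p = p / (p.sum(axis=1, keepdims=True) + eps)  # normalize each row
        p = p / (p.sum(axis=0, keepdims=True) + eps)  # normalize each column
    return p

def sigmoid(x):
    """Per-entry score, computed independently for every element."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))   # toy similarity matrix (assumed shape)

perturbed = scores.copy()
perturbed[0, 0] += 2.0             # change one keypoint pair only

base = sinkhorn(scores)
changed = sinkhorn(perturbed)

# Entry (3, 3) shares neither a row nor a column with (0, 0).
sinkhorn_delta = abs(changed[3, 3] - base[3, 3])
sigmoid_delta = abs(sigmoid(perturbed)[3, 3] - sigmoid(scores)[3, 3])

print("Sinkhorn change at (3, 3):", sinkhorn_delta)
print("Sigmoid change at (3, 3): ", sigmoid_delta)
```

In this sketch the per-entry sigmoid leaves entry (3, 3) untouched, while the Sinkhorn output at (3, 3) shifts, because the row and column normalizations couple every entry to every other one. That coupling is the "global information" mentioned above.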
Thanks for your quick reply. It makes sense. I had guessed this reason before raising the issue, but after seeing "It is worth mentioning that Context Normalization extracts the global information of all edges. (below Eq.9)" in your paper, I thought it could be solved with the CN layer. Thanks for your explanation! :D
Thanks for your great work.
I have a question about the DWGM. I assume the 2D-3D keypoints are already paired, so why do we need Sinkhorn to match the pairs? How about using E to supervise the similarity matrix directly?