Open LastEgg opened 2 months ago
This is achieved by computing the similarity between the embedding of all the patches of the two images and find the highest one for each patch. The patches with the highest attention score in the vit backbone is then selected for this figure. Hope this can help you get the figure.
Hello,
I am very fortunate to have read your paper. It has greatly inspired me in my experiments. I would like to know how you achieved the visualization of local feature point matching in your paper. I hope to receive your source code.