Open chullhwan-song opened 5 years ago
This happens without any network modification, additional layers or training.
No local feature detection happens on the original image.
No local feature descriptors and no visual vocabulary are needed throughout the whole process.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Simeoni_Local_Features_and_Visual_Words_Emerge_in_Activations_CVPR_2019_paper.pdf