Closed by chenshunpeng 1 month ago
Hello, thanks for looking at my code and commenting.
Sure. Following your suggestion, I ran inference on the pitts30k dataset with the CLIP-ViT-B-16 and CLIP-ViT-B-32 weights. The results are as follows:
< test - #q: 6816; #db: 10000 >: R@1: 89.1, R@5: 95.1, R@10: 96.5, R@20: 97.5
< test - #q: 6816; #db: 10000 >: R@1: 90.6, R@5: 96.1, R@10: 97.3, R@20: 98.3
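For readers comparing the R@N figures in this thread, here is a minimal sketch of how Recall@N is commonly computed in visual place recognition: a query counts as a hit at N if at least one of its top-N retrieved database images is a true positive. This is an illustration under that assumption, not the repository's own evaluation code; the function name `recall_at_n` and its arguments are hypothetical.

```python
import numpy as np

def recall_at_n(predictions, positives_per_query, n_values=(1, 5, 10, 20)):
    """Return {N: recall in percent}.

    predictions: (num_queries, K) array of database indices, best match first.
    positives_per_query: one collection of correct database indices per query.
    n_values must be sorted in ascending order.
    """
    hits = np.zeros(len(n_values))
    for ranked, positives in zip(predictions, positives_per_query):
        for i, n in enumerate(n_values):
            # A hit within the top n is also a hit for every larger n.
            if np.any(np.isin(ranked[:n], list(positives))):
                hits[i:] += 1
                break
    return {n: float(100.0 * h / len(predictions)) for n, h in zip(n_values, hits)}

# Toy data (hypothetical): query 0 is matched at rank 2, query 1 is never matched.
toy_predictions = np.array([[3, 1, 2], [0, 4, 5]])
toy_positives = [[1], [9]]
toy_recalls = recall_at_n(toy_predictions, toy_positives, n_values=(1, 2, 3))
print(toy_recalls)
```

Under this definition recall can only stay the same or grow as N increases, which matches the pattern in the log lines above.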
If new weights become available, I hope you can update them promptly, since that would benefit the community. I will also try to reproduce the results myself. Thank you for your response.
All right, I will look into the weight problem when I have spare time.
Hello, this is very valuable work that opens the door to solving visual geo-localization with multimodal models. However, I ran into some issues when performing inference with the weights provided in your Baidu Netdisk. First, with the CLIP-ResNet50 and CLIP-ResNet101 weights, I could not reproduce the pitts30k results reported in the paper for ProGeo(CNN). Second, I got errors when trying to run inference with the CLIP-ViT-B-16 and CLIP-ViT-B-32 weights. Could some of my parameters be incorrect, or do I need to modify additional parameters in the parser? I hope you can help me correct this. Thank you very much.
Results of ProGeo(CNN) from the paper:
R@1: 91.8, R@5: 97.4
Parameters of RN101_best_model:
--backbone CLIP-RN101 --resume_model /work/ccc/project/VPR2/ProGEO/model/RN101_best_model --test_set_folder /work/ccc/datasets/pitts30k/images/test --fc_output_dim 512
Complete parameters:
Reproduced results:
2024-09-02 14:33:37 < test - #q: 6816; #db: 10000 >: R@1: 90.8, R@5: 96.0, R@10: 96.8, R@20: 97.5
Parameters of RN50_best_model:
--backbone CLIP-RN50 --resume_model /work/ccc/project/VPR2/ProGEO/model/RN50_best_model --test_set_folder /work/ccc/datasets/pitts30k/images/test --fc_output_dim 1024
Complete parameters:
Reproduced results:
2024-09-02 14:58:55 < test - #q: 6816; #db: 10000 >: R@1: 90.0, R@5: 95.4, R@10: 96.4, R@20: 97.2
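To make the gap to the paper explicit, the two log lines above can be parsed and compared with the reported ProGeo(CNN) figures. This is a small sketch using only numbers quoted in this thread; the helper names are hypothetical, and since the thread does not say which backbone the paper's ProGeo(CNN) row corresponds to, both runs are compared against the same reference.

```python
import re

# Paper figures for ProGeo(CNN), as quoted in this thread.
PAPER_CNN = {1: 91.8, 5: 97.4}

# Reproduced log lines, copied verbatim from this thread.
LOG_LINES = {
    "CLIP-RN101": "2024-09-02 14:33:37 < test - #q: 6816; #db: 10000 >: "
                  "R@1: 90.8, R@5: 96.0, R@10: 96.8, R@20: 97.5",
    "CLIP-RN50": "2024-09-02 14:58:55 < test - #q: 6816; #db: 10000 >: "
                 "R@1: 90.0, R@5: 95.4, R@10: 96.4, R@20: 97.2",
}

def parse_recalls(line):
    """Extract {N: recall} from a log line containing 'R@N: value' pairs."""
    return {int(n): float(v) for n, v in re.findall(r"R@(\d+):\s*([\d.]+)", line)}

def gap_to_paper(line, paper):
    """Paper value minus reproduced value, in percentage points."""
    recalls = parse_recalls(line)
    return {n: round(paper[n] - recalls[n], 1) for n in paper}

GAPS = {name: gap_to_paper(line, PAPER_CNN) for name, line in LOG_LINES.items()}
for name, gaps in GAPS.items():
    print(name, gaps)
```

With these numbers, the RN101 run is 1.0 points below the paper at R@1 and 1.4 at R@5, and the RN50 run is 1.8 and 2.0 points below.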
Results of ProGeo(Transformer) from the paper:
R@1: 93.0, R@5: 98.3
Parameters of VIT32_best_model:
--backbone CLIP-ViT-B-32 --resume_model /work/ccc/project/VPR2/ProGEO/model/VIT32_best_model --test_set_folder /work/ccc/datasets/pitts30k/images/test
Complete parameters:
Error message:
Parameters of ViT16_best_model:
--resume_model /work/ccc/project/VPR2/ProGEO/model/ViT16_best_model --test_set_folder /work/ccc/datasets/pitts30k/images/test --fc_output_dim 512
Complete parameters:
Error message: