bytedance / R2Former

Official repository for R2Former: Unified Retrieval and Reranking Transformer for Place Recognition
Apache License 2.0

Accuracy does not reach the accuracy in the paper, and recall jitters during reranking #16

Open kaiyi98 opened 8 months ago

kaiyi98 commented 8 months ago

Hi! I have trained R2Former, but I can't get the same results as in the paper. First, I ran train_global_retrieval.sh (train_batch_size=12) and got a best R@1 over 79.

Then I ran the first command of train_reranking.sh (train_batch_size=24, rerank_batch_size=6) and got a best R@1 of 87.432; the rerank recall changes as shown in the attached screenshot.

Lastly, I ran the second command of train_reranking.sh (train_batch_size=24, rerank_batch_size=6) and got a best R@1 of 87.297; the rerank recall changes as shown in the attached screenshot. The last model's rerank R@1 is lower than the best model's, but its global retrieval recall is the opposite (see the best-model and last-model screenshots). Q1: Why is this happening?

All other parameters remain unchanged, and I train on 2 RTX 3090 GPUs. Q2: Why does the accuracy not reach the accuracy reported in the paper, and why does the recall jitter during reranking?

Thanks!

Jeff-Zilence commented 8 months ago

This is normal if you change the overall batch size but keep the learning rate unchanged. You might tune the learning rate to get the same results, but we did not try using 2 GPUs. Unfortunately, the lr scaling rule does not apply to the Adam optimizer, so you may need to try different values to find one that works. You should also check the versions of your Python packages and of torch carefully.
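For reference, a common way to pick starting points for such a sweep is to scale the lr with the effective batch size. The sketch below is only a rough heuristic; the batch sizes and base lr are placeholder assumptions, not the paper's settings, and as noted above none of these rules are guaranteed to transfer to Adam:

```python
# Hypothetical sketch: candidate learning rates when the effective batch size changes.
# The linear scaling rule (lr proportional to batch size) is a common SGD heuristic;
# for Adam these values are only starting points for a small sweep.
paper_batch_size = 4 * 24   # assumption: effective batch size of the original setup
my_batch_size = 2 * 24      # e.g. the same per-GPU batch on 2 RTX 3090s
paper_lr = 1e-5             # assumption: base lr from the released config

candidates = {
    "unchanged": paper_lr,
    "linear":    paper_lr * my_batch_size / paper_batch_size,
    "sqrt":      paper_lr * (my_batch_size / paper_batch_size) ** 0.5,
}
for name, lr in candidates.items():
    print(f"{name:10s} lr = {lr:.2e}")
```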

The first-stage model does not matter that much. You should be able to get similar performance if you just load our pre-trained stage-1 model.
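For example, a minimal sketch of initializing reranking training from a released stage-1 checkpoint might look like this; the file path, key names, and model placeholder are assumptions, not the repo's exact API:

```python
import torch

# Sketch: load released stage-1 (global retrieval) weights before stage-2 training.
ckpt_path = "pretrained/r2former_stage1.pth"          # assumption: downloaded checkpoint path
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint.get("model_state_dict", checkpoint)  # handle either save layout

# Replace this placeholder with the model built by the repo's own code.
model = torch.nn.Module()
# strict=False lets reranking-only modules stay randomly initialized.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"{len(missing)} missing keys, {len(unexpected)} unexpected keys")
```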

nuozimiaowu commented 7 months ago

Excuse me, during the training of R2Former I ran train_global_retrieval.sh with the minimum batch size, but the process is killed during training, maybe because it exceeds my CPU memory (120 GB). I want to know how much memory the program needs to run, or whether something else is causing the problem.

Jeff-Zilence commented 7 months ago

This has nothing to do with CPU memory; the global retrieval model does not require much memory. You can check the original repo https://github.com/gmberton/deep-visual-geo-localization-benchmark and see whether you can run their training script.
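If you still suspect the process is being killed by the OS OOM killer, one way to confirm is to log host memory while the script runs. This is a generic diagnostic sketch, not part of the R2Former codebase, and it assumes psutil is installed:

```python
import os
import psutil  # assumption: `pip install psutil` if not already available

def log_memory(tag: str = "") -> None:
    """Print this process's resident memory and the system's available memory."""
    proc = psutil.Process(os.getpid())
    rss_gb = proc.memory_info().rss / 1024 ** 3
    avail_gb = psutil.virtual_memory().available / 1024 ** 3
    print(f"[mem]{tag} process RSS = {rss_gb:.1f} GB, system available = {avail_gb:.1f} GB")

# Call at a few points (e.g. after dataset init, each epoch) to see whether
# memory actually grows toward the 120 GB limit before the process dies.
log_memory(" after dataset init")
```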