BinuxLiu closed this issue 4 months ago.
Hi @BinuxLiu,
In my experience, training time is not something that is most often taken into consideration. I always try to avoid aggressive learning rates at the beginning of training so that the network doesn't get stuck in local minima.
Here are some random insights: the learned queries (`self.queries`) are initialized with `normal(0, 1)`. The learning rate can have an impact on them, so I would suggest you keep an eye on the initialization if you want to play with bigger learning rates (maybe use `nn.init.normal_(0, 0.05)`?). I didn't spend enough time with the DinoV2 backbone (I was only interested in ResNet50, which has been the most used backbone for VPR in recent years), so I don't know what the best set of hyperparameters could be.
Have you tried training SALAD and NetVLAD with the hyperparameters I shared?
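To illustrate the initialization point, here is a minimal sketch (the module and parameter names are placeholders, not the actual BoQ code):

```python
import torch
import torch.nn as nn

class LearnedQueries(nn.Module):
    """Toy stand-in for a module holding learned queries (names are illustrative)."""

    def __init__(self, num_queries: int = 64, dim: int = 512):
        super().__init__()
        # Learned queries, analogous to self.queries in BoQ.
        self.queries = nn.Parameter(torch.empty(1, num_queries, dim))
        # Smaller std than the default normal(0, 1), so the queries are less
        # sensitive to an aggressive learning rate early in training.
        nn.init.normal_(self.queries, mean=0.0, std=0.05)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Broadcast the same queries to every sample in the batch.
        return self.queries.expand(batch_size, -1, -1)
```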
As for the results you're getting, they seem in accordance with what I get (although I get ~0.75 R@1 at the first epoch). Here is a screenshot (these are validation results at each epoch with 280x280 images; maybe you'd be better off validating with 322x322 to get the best possible model):
Side note: you're doing 5 min per epoch? Wow, that is at least 4x faster than my RTX 8000. What GPU are you using?
2024-07-13 05:29:47 Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 93.24, R@5: 96.62, R@10: 96.76, R@100: 98.24
2024-07-13 05:29:47 Not improved: 2 / 10: best R@1 = 93.2, current R@1 = 93.2
It seems that I can probably reproduce your results. Next, I will also try to implement DINO-NetVLAD with a smaller learning rate. Previously, I used a very aggressive learning rate and converged to about 92.5%. I trained in parallel on four RTX 4090s. (The warmup I implemented cannot be used with mixed-precision acceleration yet, so training could be faster in the future.) Thank you for your answer.
Would you be willing to share the code for the linear learning-rate scheduling? This has been bothering me for a long time. Thank you very much.
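In the meantime, here is a minimal sketch of the kind of linear warmup plus linear decay schedule I have in mind, using PyTorch's `LambdaLR` (the warmup length, base lr, and epoch count are just illustrative values, not your actual settings):

```python
import torch

model = torch.nn.Linear(10, 2)  # stand-in for the actual VPR model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)  # base lr is illustrative

warmup_epochs = 5    # illustrative values, not the repo's actual settings
total_epochs = 40

def lr_lambda(epoch: int) -> float:
    # Linearly ramp the lr from ~0 up to the base lr during warmup ...
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    # ... then linearly decay it back toward 0 over the remaining epochs.
    return max(0.0, (total_epochs - epoch) / (total_epochs - warmup_epochs))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # train_one_epoch(model, optimizer)  # training loop omitted
    scheduler.step()  # one scheduler step per epoch
```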
Hello @amaralibey, previously I have been using the training parameters of SALAD, such as lr = 6e-5 and fine-tuning four or five layers (several papers arrived at these settings, e.g. SALAD, DINO-MIXVPR, EffoVPR). SALAD can converge to more than 90 on MSLS val in the first epoch. I also trained DINO-BoQ with these parameters; the R@1 of the first epoch was 91.35, but the best result was lower than in the paper.
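For context, here is a rough sketch of what I mean by "fine-tuning four or five layers" with a DINOv2 backbone (the torch.hub entry point and the block count are illustrative, and this assumes the hub model exposes its transformer blocks as `backbone.blocks`; my actual code may differ):

```python
import torch

# DINOv2 ViT backbone from torch.hub (requires the hub repo to be reachable).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")

# Freeze everything first ...
for p in backbone.parameters():
    p.requires_grad = False

# ... then unfreeze only the last N transformer blocks (N = 4 as an example),
# assuming the model exposes them as `backbone.blocks`.
num_trainable_blocks = 4
for block in backbone.blocks[-num_trainable_blocks:]:
    for p in block.parameters():
        p.requires_grad = True

# Only the unfrozen parameters go to the optimizer, with a small lr such as 6e-5.
trainable_params = [p for p in backbone.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=6e-5)
```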
Now I have tried the parameters you recommended in https://github.com/amaralibey/Bag-of-Queries/issues/7#issue-2402501570
(I implemented them in my own code, which may contain errors). Can you please confirm whether the recall rates are similar to those from your training process?
Although I am looking forward to the results of the 30th (40th) epoch, it is worth noting that SALAD's parameters can reach a result of more than 92 on MSLS val within 30 minutes (on a single 24 GB GPU).
I am curious whether it is the training parameters that help BoQ reach an optimal solution. (Although it is not necessary to compare against SALAD, comparing against NetVLAD under the same good training parameters and the same model settings would be needed to support the advantage of BoQ.)