About DinoV2+BoQ Training.

BinuxLiu commented 3 months ago

Hello, @amaralibey Previously, I have been using the training parameters of SALAD, such as lr = 6e-5, and fine-tuning four or five layers (Some papers have obtained this parameter, such as SALAD, DINO-MIXVPR, EffoVPR). SALAD can converge to more than 90 on MSLS val in the first epoch. I also trained DINO-BoQ under this parameter, and the R@1 of the first epoch was 91.35, but the best result was lower than the paper.

Now I tried the parameters you recommended https://github.com/amaralibey/Bag-of-Queries/issues/7#issue-2402501570

(I implemented them in my own code, which may be errors). Can you please confirm whether the recall rates are similar with your training process?

2024-07-13 03:40:35   Start training epoch: 00
100%|█████████████████████████████████████████████████████████████| 391/391 [04:56<00:00,  1.32it/s]
2024-07-13 03:45:32   Finished epoch 00 in 0:04:56, average epoch loss = 0.8788
2024-07-13 03:45:32   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:06<00:00,  4.44it/s]
2024-07-13 03:46:39   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.97it/s]
2024-07-13 03:46:46   Calculating recalls
2024-07-13 03:47:04   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 65.14, R@5: 74.46, R@10: 79.05, R@100: 92.43
2024-07-13 03:47:04   Improved: previous best R@1 = 0.0, current R@1 = 65.1
2024-07-13 03:47:05   Start training epoch: 01
100%|█████████████████████████████████████████████████████████████| 391/391 [04:45<00:00,  1.37it/s]
2024-07-13 03:51:51   Finished epoch 01 in 0:04:45, average epoch loss = 0.5006
2024-07-13 03:51:51   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:04<00:00,  4.55it/s]
2024-07-13 03:52:56   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:05<00:00,  2.31it/s]
2024-07-13 03:53:02   Calculating recalls
2024-07-13 03:53:17   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 88.92, R@5: 94.59, R@10: 95.68, R@100: 98.11
2024-07-13 03:53:17   Improved: previous best R@1 = 65.1, current R@1 = 88.9
2024-07-13 03:53:18   Start training epoch: 02
100%|█████████████████████████████████████████████████████████████| 391/391 [04:47<00:00,  1.36it/s]
2024-07-13 03:58:06   Finished epoch 02 in 0:04:47, average epoch loss = 0.3870
2024-07-13 03:58:06   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:22<00:00,  3.59it/s]
2024-07-13 03:59:28   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.90it/s]
2024-07-13 03:59:36   Calculating recalls
2024-07-13 03:59:51   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 89.59, R@5: 95.95, R@10: 96.35, R@100: 98.38
2024-07-13 03:59:51   Improved: previous best R@1 = 88.9, current R@1 = 89.6
2024-07-13 03:59:52   Start training epoch: 03
100%|█████████████████████████████████████████████████████████████| 391/391 [04:46<00:00,  1.36it/s]
2024-07-13 04:04:41   Finished epoch 03 in 0:04:46, average epoch loss = 0.3235
2024-07-13 04:04:41   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:10<00:00,  4.18it/s]
2024-07-13 04:05:51   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.92it/s]
2024-07-13 04:05:58   Calculating recalls
2024-07-13 04:06:13   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 91.76, R@5: 95.81, R@10: 96.76, R@100: 98.38
2024-07-13 04:06:13   Improved: previous best R@1 = 89.6, current R@1 = 91.8
2024-07-13 04:06:14   Start training epoch: 04
100%|█████████████████████████████████████████████████████████████| 391/391 [04:45<00:00,  1.37it/s]
2024-07-13 04:11:01   Finished epoch 04 in 0:04:45, average epoch loss = 0.2830
2024-07-13 04:11:01   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:11<00:00,  4.11it/s]
2024-07-13 04:12:13   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.88it/s]
2024-07-13 04:12:21   Calculating recalls
2024-07-13 04:12:38   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 91.62, R@5: 96.35, R@10: 96.89, R@100: 98.24
2024-07-13 04:12:38   Not improved: 1 / 10: best R@1 = 91.8, current R@1 = 91.6
2024-07-13 04:12:38   Start training epoch: 05
100%|█████████████████████████████████████████████████████████████| 391/391 [04:46<00:00,  1.37it/s]
2024-07-13 04:17:26   Finished epoch 05 in 0:04:46, average epoch loss = 0.2557
2024-07-13 04:17:26   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:21<00:00,  3.62it/s]
2024-07-13 04:18:47   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.94it/s]
2024-07-13 04:18:54   Calculating recalls
2024-07-13 04:19:12   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 92.16, R@5: 95.95, R@10: 97.03, R@100: 98.24
2024-07-13 04:19:12   Improved: previous best R@1 = 91.8, current R@1 = 92.2

Although I am looking forward to the results of the 30th (40th) epoch, it is worth noting that the parameters of SALAD can get a result of more than 92 on MSLS Val within 30 minutes. (A Single 24GB GPU)

I am curious whether it is the training parameters that help BoQ to obtain an optimal solution. (Although it is not necessary to compare with SALAD, it is necessary to compare with NetVLAD under the good training parameters and the same model settings to support the advancedness of BoQ.)

amaralibey commented 3 months ago

Hi @BinuxLiu,

In my experience, the training time is not something that is most often taken into consideration. I always try to avoid aggressive learning rate at the beginning of training so that the network doesn't get stuck into local minima.

Here are some random insights:

We are fine-tuning the DinoV2 backbone with 0.2x the learning rate (and only the last two blocks, increasing this could enhance performance).
The learning rate we are using is 2e-4 for the wramup period (3600 iterations, so the network is getting updated with very, very, small steps at the begining).
Once the warmup has finished the learning rate is multiplied by 0.1 (which makes it 2e-5 for BoQ and 4e-6 for DinoV2). So basically we are using a lr smaller than that used in the papers you mentionned.
BoQ learns sets of global queries (self.queries) which are initialized with normal(0, 1). The learning rate can have an impact on them, I would suggest you keep an eye on the initialization if you want to play with bigger learning rates (maybe use nn.init.normal_(0, 0.05)? ).

I didn't spend enough time with DinoV2 backbone (as I was only interested in ResNet50, which was the most used backbone for VPR in recent years), so I don't know what could be the best set of hyperparameters.

Have you tried training SALAD and NetVLAD with the hyperparameters I shared?

As for the results you're getting, they seem in accordance with what I get (although, I get ~0.75 R@1 at the first epoch). Here is a screenshot (these are validation results at each epoch with 280x280 images, maybe you're better validating with 322x322 to get the best possible model):

Side note: You're doing 5min per epochs? Wow!! that is at least 4x faster than my RTX8000, what GPU are you using?

BinuxLiu commented 3 months ago

2024-07-13 05:29:47   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 93.24, R@5: 96.62, R@10: 96.76, R@100: 98.24
2024-07-13 05:29:47   Not improved: 2 / 10: best R@1 = 93.2, current R@1 = 93.2

It seems that I can probably reproduce your results. Next, I will also try to implement DINO-NetVLAD with a smaller learning rate. Previously, I used a very aggressive learning rate and converged to about 92.5%. I used 4 4090 parallel training. (The warmup I implemented now cannot be used with mixed precision acceleration. It can be faster in the future.) Thank you for your answer.

Li-Yun-star commented 1 month ago

Would you be willing to share the code about linear scheduling of learning rates? This has been bothering me for a long time, thank you very much

amaralibey / Bag-of-Queries

About DinoV2+BoQ Training. #8