amaralibey / Bag-of-Queries

BoQ: A Place is Worth a Bag of learnable Queries (CVPR 2024)
MIT License

About DinoV2+BoQ Training. #8

Closed · BinuxLiu closed this issue 4 months ago

BinuxLiu commented 4 months ago

Hello @amaralibey, I have previously been using SALAD's training parameters, e.g. lr = 6e-5 and fine-tuning the last four or five layers (several papers arrived at these settings, e.g. SALAD, DINO-MIXVPR, EffoVPR). With them, SALAD converges to more than 90 R@1 on MSLS val within the first epoch. I also trained DINO-BoQ with these parameters: the first-epoch R@1 was 91.35, but the best result stayed below the paper's.

Now I have tried the parameters you recommended in https://github.com/amaralibey/Bag-of-Queries/issues/7#issue-2402501570 (I implemented them in my own code, so there may be errors on my side). Can you please confirm whether the recall rates below are similar to those from your training runs?
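
(For reference, a minimal sketch of this kind of partial fine-tuning, assuming the DINOv2 backbone is loaded via `torch.hub`; the four unfrozen blocks and lr = 6e-5 mirror the SALAD-style settings mentioned above, not values confirmed for this repository.)

```python
import torch

# Sketch: freeze a DINOv2 ViT-B/14 backbone except for its last 4 transformer
# blocks, then fine-tune only those with the learning rate cited above.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")

for p in backbone.parameters():
    p.requires_grad = False                  # freeze everything first

for block in backbone.blocks[-4:]:           # unfreeze the last 4 blocks
    for p in block.parameters():
        p.requires_grad = True

optimizer = torch.optim.AdamW(
    (p for p in backbone.parameters() if p.requires_grad),
    lr=6e-5,
)
```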

2024-07-13 03:40:35   Start training epoch: 00
100%|█████████████████████████████████████████████████████████████| 391/391 [04:56<00:00,  1.32it/s]
2024-07-13 03:45:32   Finished epoch 00 in 0:04:56, average epoch loss = 0.8788
2024-07-13 03:45:32   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:06<00:00,  4.44it/s]
2024-07-13 03:46:39   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.97it/s]
2024-07-13 03:46:46   Calculating recalls
2024-07-13 03:47:04   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 65.14, R@5: 74.46, R@10: 79.05, R@100: 92.43
2024-07-13 03:47:04   Improved: previous best R@1 = 0.0, current R@1 = 65.1
2024-07-13 03:47:05   Start training epoch: 01
100%|█████████████████████████████████████████████████████████████| 391/391 [04:45<00:00,  1.37it/s]
2024-07-13 03:51:51   Finished epoch 01 in 0:04:45, average epoch loss = 0.5006
2024-07-13 03:51:51   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:04<00:00,  4.55it/s]
2024-07-13 03:52:56   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:05<00:00,  2.31it/s]
2024-07-13 03:53:02   Calculating recalls
2024-07-13 03:53:17   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 88.92, R@5: 94.59, R@10: 95.68, R@100: 98.11
2024-07-13 03:53:17   Improved: previous best R@1 = 65.1, current R@1 = 88.9
2024-07-13 03:53:18   Start training epoch: 02
100%|█████████████████████████████████████████████████████████████| 391/391 [04:47<00:00,  1.36it/s]
2024-07-13 03:58:06   Finished epoch 02 in 0:04:47, average epoch loss = 0.3870
2024-07-13 03:58:06   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:22<00:00,  3.59it/s]
2024-07-13 03:59:28   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.90it/s]
2024-07-13 03:59:36   Calculating recalls
2024-07-13 03:59:51   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 89.59, R@5: 95.95, R@10: 96.35, R@100: 98.38
2024-07-13 03:59:51   Improved: previous best R@1 = 88.9, current R@1 = 89.6
2024-07-13 03:59:52   Start training epoch: 03
100%|█████████████████████████████████████████████████████████████| 391/391 [04:46<00:00,  1.36it/s]
2024-07-13 04:04:41   Finished epoch 03 in 0:04:46, average epoch loss = 0.3235
2024-07-13 04:04:41   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:10<00:00,  4.18it/s]
2024-07-13 04:05:51   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.92it/s]
2024-07-13 04:05:58   Calculating recalls
2024-07-13 04:06:13   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 91.76, R@5: 95.81, R@10: 96.76, R@100: 98.38
2024-07-13 04:06:13   Improved: previous best R@1 = 89.6, current R@1 = 91.8
2024-07-13 04:06:14   Start training epoch: 04
100%|█████████████████████████████████████████████████████████████| 391/391 [04:45<00:00,  1.37it/s]
2024-07-13 04:11:01   Finished epoch 04 in 0:04:45, average epoch loss = 0.2830
2024-07-13 04:11:01   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:11<00:00,  4.11it/s]
2024-07-13 04:12:13   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.88it/s]
2024-07-13 04:12:21   Calculating recalls
2024-07-13 04:12:38   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 91.62, R@5: 96.35, R@10: 96.89, R@100: 98.24
2024-07-13 04:12:38   Not improved: 1 / 10: best R@1 = 91.8, current R@1 = 91.6
2024-07-13 04:12:38   Start training epoch: 05
100%|█████████████████████████████████████████████████████████████| 391/391 [04:46<00:00,  1.37it/s]
2024-07-13 04:17:26   Finished epoch 05 in 0:04:46, average epoch loss = 0.2557
2024-07-13 04:17:26   Extracting database features for evaluation/testing
100%|█████████████████████████████████████████████████████████████| 295/295 [01:21<00:00,  3.62it/s]
2024-07-13 04:18:47   Extracting queries features for evaluation/testing
100%|███████████████████████████████████████████████████████████████| 12/12 [00:06<00:00,  1.94it/s]
2024-07-13 04:18:54   Calculating recalls
2024-07-13 04:19:12   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 92.16, R@5: 95.95, R@10: 97.03, R@100: 98.24
2024-07-13 04:19:12   Improved: previous best R@1 = 91.8, current R@1 = 92.2

Although I am looking forward to the results at epoch 30 (or 40), it is worth noting that SALAD's parameters reach more than 92 R@1 on MSLS val within 30 minutes on a single 24 GB GPU.

I am curious whether it is the training parameters that help BoQ reach a good optimum. (A comparison with SALAD is not strictly necessary, but comparing against NetVLAD under the same good training parameters and the same model settings is needed to support BoQ's claimed advantage.)

amaralibey commented 4 months ago

Hi @BinuxLiu,

In my experience, training time is rarely the primary consideration. I always try to avoid an aggressive learning rate at the beginning of training so that the network doesn't get stuck in a bad local minimum.

Here are some random insights:

I didn't spend enough time with the DinoV2 backbone (I was mainly interested in ResNet50, which has been the most used backbone for VPR in recent years), so I don't know what the best set of hyperparameters would be.

Have you tried training SALAD and NetVLAD with the hyperparameters I shared?

As for the results you're getting, they're in line with what I get (although I get ~75% R@1 at the first epoch). Here is a screenshot (these are validation results at each epoch with 280x280 images; you may get a better final model by validating at 322x322):

[screenshot: validation recalls per epoch]
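
(A note on the two resolutions: both 280 and 322 are multiples of DINOv2's 14-pixel patch size, 14×20 and 14×23, so each gives a whole grid of patches. Below is a minimal sketch of a 322×322 evaluation transform, assuming torchvision preprocessing and ImageNet normalization:)

```python
import torchvision.transforms as T

# Sketch: evaluation-time preprocessing at 322x322 (= 14 * 23 patches per side
# for DINOv2's 14-pixel patches). ImageNet statistics are an assumption here.
val_transform = T.Compose([
    T.Resize((322, 322), interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```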

Side note: you're doing 5 minutes per epoch? Wow, that is at least 4x faster than my RTX8000. What GPU are you using?

BinuxLiu commented 4 months ago
2024-07-13 05:29:47   Recalls on val set < BaseDataset, msls - #database: 18871; #queries: 740 >: R@1: 93.24, R@5: 96.62, R@10: 96.76, R@100: 98.24
2024-07-13 05:29:47   Not improved: 2 / 10: best R@1 = 93.2, current R@1 = 93.2

It seems that I can probably reproduce your results. Next, I will also try training DINO-NetVLAD with a smaller learning rate; previously I used a very aggressive learning rate and converged to about 92.5%. I trained in parallel on four RTX 4090s. (The warmup I implemented currently cannot be combined with mixed-precision acceleration; once it can, training will be even faster.) Thank you for your answer.
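
(On the warmup/mixed-precision point: PyTorch's AMP utilities are normally stepped alongside a per-batch scheduler. A self-contained sketch with a placeholder model, data, and loss, assuming a CUDA device; the point is only the ordering of `GradScaler` and `scheduler.step()`:)

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer/scheduler so the sketch runs on its own.
model = nn.Linear(10, 2).cuda()                      # assumes a CUDA device
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=500     # 500-step linear warmup
)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(1000):
    x = torch.randn(32, 10, device="cuda")           # placeholder batch
    y = torch.randint(0, 2, (32,), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                  # mixed-precision forward pass
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)                           # skipped if gradients overflowed
    scaler.update()
    scheduler.step()                                 # per-step warmup coexists with AMP
```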

Li-Yun-star commented 2 months ago

Would you be willing to share the code for the linear learning-rate scheduling? This has been bothering me for a long time. Thank you very much!
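
(Not the repository's code, but a minimal sketch of one common implementation: linear warmup followed by linear decay via `LambdaLR`, stepped once per batch. The step counts are placeholders loosely based on the 391 iterations/epoch visible in the logs above.)

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def linear_warmup_then_decay(warmup_steps: int, total_steps: int):
    """LR multiplier: ramps linearly 0 -> 1 over warmup, then decays 1 -> 0."""
    def multiplier(step: int) -> float:
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        return max(0.0, (total_steps - step) / (total_steps - warmup_steps))
    return multiplier

model = torch.nn.Linear(8, 2)                               # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)  # peak LR after warmup
scheduler = LambdaLR(
    optimizer,
    lr_lambda=linear_warmup_then_decay(
        warmup_steps=391,        # ~1 epoch at 391 iterations/epoch
        total_steps=391 * 40,    # ~40 epochs
    ),
)

# In the training loop, call once per batch, after optimizer.step():
#   scheduler.step()
```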