Training result error - Githubissues

sugy0114 commented 3 weeks ago

Thank you for your wonderful work. I encountered a problem during training: as the number of epochs increased, the Recall@1 and AP values actually decreased. This is the output from two different epochs:

------------------------------[Epoch: 1]------------------------------
100%|██████████| 295/295 [04:40<00:00, 1.05it/s, loss=0.8647, loss_avg=1.0110, lr=0.000999]
Epoch: 1, Train Loss = 1.011, Lr = 0.000999

------------------------------[Evaluate]------------------------------
Extract Features: 100%|██████████| 6/6 [00:02<00:00, 2.08it/s]
100%|██████████| 402/402 [01:32<00:00, 4.35it/s]
Compute Scores: 100%|██████████| 701/701 [00:03<00:00, 197.38it/s]
Recall@1: 93.5806 - Recall@5: 96.5763 - Recall@10: 97.1469 - Recall@top1: 99.2867 - AP: 89.7420

Shuffle Dataset: 42428it [00:00, 224071.11it/s]
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 0945 - Last Element ID: 1646

------------------------------[Epoch: 25]------------------------------
100%|██████████| 295/295 [04:45<00:00, 1.04it/s, loss=0.8206, loss_avg=0.8082, lr=0.000313]
Epoch: 25, Train Loss = 0.808, Lr = 0.000313

------------------------------[Evaluate]------------------------------
Extract Features: 100%|██████████| 6/6 [00:02<00:00, 2.12it/s]
100%|██████████| 402/402 [01:38<00:00, 4.10it/s]
Compute Scores: 100%|██████████| 701/701 [00:03<00:00, 197.02it/s]
Recall@1: 77.8887 - Recall@5: 85.8773 - Recall@10: 88.7304 - Recall@top1: 98.5735 - AP: 54.7721

Shuffle Dataset: 42376it [00:00, 223377.40it/s]
Original Length: 37854 - Length after Shuffle: 37760
Break Counter: 512
Pairs left out of last batch to avoid creating noise: 94
First Element ID: 0956 - Last Element ID: 1375
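
To compare the two evaluations side by side, the metric lines can be parsed into numbers. The following is only a rough sketch: it assumes the metric line keeps the `Name: value - Name: value` layout shown in the logs above, and the names `LOG_LINES`, `parse_metrics`, and `delta` are hypothetical helpers, not part of the Sample4Geo code.

```python
import re

# Metric lines copied from the evaluation output above, keyed by epoch.
LOG_LINES = {
    1: "Recall@1: 93.5806 - Recall@5: 96.5763 - Recall@10: 97.1469 - Recall@top1: 99.2867 - AP: 89.7420",
    25: "Recall@1: 77.8887 - Recall@5: 85.8773 - Recall@10: 88.7304 - Recall@top1: 98.5735 - AP: 54.7721",
}

# Matches "<name>: <number>" pairs such as "Recall@1: 93.5806" or "AP: 89.7420".
METRIC_RE = re.compile(r"([A-Za-z@0-9]+):\s*([0-9]+(?:\.[0-9]+)?)")


def parse_metrics(line):
    """Turn one metric line into a {name: float} dictionary."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


metrics = {epoch: parse_metrics(line) for epoch, line in LOG_LINES.items()}

# Change in each metric from epoch 1 to epoch 25 (negative means it got worse).
delta = {name: metrics[25][name] - metrics[1][name] for name in metrics[1]}

for name, change in delta.items():
    print(f"{name}: {change:+.4f}")
```

Every metric in `delta` comes out negative for these two epochs, with AP showing the largest drop.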

Skyy93 commented 3 weeks ago

This is not an error; it is in line with the results in our paper. With this model, training reaches peak performance after one epoch. Other works from our team show that DINOv2 provides a performance boost.

sugy0114 commented 3 weeks ago

@Skyy93 Thank you for your reply. Does this mean that the number of epochs should be set to 1 during training? Would that not cause underfitting?
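
A common alternative to hard-coding the number of epochs is to evaluate after every epoch and keep whichever checkpoint scores best on the validation metric. This is a generic pattern rather than anything the maintainers prescribe in this thread. The sketch below is a minimal illustration: `select_best_epoch` and its `patience` parameter are hypothetical names, and in a real training loop the model weights would be saved whenever the score improves.

```python
def select_best_epoch(scores_by_epoch, patience=None):
    """Return (best_epoch, best_score) for a higher-is-better metric.

    scores_by_epoch maps epoch number -> validation score (e.g. Recall@1).
    If patience is set, scanning stops once that many consecutive
    evaluations have failed to beat the best score seen so far.
    """
    best_epoch, best_score = None, float("-inf")
    evals_without_gain = 0

    for epoch in sorted(scores_by_epoch):
        score = scores_by_epoch[epoch]
        if score > best_score:
            # New best: a real loop would save the checkpoint here.
            best_epoch, best_score = epoch, score
            evals_without_gain = 0
        else:
            evals_without_gain += 1
            if patience is not None and evals_without_gain >= patience:
                break  # early stopping

    return best_epoch, best_score


# Recall@1 values reported in the logs above.
recall_at_1 = {1: 93.5806, 25: 77.8887}
best_epoch, best_score = select_best_epoch(recall_at_1, patience=1)
print(f"Best epoch: {best_epoch} (Recall@1 = {best_score})")
```

With the two values reported here, the selection picks epoch 1, which matches the observation that later epochs did not improve the score.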

Skyy93 / Sample4Geo

Training result error #30