MohammedSB opened this issue 12 months ago
I'm also looking for this, but I guess I won't find the exact learning rate that was used.
In the evaluation code you can find all the learning rates they considered in order to choose the best one:
```python
learning_rates=[1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 0.1],
```
In the paper, you can find this:
It would be awesome if the results of the grid search were published somewhere. I'm still looking for them.
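For reference, here is a minimal sketch of how one could sweep that grid with a linear probe on frozen features and keep the best validation accuracy. The synthetic features, probe dimensions, and tiny training loop below are all placeholders, not the actual DINOv2 evaluation code, which is more involved:

```python
import torch
import torch.nn as nn

# The LR grid from the evaluation code, swept with a linear probe.
learning_rates = [1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 5e-4,
                  1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 0.1]

# Stand-in features/labels; in practice these come from the frozen backbone.
feat_dim, num_classes = 768, 10  # 768 = ViT-B/14 embedding size; 10 is arbitrary
train_x, train_y = torch.randn(512, feat_dim), torch.randint(0, num_classes, (512,))
val_x, val_y = torch.randn(128, feat_dim), torch.randint(0, num_classes, (128,))

best_lr, best_acc = None, 0.0
for lr in learning_rates:
    probe = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=lr, momentum=0.9)
    for _ in range(10):  # a few passes over the cached features
        opt.zero_grad()
        nn.functional.cross_entropy(probe(train_x), train_y).backward()
        opt.step()
    with torch.no_grad():
        acc = (probe(val_x).argmax(dim=1) == val_y).float().mean().item()
    if acc > best_acc:
        best_lr, best_acc = lr, acc
print(f"best lr: {best_lr} (val acc {best_acc:.3f})")
```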
These learning rate values are for linear probing, not fine-tuning. For fine-tuning, what worked for me was a grid search over the values {1e-5, 1e-6, 1e-7}, and lower learning rates worked better for larger models.
But I used a "standard" fine-tuning recipe as opposed to the one from BEiT/DeiT that DINOv2 uses, so you should probably use that one if you are just starting.
@MohammedSB can you explain what you mean by "standard" fine-tuning?
I just mean that I did not use layer-wise LR scaling, weight decay, LR warmup, mixup, or any of the other techniques in the DeiT III fine-tuning recipe, which DINOv2 uses: https://github.com/facebookresearch/deit/blob/main/main.py
I basically just used an LR decay scheduler and that is it.
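Roughly, the setup looked like the sketch below. The backbone variant, LR value, epoch count, and the synthetic dataloader are placeholders, not my exact configuration:

```python
import torch

# Minimal "standard" fine-tuning: the whole DINOv2 backbone plus a linear
# head, one small learning rate, and a plain LR decay schedule -- no
# layer-wise LR scaling, warmup, weight decay, or mixup.
num_classes, num_epochs = 10, 20  # placeholders for your task
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
model = torch.nn.Sequential(backbone, torch.nn.Linear(backbone.embed_dim, num_classes))

# Stand-in for a real dataloader; DINOv2 expects image sides divisible by 14.
train_loader = [(torch.randn(2, 3, 224, 224), torch.randint(0, num_classes, (2,)))]

# Scan e.g. {5e-5, 1e-5, 5e-6, 1e-6, 1e-7}; lower LRs tend to win for larger models.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the LR once per epoch
```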
Understood. What was the LR that worked best for you? Also, any chance you tried freezing the initial layers and only fine-tuning the later ones?
You can scan the values {5e-5, 1e-5, 5e-6, 1e-6, 1e-7}. Like I said, lower learning rates worked better for me with larger models. The optimal LR changed from dataset to dataset.
I didn't try freezing the initial layers, though I suspect it won't bring much benefit. Honestly, if you want to fine-tune the model, just use the DeiT codebase I shared; it is a really strong ViT fine-tuning recipe.
Hi all, I ran standard (end-to-end) fine-tuning, i.e. adding a linear classifier on top of the DINOv2 backbone, and achieved a 5% accuracy improvement when fine-tuning on the 16-shot split of the Semi-Aves dataset. I used the training strategy from the SWAT paper, which uses a larger learning rate of 1e-4 for the classifier and a smaller learning rate of 1e-6 for the backbone. This setting helps preserve the pretrained features and stabilizes fine-tuning.
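In code, that two-LR setup looks roughly like this (the backbone variant here is an assumption; 200 is the Semi-Aves labeled class count):

```python
import torch

# Two parameter groups with different learning rates, following SWAT: a
# larger LR for the new linear classifier, a much smaller one for the
# pretrained backbone so its features are preserved.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
classifier = torch.nn.Linear(backbone.embed_dim, 200)  # Semi-Aves: 200 labeled classes

optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-6},    # small LR preserves pretrained features
    {"params": classifier.parameters(), "lr": 1e-4},  # larger LR lets the new head adapt
])
```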
With regards to this sentence:
In Table 5, we show that the Top-1 accuracy on the validation set of ImageNet-1k improves by more than +2% when the backbone is fine-tuned.
What was the learning rate used for fine-tuning the backbone?
May I ask how to obtain the /labels.txt file mentioned by the author? Thank you very much!