I want to be sure about the learning rates you have used for fine-tuning.
In the paper you state:
And in this repo, it is stated that the ViT-B pre-trained for 100 epochs is fine-tuned with a learning rate of 4e-3, while the model pre-trained for 1600 epochs uses 2e-3:
I ask because your paper says that the MAE fine-tuning setting was mostly followed, but MAE uses only a learning rate of 4e-3 (a base learning rate of 1e-3 multiplied by total_batchsize/256).
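To make the arithmetic explicit, here is a minimal sketch of the scaling rule as I understand it. The function name `scaled_lr` and the total batch size of 1024 are my own assumptions (1024 is simply the value that turns a base learning rate of 1e-3 into 4e-3); please correct me if your setup differs.

```python
def scaled_lr(base_lr: float, total_batch_size: int) -> float:
    """Linear scaling rule: lr = base_lr * total_batch_size / 256."""
    return base_lr * total_batch_size / 256


# Assumed total batch size of 1024 (hypothetical, not confirmed by the authors).
lr = scaled_lr(1e-3, 1024)
print(f"{lr:.0e}")
```

Under the same assumed batch size, 2e-3 and 3e-3 would correspond to base learning rates of 5e-4 and 7.5e-4, respectively.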
Also, when is the 3e-3 learning rate used for fine-tuning?