What learning rate was used for fine-tuning

I want to be sure about the learning rates you have used for fine-tuning. In the paper you state:

And in this repo, it is written that for the ViT-B pre-trained for 100 epochs you use a learning rate of 4e-3 (but 2e-3 for the model pre-trained for 1600 epochs):

I want to be sure because your paper says that the MAE fine-tuning setting was mostly used, but MAE only uses a learning rate of 4e-3 (a base learning rate of 1e-3 multiplied by total batch size / 256).
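
To make the arithmetic I am referring to concrete, here is a minimal sketch of that scaling rule. The function name `scaled_lr` is hypothetical, and the total batch size of 1024 is my assumption: it is simply the value that turns a 1e-3 base rate into 4e-3, not something taken from your paper or this repo.

```python
def scaled_lr(base_lr: float, total_batch_size: int, reference_batch_size: int = 256) -> float:
    """Linear scaling rule: lr = base_lr * total_batch_size / reference_batch_size."""
    return base_lr * total_batch_size / reference_batch_size


# Assumed total batch size of 1024 (hypothetical, chosen to match the 4e-3 figure).
lr = scaled_lr(base_lr=1e-3, total_batch_size=1024)
print(f"{lr:.1e}")
```

Under the same rule, a different effective learning rate would need either a different base rate or a different total batch size, which is why I am asking which of the two changed.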

Also, when is the 3e-3 learning rate used for fine-tuning?

Haoqing-Wang / LocalMIM
