lucasliunju opened this issue 2 years ago
Hi @lucasliunju
Yes, I did run MAE pretraining + linear probe experiments on the Base and Large architectures, although without gradient accumulation (I still have to run those experiments, but haven't had a chance yet).
ViT-Base reached 63% and ViT-L/16 reached 69% accuracy in my linear probe experiments, compared to the official ViT-L/16 result of 73.5% linear probe accuracy reported in the paper. I do have the pretrained weights and intend to release them publicly; I just haven't had time to do so yet.
I believe gradient accumulation will close this gap, but I'm not certain if and when I'll have the capacity to run those experiments.
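For context, the linear probe setup is the standard recipe: freeze the pretrained encoder and train only a linear classifier on its features. A minimal sketch of the idea (the names here, like `encoder_apply` and the dimensions, are placeholders, not this repo's actual API):

```python
import jax
import jax.numpy as jnp
import optax

# Stand-in for the pretrained encoder's forward pass; in the real
# experiments this would be the frozen pretrained MAE/ViT encoder.
def encoder_apply(encoder_params, images):
    return images.reshape(images.shape[0], -1) @ encoder_params["proj"]

def probe_loss(head_params, encoder_params, images, labels):
    # stop_gradient freezes the encoder: no gradients flow into it
    feats = jax.lax.stop_gradient(encoder_apply(encoder_params, images))
    logits = feats @ head_params["w"] + head_params["b"]
    return optax.softmax_cross_entropy_with_integer_labels(
        logits, labels).mean()

# Only the linear head's parameters are differentiated and updated.
encoder_params = {"proj": jnp.zeros((3 * 224 * 224, 768))}
head_params = {"w": jnp.zeros((768, 1000)), "b": jnp.zeros((1000,))}
grad_fn = jax.grad(probe_loss)  # grads w.r.t. head_params only
```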
Dear SarthakYadav,
Thanks for your reply. Maybe I can help you test it. May I ask what you mean by gradient accumulation? I noticed the current batch size is 128*8.
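Do you mean accumulating gradients over several micro-batches before each optimizer update? Something like this `optax.MultiSteps` sketch is my guess (the accumulation factor, learning rate, and loss are illustrative, not the repo's actual config):

```python
import jax
import jax.numpy as jnp
import optax

# Accumulate gradients over 4 micro-batches before each optimizer step,
# giving an effective batch of 4 * (128 * 8) = 4096.
optimizer = optax.MultiSteps(optax.adamw(learning_rate=1.5e-4),
                             every_k_schedule=4)

params = {"w": jnp.zeros((8, 8))}  # placeholder parameters
opt_state = optimizer.init(params)

def loss_fn(params, batch):
    # placeholder loss; the real MAE reconstruction loss goes here
    return jnp.sum((batch @ params["w"]) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # MultiSteps accumulates grads internally and only applies a real
    # parameter update on every 4th call; other calls emit zero updates.
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state
```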
Best, Yong
@SarthakYadav
Thank you for the awesome work! May I ask what the training and validation loss values were when training on ImageNet-1K?
Thank you!
Thank you very much for your contribution. I think this will help the whole JAX community with MAE training.
May I ask whether this repo can reproduce the results in the MAE paper, i.e. is there a comparison between this repo's results and the official ones?
Thanks again for your contribution!
Best, Lucas