lucasliunju opened this issue 2 years ago
Hi @lucasliunju
Yes, I did run MAE pretraining + linear probe experiments on the Base and Large architectures, although without gradient accumulation (I still have to run those experiments, but haven't had a chance yet).
ViT-Base reached 63% and ViT-L/16 reached 69% accuracy in my linear probe experiments, compared to the official ViT-L/16 result of 73.5% linear probe accuracy reported in the paper. I do have the pretrained weights and intend to release them publicly; I just haven't had time to do so yet.
I believe gradient accumulation will close this gap, but I'm not certain if and when I'll have the capacity to run those experiments.
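For context, the linear probe setup is the standard recipe: freeze the pretrained encoder and train only a linear classifier on its features. A minimal sketch of the idea (the names here, like `encoder_apply` and the dimensions, are placeholders, not this repo's actual API):

```python
import jax
import jax.numpy as jnp
import optax

# Stand-in for the pretrained encoder's forward pass; in the real
# experiments this would be the frozen pretrained MAE/ViT encoder.
def encoder_apply(encoder_params, images):
    return images.reshape(images.shape[0], -1) @ encoder_params["proj"]

def probe_loss(head_params, encoder_params, images, labels):
    # stop_gradient freezes the encoder: no gradients flow into it
    feats = jax.lax.stop_gradient(encoder_apply(encoder_params, images))
    logits = feats @ head_params["w"] + head_params["b"]
    return optax.softmax_cross_entropy_with_integer_labels(
        logits, labels).mean()

# Only the linear head's parameters are differentiated and updated.
encoder_params = {"proj": jnp.zeros((3 * 224 * 224, 768))}
head_params = {"w": jnp.zeros((768, 1000)), "b": jnp.zeros((1000,))}
grad_fn = jax.grad(probe_loss)  # grads w.r.t. head_params only
```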
Dear SarthakYadav,
Thanks for your reply. Maybe I can help you test it. May I ask what you mean by gradient accumulation? I noticed the current batch size is 128*8.
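Do you mean accumulating gradients over several micro-batches before each optimizer update? Something like this `optax.MultiSteps` sketch is my guess (the accumulation factor, learning rate, and loss are illustrative, not the repo's actual config):

```python
import jax
import jax.numpy as jnp
import optax

# Accumulate gradients over 4 micro-batches before each optimizer step,
# giving an effective batch of 4 * (128 * 8) = 4096.
optimizer = optax.MultiSteps(optax.adamw(learning_rate=1.5e-4),
                             every_k_schedule=4)

params = {"w": jnp.zeros((8, 8))}  # placeholder parameters
opt_state = optimizer.init(params)

def loss_fn(params, batch):
    # placeholder loss; the real MAE reconstruction loss goes here
    return jnp.sum((batch @ params["w"]) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # MultiSteps accumulates grads internally and only applies a real
    # parameter update on every 4th call; other calls emit zero updates.
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state
```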
Best, Yong
@SarthakYadav
Thank you for the awesome work! May I ask what the training and validation loss values were when training on ImageNet-1K?
Thank you!
Thank you very much for your contribution. I think this will help the whole JAX community with MAE training.
May I ask whether this repo can reproduce the results in the MAE paper, i.e. is there a comparison between this repo's results and the official ones?
Thanks again for your contribution!
Best, Lucas