FlyEgle / MAE-pytorch

Masked Autoencoders Are Scalable Vision Learners

A PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners.

This is a coarse version of MAE that only provides the pretraining model; the finetune and linear-probe versions are coming soon.

Note: my ViT code is not fully based on timm or BEiT, so the results may be lower than theirs.

Update

1. Introduction

This repo implements the MAE-ViT model in PyTorch without reference to any other code, so it is a non-official version. Because of time and machine limitations, I have only pretrained ViT-Tiny and ViT-Base/16.
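The core of MAE pretraining is masking a large random subset of image patches and feeding only the visible ones to the encoder. A minimal numpy sketch of that random-masking step (the repo's actual PyTorch code may differ; shapes and the default 75% ratio follow the MAE paper):

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking: keep a random subset of patches.

    patches: (N, L, D) array of N images, L patches, D dims per patch.
    Returns the kept patches, a binary mask (1 = removed) in the
    original patch order, and the indices needed to restore that order.
    """
    n, l, d = patches.shape
    len_keep = int(l * (1 - mask_ratio))
    rng = np.random.default_rng(seed)

    noise = rng.random((n, l))                 # one random score per patch
    ids_shuffle = np.argsort(noise, axis=1)    # lowest scores are kept
    ids_restore = np.argsort(ids_shuffle, axis=1)

    ids_keep = ids_shuffle[:, :len_keep]
    kept = np.take_along_axis(patches, ids_keep[..., None], axis=1)

    mask = np.ones((n, l))
    mask[:, :len_keep] = 0
    mask = np.take_along_axis(mask, ids_restore, axis=1)  # back to input order
    return kept, mask, ids_restore

x = np.arange(2 * 16 * 4, dtype=float).reshape(2, 16, 4)
kept, mask, ids_restore = random_masking(x, mask_ratio=0.75)
print(kept.shape)        # (2, 4, 4): 25% of 16 patches kept
print(mask.sum(axis=1))  # [12. 12.]: 75% of patches masked per image
```

The decoder later re-inserts learnable mask tokens at the masked positions using `ids_restore`, so the shuffle is fully invertible.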

2. Environments

3. Model Config

Pretrain Config

Finetune Config

Waiting for the results

TODO:

4. Results

Below are pretraining results on the ImageNet val dataset: left is the masked image, middle is the reconstruction, and right is the original image.
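The reconstructions in the middle panel come from a decoder trained with MAE's objective: per-patch mean squared error computed only on the masked patches, not the visible ones. A minimal numpy sketch of that loss (an illustration of the paper's objective, not the repo's exact code):

```python
import numpy as np

def mae_loss(pred, target, mask):
    """MAE pretraining loss: MSE on masked patches only.

    pred, target: (N, L, D) per-patch pixel values.
    mask: (N, L), 1 for masked (removed) patches, 0 for visible ones.
    """
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (N, L)
    return (per_patch * mask).sum() / mask.sum()      # average over masked

target = np.ones((1, 4, 8))
pred = np.zeros((1, 4, 8))
mask = np.array([[1.0, 1.0, 0.0, 0.0]])  # only the first two patches count
print(mae_loss(pred, target, mask))  # 1.0: visible patches contribute nothing
```

Restricting the loss to masked patches is what forces the model to infer missing content rather than copy visible pixels.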

Large models work significantly better than small models.

You can download the checkpoint to test the reconstruction results. Put the ckpt in the weights folder.
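Viewing reconstructions requires mapping the decoder's flat per-patch outputs back into an image. A self-contained patchify/unpatchify sketch (helper names are my own, not from this repo):

```python
import numpy as np

def patchify(img, p=4):
    """Split an (H, W, C) image into (L, p*p*C) flattened patches."""
    h, w, c = img.shape
    gh, gw = h // p, w // p
    x = img.reshape(gh, p, gw, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, p * p * c)

def unpatchify(patches, h, w, p=4):
    """Inverse of patchify: rebuild the (H, W, C) image."""
    gh, gw = h // p, w // p
    c = patches.shape[1] // (p * p)
    x = patches.reshape(gh, gw, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

img = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
round_trip = unpatchify(patchify(img), 8, 8)
print(np.allclose(round_trip, img))  # True: the mapping is lossless
```

The same reshape/transpose pattern (with `torch.Tensor.permute` in place of `transpose`) is how MAE implementations typically convert decoder outputs into the middle-panel images shown above.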

5. Training & Inference

6. TODO

There may be some problems with the implementation; discussion and code submissions are welcome.

License

This project is released under the MIT license.