Lupin1998 / Awesome-MIM

[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
https://openmixup.readthedocs.io/en/latest/awesome_selfsup/MIM.html
Apache License 2.0

Awesome Masked Modeling for Self-supervised Vision Representation and Beyond


Introduction

We summarize awesome Masked Image Modeling (MIM) and relevant masked modeling methods proposed for self-supervised representation learning. Contributions of relevant masked modeling papers to this project are welcome!

This project is part of our survey on masked modeling methods (arXiv). The list of awesome MIM methods is organized in chronological order and is continuously updated. If you find any typos or missing papers, please feel free to open an issue or send a pull request. Our survey is also being updated, and the latest version is available on arXiv (https://arxiv.org/abs/2401.00897).

Research in self-supervised learning (SSL) can be broadly categorized into generative and discriminative paradigms. We reviewed major SSL research since 2008 and found that SSL has followed distinct developmental trajectories and stages across time periods and modalities. Since 2018, SSL in NLP has been dominated by generative masked language modeling, which remains mainstream. In computer vision, discriminative contrastive learning dominated from 2018 to 2021, before masked image modeling gained prominence after 2022.
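To make the generative/discriminative split concrete, the sketch below contrasts the two kinds of objective in plain Python. It is an illustration under simplifying assumptions, not code from the survey: `reconstruction_loss` stands in for a generative (masked reconstruction) objective as mean squared error, and `info_nce_loss` stands in for a discriminative (contrastive) objective as an InfoNCE-style cross-entropy over cosine similarities. All function names and the `temperature` default are hypothetical choices for this example.

```python
import math


def reconstruction_loss(pred, target):
    """Generative objective (sketch): mean squared error between predicted and original values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Discriminative objective (sketch): identify the positive view among all candidates.

    Returns the cross-entropy of a softmax over similarity logits, where index 0
    is the positive pair. The temperature value is an illustrative assumption.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

The generative loss asks the model to reproduce the input itself, while the contrastive loss only asks it to tell matching views apart from non-matching ones; an embedding aligned with its positive yields a much smaller `info_nce_loss` than one aligned with a negative.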

Table of Contents

Fundamental MIM Methods

Overview of the basic MIM framework, which contains four building blocks with their internal components and functionalities: Masking, Encoder, Target, and Head. Existing MIM research can be summarized as innovations on one or more of these four blocks. Masked modeling frameworks in other modalities follow a similar structure.
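The four blocks can be traced through one toy training step. The following pure-Python sketch is hypothetical and deliberately minimal: it assumes a 1-D "image" split into patches, MAE-style uniform random masking, raw pixels of masked patches as the target, and a loss computed on masked patches only. `toy_encoder` and `toy_head` are placeholders (a real model would use a ViT or CNN encoder and a learned decoder), and every function name here is illustrative rather than taken from any surveyed method.

```python
import random


def patchify(image, patch_size):
    """Split a flat list of pixel values into non-overlapping patches."""
    assert len(image) % patch_size == 0, "image length must be divisible by patch_size"
    return [image[i:i + patch_size] for i in range(0, len(image), patch_size)]


def random_masking(num_patches, mask_ratio, rng):
    """Masking block: uniformly sample which patch indices to hide.

    Keeps at least one visible and at least one masked patch (assumes num_patches >= 2).
    """
    num_masked = max(1, min(num_patches - 1, int(num_patches * mask_ratio)))
    indices = list(range(num_patches))
    rng.shuffle(indices)
    return sorted(indices[:num_masked]), sorted(indices[num_masked:])


def toy_encoder(visible_patches):
    """Encoder block (placeholder): summarize each visible patch by its mean."""
    return [sum(p) / len(p) for p in visible_patches]


def toy_head(latents, num_masked, patch_size):
    """Head block (placeholder): predict every masked patch as the mean latent value."""
    fill = sum(latents) / len(latents)
    return [[fill] * patch_size for _ in range(num_masked)]


def pixel_target(patches, masked):
    """Target block: raw pixels of the masked patches (one of several possible targets)."""
    return [patches[i] for i in masked]


def masked_mse(pred, target):
    """Reconstruction loss averaged over masked-patch pixels only."""
    diffs = [(p - t) ** 2 for pp, tp in zip(pred, target) for p, t in zip(pp, tp)]
    return sum(diffs) / len(diffs)


def mim_step(image, patch_size=2, mask_ratio=0.75, seed=0):
    """One toy MIM step: mask -> encode visible patches -> predict masked patches -> loss."""
    rng = random.Random(seed)
    patches = patchify(image, patch_size)
    masked, visible = random_masking(len(patches), mask_ratio, rng)
    latents = toy_encoder([patches[i] for i in visible])  # encoder sees visible patches only
    pred = toy_head(latents, len(masked), patch_size)
    target = pixel_target(patches, masked)
    return masked_mse(pred, target), masked, visible
```

For example, `mim_step([float(i) for i in range(16)])` splits the input into 8 patches, hides 6 of them, and returns the loss together with the masked and visible indices. Swapping out any single function (for instance, a block-wise masking strategy, a feature-based target, or a different head) mirrors the kind of per-block innovation described above.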

MIM for Transformers

(back to top)

MIM with Contrastive Learning

(back to top)

MIM for Transformers and CNNs

(back to top)

MIM with Advanced Masking

(back to top)

MIM for Multi-Modality

MIM for Vision Generalist Model

(back to top)

Image Generation

(back to top)

MIM for CV Downstream Tasks

Object Detection and Segmentation

Video Representation

(back to top)

Knowledge Distillation and Few-shot Classification

Efficient Fine-tuning

Medical Image

Face Recognition

Scene Text Recognition (OCR)

Remote Sensing Image

3D Representation Learning

Low-level Vision

Depth Estimation

(back to top)

Audio and Speech

AI for Science

Protein

Chemistry

Physics

(back to top)

Time Series and Neuroscience Learning

Reinforcement Learning

(back to top)

Tabular Data

Analysis and Understanding of Masked Modeling

(back to top)

Survey

Contribution

Feel free to send pull requests to add more links with the following Markdown format. Note that the abbreviation, the code link, and the figure link are optional attributes.

* **TITLE**<br>
*AUTHOR*<br>
PUBLISH'YEAR [[Paper](link)] [[Code](link)]
   <details close>
   <summary>ABBREVIATION Framework</summary>
   <p align="center"><img width="90%" src="https://github.com/Lupin1998/Awesome-MIM/raw/master/link_to_image" /></p>
   </details>
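If you add many entries at once, a small helper can keep them consistent with the template above. The function below is a hypothetical convenience sketch, not a script shipped with this repository: `format_entry` and its parameter names are illustrative, and it simply reproduces the template while omitting the optional code link and figure block when they are not given.

```python
def format_entry(title, authors, venue_year, paper_url,
                 code_url=None, abbreviation=None, figure_path=None):
    """Render one list entry in the template format; code link and figure are optional."""
    links = f"[[Paper]({paper_url})]"
    if code_url:
        links += f" [[Code]({code_url})]"
    lines = [f"* **{title}**<br>", f"*{authors}*<br>", f"{venue_year} {links}"]
    if figure_path:
        # The abbreviation is optional; fall back to a bare "Framework" summary.
        summary = f"{abbreviation} Framework" if abbreviation else "Framework"
        src = f"https://github.com/Lupin1998/Awesome-MIM/raw/master/{figure_path}"
        lines += [
            "   <details close>",
            f"   <summary>{summary}</summary>",
            f'   <p align="center"><img width="90%" src="{src}" /></p>',
            "   </details>",
        ]
    return "\n".join(lines)
```

Calling `format_entry("TITLE", "AUTHOR", "PUBLISH'YEAR", "link")` yields only the three required lines, while passing `code_url`, `abbreviation`, and `figure_path` adds the `[[Code](...)]` link and the collapsible figure block.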

The main maintainer is Siyuan Li (@Lupin1998). We thank all contributors to Awesome-MIM.

Citation

If you find this repository and our survey helpful, please consider citing our paper:

@article{Li2023MIMSurvey,
  title={Masked Modeling for Self-supervised Representation Learning on Vision and Beyond},
  author={Siyuan Li and Luyuan Zhang and Zedong Wang and Di Wu and Lirong Wu and Zicheng Liu and Jun Xia and Cheng Tan and Yang Liu and Baigui Sun and Stan Z. Li},
  journal={ArXiv},
  year={2023},
  volume={abs/2401.00897},
}

Related Project

Paper List of Masked Image Modeling

Project of Self-supervised Learning

(back to top)