chinhsuanwu / mobilevit-pytorch

A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer"
https://arxiv.org/abs/2110.02178
MIT License

MobileViT

Overview

This is a PyTorch implementation of MobileViT, as described in "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer" (arXiv, 2021).

👉 Check out CoAtNet if you are interested in other Convolution + Transformer models.

Usage

import torch
from mobilevit import mobilevit_xxs

# Dummy batch: one RGB image at 256x256 resolution
img = torch.randn(1, 3, 256, 256)
# Build the XXS variant and run a forward pass
vit = mobilevit_xxs()
out = vit(img)
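
For inference or a quick size check, the forward pass above can be wrapped in two small helpers. This is a minimal sketch, not part of this repository: `count_parameters`, `predict`, and the tiny `stand_in` network are hypothetical names introduced here for illustration. The helpers accept any `torch.nn.Module`, and the stand-in exists only so the block runs without the repository on the import path.

```python
import torch
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in `model`."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


@torch.no_grad()
def predict(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Run one forward pass in eval mode without tracking gradients."""
    model.eval()  # BatchNorm uses running statistics, dropout is disabled
    return model(batch)


# Hypothetical stand-in network (NOT MobileViT), used only to exercise the helpers.
stand_in = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

img = torch.randn(1, 3, 256, 256)
out = predict(stand_in, img)
print(out.shape, count_parameters(stand_in))
```

With this repository importable, the same helpers should apply unchanged to the model from the usage snippet, for example `predict(mobilevit_xxs(), img)`. Note that `predict` leaves the model in eval mode, so call `model.train()` again before resuming training.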

Citation

@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}

Credits

Code adapted from MobileNetV2 and ViT.