Omnivore: A Single Model for Many Visual Modalities

Omnivorous modeling for visual modalities

This repository contains PyTorch pretrained models and inference examples for the following papers:

Omnivore: A single vision model for many different visual modalities, CVPR 2022 [bib]

```
@inproceedings{girdhar2022omnivore,
  title={{Omnivore: A Single Model for Many Visual Modalities}},
  author={Girdhar, Rohit and Singh, Mannat and Ravi, Nikhila and van der Maaten, Laurens and Joulin, Armand and Misra, Ishan},
  booktitle={CVPR},
  year={2022}
}
```
OmniMAE: Single Model Masked Pretraining on Images and Videos [bib]

```
@article{girdhar2022omnimae,
  title={OmniMAE: Single Model Masked Pretraining on Images and Videos},
  author={Girdhar, Rohit and El-Nouby, Alaaeldin and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
  journal={arXiv preprint arXiv:2206.08356},
  year={2022}
}
```

OmniVision: our training pipeline supporting multi-modal vision research. [bib]
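A key idea behind the single-model design is that an image can be fed to the same network as a video by treating it as a one-frame clip. The sketch below illustrates that input convention with PyTorch; the `torch.hub` entry point and the `omnivore_swinB` model name in the comment are assumptions based on the repo's hub integration, not verified here.

```python
import torch

# An RGB image batch: (B, C, H, W)
image = torch.randn(1, 3, 224, 224)

# Omnivore-style models consume video-shaped input (B, C, T, H, W);
# an image is simply a clip with a time dimension of length 1.
image_as_clip = image.unsqueeze(2)
print(image_as_clip.shape)  # torch.Size([1, 3, 1, 224, 224])

# Loading a pretrained checkpoint would then look roughly like
# (hub repo/model names are assumptions -- check the repo's hubconf):
#   model = torch.hub.load("facebookresearch/omnivore:main", model="omnivore_swinB")
#   with torch.no_grad():
#       logits = model(image_as_clip, input_type="image")
```

The same forward pass handles videos (T > 1 frames) and, in the paper, single-view 3D inputs, which is what lets one set of weights serve several visual modalities.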

Contributing

We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more information.

License

Omnivore is released under the CC-BY-NC 4.0 license. See LICENSE for additional details. However, the Swin Transformer implementation is additionally licensed under the Apache 2.0 license (see NOTICE for additional details).