OpenGVLab / UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
https://arxiv.org/abs/2211.09552
Apache License 2.0

UniFormerV2

This repo is the official implementation of "UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer". By Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang and Yu Qiao.

Update

11/14/2023

Thanks to @innat's help, our models now also support Keras! 😄

07/14/2023

UniFormerV2 has been accepted by ICCV2023! 🎉

02/13/2023

UniFormerV2 has been integrated into MMAction2. Training code will be provided soon! 😄

11/20/2022

We provide a video demo on Hugging Face. Give it a try! 😄

11/19/2022

We publish a blog post (in Chinese) on Zhihu.

11/18/2022

All the code, models, and configs are provided. Don't hesitate to open an issue if you run into any problems! 🙋🏻

Introduction

In UniFormerV2, we propose a generic paradigm for building a powerful family of video networks by arming pretrained image ViTs with efficient UniFormer designs. It inherits the concise style of the UniFormer block but introduces brand-new local and global relation aggregators, which achieve a preferable accuracy-computation trade-off by seamlessly integrating the advantages of both ViTs and UniFormer.

UniFormerV2 achieves state-of-the-art recognition performance on 8 popular video benchmarks, including the scene-related Kinetics-400/600/700 and Moments in Time, the temporal-related Something-Something V1/V2, and the untrimmed ActivityNet and HACS. In particular, it is the first model to achieve 90% top-1 accuracy on Kinetics-400.
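The core idea above can be illustrated with a toy sketch: each frame is encoded by a pretrained image ViT, then a lightweight *local* relation aggregator fuses nearby frames and a *global* relation aggregator pools all spatiotemporal tokens into a video-level representation. This is a hypothetical simplification for intuition only, not the official implementation; all names, shapes, and the choice of a depthwise temporal convolution and a single cross-attention query are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_aggregator(x, kernel):
    """Toy local relation aggregator: a shared depthwise temporal filter.
    x: (T, N, C) per-frame tokens; kernel: (K,) temporal weights."""
    T, N, C = x.shape
    K = kernel.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        # weighted sum over a local temporal window centred at frame t
        out[t] = np.tensordot(kernel, xp[t:t + K], axes=(0, 0))
    return out

def global_aggregator(x, query):
    """Toy global relation aggregator: one learnable query cross-attends
    to all spatiotemporal tokens to form a video-level representation."""
    T, N, C = x.shape
    tokens = x.reshape(T * N, C)
    attn = softmax(query @ tokens.T / np.sqrt(C))  # (1, T*N) attention weights
    return attn @ tokens                           # (1, C) video representation

T, N, C = 8, 4, 16               # frames, tokens per frame, channels
x = rng.normal(size=(T, N, C))   # stand-in for frozen per-frame ViT features
x = x + local_aggregator(x, np.array([0.25, 0.5, 0.25]))  # residual local fusion
video_repr = global_aggregator(x, rng.normal(size=(1, C)))
print(video_repr.shape)  # (1, 16)
```

The point of the residual local step is that the pretrained ViT features are preserved while cheap temporal mixing is layered on top; the global step then summarises the whole clip without full spatiotemporal self-attention.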


Model Zoo

All the models can be found in MODEL_ZOO.

Instructions

See INSTRUCTIONS for more details.

Cite UniFormerV2

If you find this repository useful, please use the following BibTeX entry for citation.

@misc{li2022uniformerv2,
      title={UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer}, 
      author={Kunchang Li and Yali Wang and Yinan He and Yizhuo Li and Yi Wang and Limin Wang and Yu Qiao},
      year={2022},
      eprint={2211.09552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Acknowledgement

This repository is built on the UniFormer and SlowFast repositories.