facebookresearch / EgoVLPv2

Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
MIT License

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone


Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Z. Shou, Rama Chellappa, Pengchuan Zhang
ICCV, 2023
arxiv | project page

TL;DR: We introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement over the previous generation that incorporates cross-modal fusion directly into the video and language backbones.


📢 News

📁 Repository Structure

The contents of this repository are structured as follows:

EgoVLPv2
    ├── EgoVLPv2
    │   ├── Pre-training on EgoClip version of Ego4D
    │   ├── Validation on EgoMCQ 
    │   ├── Zero-shot and fine-tuning on EK-100 MIR
    │   ├── Zero-shot and fine-tuning on Charades-Ego
    │   └── Feature extraction on EgoMQ
    ├── EgoTaskQA
    │   └── Fine-tuning on EgoTaskQA direct and indirect splits
    ├── EgoNLQ
    │   └── Feature extraction and head-tuning on EgoNLQ 
    ├── QFVS
    │   └── Feature extraction and head-tuning on QFVS
    └── EgoMQ
        └── Head-tuning on EgoMQ 

Each directory contains data settings, training/inference scripts, and checkpoints. Notably, we provide pre-extracted video and text features for the Ego4D NLQ and MQ challenges.

🛠️ Environment Preparation

conda create -n egovlpv2 python=3.8.13 pip
conda activate egovlpv2
pip install -r requirements.txt
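
After activating the environment, it can help to confirm that the interpreter matches the pinned Python version before launching any training script. The following is a minimal sketch, not part of the repository: the helper name `python_matches` and the constant `EXPECTED_PYTHON` are hypothetical, and the check compares only the major and minor version (3.8), assuming patch-level differences are acceptable.

```python
import sys

# Assumption: the version pinned in the conda command above (3.8.13);
# only (major, minor) is compared here.
EXPECTED_PYTHON = (3, 8)


def python_matches(expected=EXPECTED_PYTHON, version_info=None):
    """Return True if the interpreter's (major, minor) equals `expected`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_matches():
        print(f"Python {found}: matches the pinned 3.8.x environment.")
    else:
        print(f"Python {found}: expected 3.8.x; is the egovlpv2 env active?")
```

Saved to a file and run inside the activated `egovlpv2` environment, the script prints a one-line status and does not modify anything.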

✉️ Contact

This repository is created and maintained by Shraman. Questions and discussions are welcome via spraman3@jhu.edu. We are willing to merge results if EgoVLPv2 is transferred to other egocentric tasks or datasets.

🙏 Acknowledgements

The codebase for this work is built on the EgoVLP, LAVILA, FIBER, and VSLNet repositories. We would like to thank the respective authors for their contributions, and the Meta AI team for discussions and feedback.

📄 License

EgoVLPv2 is licensed under the MIT License.

🎓 Citing EgoVLPv2

@article{pramanick2023egovlpv2,
  title={EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone},
  author={Pramanick, Shraman and Song, Yale and Nag, Sayan and Lin, Kevin Qinghong and Shah, Hardik and Shou, Mike Zheng and Chellappa, Rama and Zhang, Pengchuan},
  journal={arXiv preprint arXiv:2307.05463},
  year={2023}
}