
Ego4D-ASL

Technical report | 1st in the MQ challenge and 2nd in the NLQ challenge of the Ego4D workshop at CVPR 2023.

This report presents the ReLER submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023: Natural Language Queries (NLQ) and Moment Queries (MQ). Our solution builds on our proposed Action Sensitivity Learning (ASL) framework to better capture the discrepant information across frames, and further incorporates a series of stronger video features and fusion strategies. Our method achieves an average mAP of 29.34, ranking 1st in the Moment Queries Challenge, and a mean R@1 of 19.79, ranking 2nd in the Natural Language Queries Challenge. Our code will be released.
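
The report does not spell out a single fusion recipe in this README. As a rough illustration of what channel-level fusion of multiple pretrained video feature streams can look like, below is a minimal PyTorch sketch; the feature shapes, the interpolation-based temporal alignment, and the function name `fuse_video_features` are all illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F


def fuse_video_features(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Fuse two clip-level feature streams by temporal alignment + channel concatenation.

    feat_a: (T_a, C_a) features from one video backbone.
    feat_b: (T_b, C_b) features from a second backbone.
    Returns a (max(T_a, T_b), C_a + C_b) fused sequence.
    NOTE: this is a sketch of one plausible fusion strategy, not the repo's code.
    """
    t = max(feat_a.shape[0], feat_b.shape[0])

    def resize(x: torch.Tensor) -> torch.Tensor:
        # (T, C) -> (1, C, T) for 1-D linear interpolation over time, then back to (T, C).
        return F.interpolate(
            x.t().unsqueeze(0), size=t, mode="linear", align_corners=False
        ).squeeze(0).t()

    # Align both streams to the same temporal length, then concatenate channels.
    return torch.cat([resize(feat_a), resize(feat_b)], dim=-1)


if __name__ == "__main__":
    a = torch.randn(928, 1024)   # hypothetical feature stream at a finer temporal stride
    b = torch.randn(464, 2304)   # hypothetical second stream at a coarser stride
    fused = fuse_video_features(a, b)
    print(fused.shape)           # torch.Size([928, 3328])
```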

Changelog

Installation

Data Preparation

Train on MQ (train-set)

Validate on MQ (val-set)

Train on MQ (Train + val)

Submission (to Leaderboard test-set server)

Acknowledgement

Our model is based on ActionFormer. Thanks for their contributions.

Cite

@article{shao2023action,
  title={Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023},
  author={Shao, Jiayi and Wang, Xiaohan and Quan, Ruijie and Yang, Yi},
  journal={arXiv preprint arXiv:2306.09172},
  year={2023}
}

@InProceedings{Shao_2023_ICCV,
    author    = {Shao, Jiayi and Wang, Xiaohan and Quan, Ruijie and Zheng, Junjun and Yang, Jiang and Yang, Yi},
    title     = {Action Sensitivity Learning for Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {13457-13469}
}