Xiaojun Tang, Junsong Fan, Chuanchen Luo, Zhaoxiang Zhang, Man Zhang, and Zongyuan Yang
Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task. Because the datasets are large, most existing methods extract features with a network pretrained on other datasets, and these features are not well suited to WTAL. To address this problem, researchers have designed several feature-enhancement modules, most of which model the spatiotemporal relationships between snippets, to improve the performance of the localization module. However, all of them neglect the adverse effect of ambiguous snippets, which reduce the discriminability of other snippets. To address this, we propose a graph network that explicitly models the different kinds of snippets. Specifically, we define pseudo-action snippets, pseudo-background snippets, and ambiguous snippets through simple judgments based on action weights. Based on them, we propose the Discriminability-Driven Graph Network (DDG-Net), which spreads complementary information between discriminative snippets and enhances the discriminability of ambiguous snippets through one-way acceptance. Additionally, we propose a feature consistency loss to fully exploit the graph convolution model and prevent the assimilation of features. Extensive experiments on the THUMOS14 and ActivityNet1.2 benchmarks demonstrate the effectiveness of DDG-Net, establishing new state-of-the-art results on both datasets.
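To make the snippet partition and the one-way acceptance idea concrete, here is a minimal, dependency-free sketch. It is not the code in this repo. The function names, the two thresholds (`upper`, `lower`), and the choice to let discriminative snippets exchange information only within their own group are assumptions made for illustration. The property it shows is that ambiguous snippets receive information from discriminative snippets but never send any back.

```python
from typing import List, Sequence, Tuple


def partition_snippets(
    action_weights: Sequence[float],
    upper: float = 0.6,  # hypothetical threshold, not taken from the paper
    lower: float = 0.4,  # hypothetical threshold, not taken from the paper
) -> Tuple[List[int], List[int], List[int]]:
    """Split snippet indices into (pseudo_action, pseudo_background, ambiguous)."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    action, background, ambiguous = [], [], []
    for idx, weight in enumerate(action_weights):
        if weight > upper:
            action.append(idx)
        elif weight < lower:
            background.append(idx)
        else:
            ambiguous.append(idx)
    return action, background, ambiguous


def one_way_mask(
    num_snippets: int,
    action: Sequence[int],
    background: Sequence[int],
    ambiguous: Sequence[int],
) -> List[List[bool]]:
    """Build a mask where mask[i][j] is True if snippet i may receive from snippet j."""
    mask = [[False] * num_snippets for _ in range(num_snippets)]
    # Assumption: discriminative snippets exchange information within their own group.
    for group in (action, background):
        for i in group:
            for j in group:
                mask[i][j] = True
    # Ambiguous snippets keep their own feature and receive from discriminative
    # snippets; no discriminative row ever points at an ambiguous column.
    discriminative = list(action) + list(background)
    for i in ambiguous:
        mask[i][i] = True
        for j in discriminative:
            mask[i][j] = True
    return mask


weights = [0.9, 0.1, 0.5, 0.8]
act, bkg, amb = partition_snippets(weights)
mask = one_way_mask(len(weights), act, bkg, amb)
```

In this toy example, snippet 2 is ambiguous: `mask[2][0]` is true (it receives from a pseudo-action snippet) while `mask[0][2]` is false (it sends nothing back). In a graph convolution, such a mask would be applied to the similarity-based adjacency matrix before aggregation.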
Here we list the requirements and dependencies we used.
We use the 2048-d features provided by the MM 2021 paper "Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization". You can download the dataset from here. The annotations are included in this package.
We also use the features provided in MM2021-CO2-Net. The features can be obtained from here. The annotations are included within this package.
Download the pretrained models from Google Drive, and put them into "./download_ckpt/".
Change "path/to/thumos" in the script to the path of your own copy of the dataset, and run:
bash ./test_thumos.sh
Change "path/to/activity" in the script to the path of your own copy of the dataset, and run:
bash ./test_activitynet.sh
Change "path/to/thumos" in the script to the path of your own copy of the dataset, and run:
bash ./train_thumos.sh
Change "path/to/activity" in the script to the path of your own copy of the dataset, and run:
bash ./train_activity.sh
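If you prefer not to edit the scripts by hand, the placeholder substitution described above can be sketched with a small Python helper. The function name `set_dataset_path` is hypothetical and not part of this repo; the helper only assumes that the placeholder string appears literally in the script.

```python
from pathlib import Path


def set_dataset_path(script: str, placeholder: str, dataset_dir: str) -> int:
    """Replace every literal `placeholder` in `script` with `dataset_dir`.

    Returns the number of replacements and raises if the placeholder is absent,
    so an already-edited script is not silently left unchanged.
    """
    script_path = Path(script)
    text = script_path.read_text()
    count = text.count(placeholder)
    if count == 0:
        raise ValueError(f"{placeholder!r} not found in {script}")
    script_path.write_text(text.replace(placeholder, dataset_dir))
    return count
```

For example, `set_dataset_path("./train_thumos.sh", "path/to/thumos", "/data/THUMOS14")` would rewrite the training script in place, where `/data/THUMOS14` stands in for wherever you extracted the features.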
If you find the code useful in your research, please cite:
@InProceedings{Tang_2023_ICCV,
author = {Tang, Xiaojun and Fan, Junsong and Luo, Chuanchen and Zhang, Zhaoxiang and Zhang, Man and Yang, Zongyuan},
title = {DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {6622-6632}
}
See MIT License
This repo contains modified code from:
This repo uses the features and annotations from: