
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization
MIT License


Xiaojun Tang, Junsong Fan, Chuanchen Luo, Zhaoxiang Zhang, Man Zhang, and Zongyuan Yang

arXiv ICCV2023

Table of Contents

  1. Introduction
  2. Preparation
  3. Testing
  4. Training
  5. Citation

Introduction

Weakly-supervised temporal action localization (WTAL) is a practical yet challenging task. Because the datasets are large-scale, most existing methods extract features with a network pretrained on other datasets, and such features are not well suited to WTAL. To address this problem, researchers have designed several feature-enhancement modules, in particular ones that model the spatiotemporal relationships between snippets, which improve the performance of the localization module. However, all of them neglect the adverse effect of ambiguous snippets, which reduce the discriminability of the other snippets. To address this, we propose a graph network that explicitly models different types of snippets. Specifically, we define pseudo-action snippets, pseudo-background snippets, and ambiguous snippets through simple judgments based on action weights. Building on these definitions, we propose the Discriminability-Driven Graph Network (DDG-Net), which spreads complementary information between discriminative snippets and enhances the discriminability of ambiguous snippets through one-way acceptance. Additionally, we propose a feature consistency loss to fully exploit the ability of the graph convolution model and to prevent the assimilation of features. Extensive experiments on the THUMOS14 and ActivityNet1.2 benchmarks demonstrate the effectiveness of DDG-Net, which establishes new state-of-the-art results on both datasets.
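The snippet taxonomy and the one-way information flow described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the thresholds `hi`/`lo`, the cosine-similarity edge weights, the self-loops, and the parameter-free row-normalised averaging are assumptions chosen for illustration, and the helper names (`partition_snippets`, `build_adjacency`, `propagate`) are hypothetical. DDG-Net's actual graph convolution and its feature consistency loss are not reproduced here.

```python
from math import sqrt


def partition_snippets(action_weights, hi=0.7, lo=0.3):
    """Split snippet indices into pseudo-action, pseudo-background and ambiguous lists.

    hi and lo are hypothetical thresholds chosen for illustration only.
    """
    action = [i for i, w in enumerate(action_weights) if w >= hi]
    background = [i for i, w in enumerate(action_weights) if w <= lo]
    ambiguous = [i for i, w in enumerate(action_weights) if lo < w < hi]
    return action, background, ambiguous


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_adjacency(features, action_weights, hi=0.7, lo=0.3):
    """Directed adjacency: adj[i][j] > 0 means snippet i receives information from j."""
    action, background, ambiguous = partition_snippets(action_weights, hi, lo)
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    # Discriminative snippets exchange information only within their own pseudo class.
    for group in (action, background):
        for i in group:
            for j in group:
                adj[i][j] = max(cosine(features[i], features[j]), 0.0)
    # Ambiguous snippets receive from discriminative ones but never send (one-way).
    for i in ambiguous:
        for j in action + background:
            adj[i][j] = max(cosine(features[i], features[j]), 0.0)
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so every snippet keeps part of its own feature
    return adj


def propagate(features, adj):
    """One message-passing step with row-normalised weights (no learned parameters)."""
    dim = len(features[0])
    out = []
    for row in adj:
        total = sum(row)  # >= 1 because of the self-loop
        out.append([sum(row[j] * features[j][k] for j in range(len(features))) / total
                    for k in range(dim)])
    return out
```

For example, with action weights `[0.9, 0.8, 0.5, 0.1]`, snippets 0 and 1 are pseudo-action, snippet 3 is pseudo-background and snippet 2 is ambiguous. After `propagate`, snippet 2 absorbs information from snippets 0, 1 and 3, while the rows for snippets 0, 1 and 3 carry no weight from snippet 2. Keeping ambiguous snippets as receivers only is what prevents their uncertain features from diluting the discriminative ones.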


Preparation

Requirements and Dependencies:

Here we list the requirements and dependencies we used.

THUMOS14 Dataset:

We use the 2048-d features provided by the MM 2021 paper Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization. You can access the dataset here. The annotations are included within this package.

ActivityNet-v1.2 Dataset:

We also use the features provided in MM2021-CO2-Net. The features can be obtained from here. The annotations are included within this package.

Testing

Download the pretrained models from Google Drive and put them into "./download_ckpt/".

Test on THUMOS-14

Replace "path/to/thumos" in the script with your own path to the dataset, then run:

bash ./test_thumos.sh

Test on ActivityNet-v1.2

Replace "path/to/activity" in the script with your own path to the dataset, then run:

bash ./test_activitynet.sh

Training

Train on THUMOS-14

Replace "path/to/thumos" in the script with your own path to the dataset, then run:

bash ./train_thumos.sh

Train on ActivityNet-v1.2

Replace "path/to/activity" in the script with your own path to the dataset, then run:

bash ./train_activity.sh
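Each of the four scripts (test_thumos.sh, test_activitynet.sh, train_thumos.sh, train_activity.sh) contains a placeholder dataset path that must be edited before running. Editing by hand works fine; as an alternative, here is a minimal sketch of doing the substitution programmatically, assuming the placeholder appears verbatim as "path/to/thumos" or "path/to/activity" in the scripts. The helper `set_dataset_path` is hypothetical and not part of the repository.

```python
from pathlib import Path


def set_dataset_path(script, placeholder, dataset_dir):
    """Replace every occurrence of placeholder in script with dataset_dir.

    Returns the number of replacements made, so a missing or already
    replaced placeholder is noticed instead of silently ignored.
    """
    path = Path(script)
    text = path.read_text()
    count = text.count(placeholder)
    if count:
        path.write_text(text.replace(placeholder, dataset_dir))
    return count
```

For example, `set_dataset_path("train_thumos.sh", "path/to/thumos", "/data/thumos14")` rewrites the THUMOS training script in place, where `/data/thumos14` stands for wherever you stored the features. A return value of 0 means the placeholder was not found, which usually indicates the script has already been edited.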

Citation

If you find the code useful in your research, please cite:

@InProceedings{Tang_2023_ICCV,
    author    = {Tang, Xiaojun and Fan, Junsong and Luo, Chuanchen and Zhang, Zhaoxiang and Zhang, Man and Yang, Zongyuan},
    title     = {DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {6622-6632}
}

License

See MIT License

Acknowledgement

This repo contains modified code from:

This repo uses the features and annotations from: