
Proposal-based Multiple Instance Learning for Weakly-supervised Temporal Action Localization (CVPR 2023)

Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang (USTC)

arXiv | CVPR 2023 | Project page: https://renhuan1999.github.io/P-MIL/

Requirements

Required packages are listed in requirements.txt. You can install them by running:

conda create -n P-MIL python=3.8
conda activate P-MIL
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip3 install -r requirements.txt
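
After installation, a quick check that the pinned PyTorch build sees the GPU can save debugging time later. This is a minimal sanity-check sketch, not part of the official setup:

import torch
import torchvision

# The conda command above pins torch 1.8.0 / torchvision 0.9.0 with
# cudatoolkit 11.1; these prints confirm the versions and GPU visibility.
print(torch.__version__, torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())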

Data Preparation

  1. Prepare the THUMOS14 dataset.

    • We recommend using features and annotations provided by W-TALC or CO2-Net.
    • You can also download them from Google Drive.
  2. Prepare proposals generated from a pre-trained S-MIL model.

    • We recommend using the official code of an S-MIL method (such as CO2-Net) to generate proposals.
    • Alternatively, you can download the proposals used in our paper from Google Drive.
  3. Place the features and annotations inside a data/Thumos14reduced/ folder and the proposals inside a proposals folder. Make sure the data structure matches the tree below; a loading sanity check follows it.

    ├── data
    │   └── Thumos14reduced
    │       ├── Thumos14reduced-I3D-JOINTFeatures.npy
    │       └── Thumos14reduced-Annotations
    │           ├── Ambiguous_test.txt
    │           ├── classlist.npy
    │           ├── duration.npy
    │           ├── extracted_fps.npy
    │           ├── labels_all.npy
    │           ├── labels.npy
    │           ├── original_fps.npy
    │           ├── segments.npy
    │           └── videoname.npy
    └── proposals
        ├── detection_result_base_test.json
        └── detection_result_base_train.json
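
As a sanity check after placing the files, the sketch below loads the features, one annotation file, and a proposal file. It assumes the W-TALC-style release format, where each .npy holds a pickled object array indexed by video; the exact JSON schema of the proposal files may differ:

import json
import numpy as np

# Features and annotations are assumed to be pickled object arrays
# (W-TALC-style release), hence allow_pickle=True.
features = np.load("data/Thumos14reduced/Thumos14reduced-I3D-JOINTFeatures.npy",
                   allow_pickle=True)
videonames = np.load("data/Thumos14reduced/Thumos14reduced-Annotations/videoname.npy",
                     allow_pickle=True)
print(len(features), "videos; first feature shape:", features[0].shape)
print("first video name:", videonames[0])

# Proposals from the pre-trained S-MIL model are plain JSON.
with open("proposals/detection_result_base_train.json") as f:
    proposals = json.load(f)
print("top-level keys (or entries):", list(proposals)[:5])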

Running

Training

CUDA_VISIBLE_DEVICES=0 python main.py --run_type train

Testing

Download the pre-trained model from Google Drive and place it inside a checkpoints folder.

CUDA_VISIBLE_DEVICES=0 python main.py --run_type test --pretrained_ckpt checkpoints/best_model.pkl
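
If you want to verify the downloaded checkpoint before running the test command, here is a minimal sketch, assuming best_model.pkl is a torch-serialized state dict (possibly wrapped in a dict, which is an assumption about the file layout):

import torch

# Map to CPU so no GPU is needed just to inspect the checkpoint.
ckpt = torch.load("checkpoints/best_model.pkl", map_location="cpu")
# Unwrap a possible {"state_dict": ...} container (assumed layout).
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))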

Results

The experimental results on THUMOS14 are shown below. Note that the performance of the provided checkpoint differs slightly from the original paper.

Method \ mAP@IoU (%)    0.1    0.2    0.3    0.4    0.5    0.6    0.7    AVG
P-MIL                   70.8   66.5   57.8   48.6   39.8   27.0   14.3   46.4
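
The AVG column is the arithmetic mean of mAP over the seven IoU thresholds 0.1 to 0.7, which can be checked directly:

maps = [70.8, 66.5, 57.8, 48.6, 39.8, 27.0, 14.3]
print(round(sum(maps) / len(maps), 1))  # prints 46.4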

Citation

@InProceedings{Ren_2023_CVPR,
    author    = {Ren, Huan and Yang, Wenfei and Zhang, Tianzhu and Zhang, Yongdong},
    title     = {Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {2394-2404}
}

Acknowledgement

We referenced the repositories below for the code.