Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang (USTC)
Required packages are listed in requirements.txt. You can install them by running:
conda create -n P-MIL python=3.8
conda activate P-MIL
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip3 install -r requirements.txt
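Before moving on, you can quickly confirm that the expected PyTorch/torchvision versions were installed and that the GPU is visible. This is a minimal sanity check, not part of the original instructions:

```python
# Hypothetical sanity check: confirm installed versions and CUDA visibility.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)   # expected: 1.8.0 and 0.9.0
print("CUDA available:", torch.cuda.is_available())  # should be True on a GPU machine
```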
Prepare the THUMOS14 dataset.
Prepare the proposals generated by the pre-trained S-MIL model.
Place the features and annotations inside a data/Thumos14reduced/ folder and the proposals inside a proposals folder. Make sure the data structure matches the tree below (a quick sanity-check script follows it).
├── data
│   └── Thumos14reduced
│       ├── Thumos14reduced-I3D-JOINTFeatures.npy
│       └── Thumos14reduced-Annotations
│           ├── Ambiguous_test.txt
│           ├── classlist.npy
│           ├── duration.npy
│           ├── extracted_fps.npy
│           ├── labels_all.npy
│           ├── labels.npy
│           ├── original_fps.npy
│           ├── segments.npy
│           ├── subset.npy
│           └── videoname.npy
├── proposals
│   ├── detection_result_base_test.json
│   └── detection_result_base_train.json
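If you want to double-check the layout before training, a small script like the one below loads each piece. This is a rough sketch, not part of the repository; it assumes the annotation files are NumPy object arrays and the proposal files are plain JSON:

```python
# Hypothetical sanity check for the prepared data and proposals (not part of the repo).
import json
import numpy as np

anno_dir = "data/Thumos14reduced/Thumos14reduced-Annotations"

# Annotation files are assumed to be NumPy object arrays, hence allow_pickle=True.
videonames = np.load(f"{anno_dir}/videoname.npy", allow_pickle=True)
subsets = np.load(f"{anno_dir}/subset.npy", allow_pickle=True)
print("videos:", len(videonames), "subset labels:", len(subsets))

# The I3D features file is large; loading it at least verifies the path is correct.
features = np.load("data/Thumos14reduced/Thumos14reduced-I3D-JOINTFeatures.npy",
                   allow_pickle=True)
print("feature entries:", len(features))

# Proposals from the pre-trained S-MIL model are assumed to be plain JSON.
with open("proposals/detection_result_base_test.json") as f:
    proposals = json.load(f)
print("top-level keys in the proposal file:", list(proposals)[:5])
```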
Train the model by running:
CUDA_VISIBLE_DEVICES=0 python main.py --run_type train
The pre-trained model can be downloaded from Google Drive and placed inside a checkpoints folder. Then run the test with:
CUDA_VISIBLE_DEVICES=0 python main.py --run_type test --pretrained_ckpt checkpoints/best_model.pkl
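If you want to peek at the downloaded checkpoint before testing, something like the following works, assuming best_model.pkl was written with torch.save (an illustrative snippet only, not taken from the codebase):

```python
# Hypothetical checkpoint inspection; assumes best_model.pkl was saved with torch.save.
import torch

ckpt = torch.load("checkpoints/best_model.pkl", map_location="cpu")
# Depending on how it was saved, this is either a raw state_dict or a dict wrapping one.
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys())[:10])
else:
    print("checkpoint object type:", type(ckpt))
```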
The experimental results on THUMOS14 are shown below. Note that the performance of the provided checkpoint differs slightly from the results reported in the original paper.
| Method \ mAP@IoU (%) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| P-MIL | 70.8 | 66.5 | 57.8 | 48.6 | 39.8 | 27.0 | 14.3 | 46.4 |
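The AVG column is the mean mAP over the listed IoU thresholds (0.1 to 0.7), which you can verify directly with a couple of lines:

```python
# Verify the AVG column: mean of mAP@IoU over thresholds 0.1-0.7.
map_at_iou = [70.8, 66.5, 57.8, 48.6, 39.8, 27.0, 14.3]
print(round(sum(map_at_iou) / len(map_at_iou), 1))  # 46.4
```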
@InProceedings{Ren_2023_CVPR,
author = {Ren, Huan and Yang, Wenfei and Zhang, Tianzhu and Zhang, Yongdong},
title = {Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {2394-2404}
}
We referenced the repositories below for the code.