Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
Linjiang Huang (CUHK), Liang Wang (CASIA), Hongsheng Li (CUHK)
We argue that existing methods for weakly-supervised temporal activity localization cannot guarantee foreground-action consistency, that is, that the foreground and actions are mutually inclusive. To address this issue, we propose a novel method named Foreground-Action Consistency Network (FAC-Net). The table below compares its results on THUMOS14 with those of prior methods.
Method \ mAP@IoU (%) | @0.1 | @0.2 | @0.3 | @0.4 | @0.5 | @0.6 | @0.7 | AVG |
---|---|---|---|---|---|---|---|---|
UntrimmedNet | 44.4 | 37.7 | 28.2 | 21.1 | 13.7 | - | - | - |
STPN | 52.0 | 44.7 | 35.5 | 25.8 | 16.9 | 9.9 | 4.3 | 27.0 |
W-TALC | 55.2 | 49.6 | 40.1 | 31.1 | 22.8 | - | 7.6 | - |
AutoLoc | - | - | 35.8 | 29.0 | 21.2 | 13.4 | 5.8 | - |
CleanNet | - | - | 37.0 | 30.9 | 23.9 | 13.9 | 7.1 | - |
MAAN | 59.8 | 50.8 | 41.1 | 30.6 | 20.3 | 12.0 | 6.9 | 31.6 |
CMCS | 57.4 | 50.8 | 41.2 | 32.1 | 23.1 | 15.0 | 7.0 | 32.4 |
BM | 60.4 | 56.0 | 46.6 | 37.5 | 26.8 | 17.6 | 9.0 | 36.3 |
RPN | 62.3 | 57.0 | 48.2 | 37.2 | 27.9 | 16.7 | 8.1 | 36.8 |
DGAM | 60.0 | 54.2 | 46.8 | 38.2 | 28.8 | 19.8 | 11.4 | 37.0 |
TSCN | 63.4 | 57.6 | 47.8 | 37.7 | 28.7 | 19.4 | 10.2 | 37.8 |
EM-MIL | 59.1 | 52.7 | 45.5 | 36.8 | 30.5 | 22.7 | 16.4 | 37.7 |
BaS-Net | 58.2 | 52.3 | 44.6 | 36.0 | 27.0 | 18.6 | 10.4 | 35.3 |
A2CL-PT | 61.2 | 56.1 | 48.1 | 39.0 | 30.1 | 19.2 | 10.6 | 37.8 |
ACM-BANet | 64.6 | 57.7 | 48.9 | 40.9 | 32.3 | 21.9 | 13.5 | 39.9 |
HAM-Net | 65.4 | 59.0 | 50.3 | 41.1 | 31.0 | 20.7 | 11.1 | 39.8 |
UM | 67.5 | 61.2 | 52.3 | 43.4 | 33.7 | 22.9 | 12.1 | 41.9 |
FAC-Net (Ours) | 67.6 | 62.1 | 52.6 | 44.3 | 33.4 | 22.5 | 12.7 | 42.2 |
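The AVG column in the table above appears to be the arithmetic mean of the mAP values over the seven thresholds from 0.1 to 0.7. That reading is an assumption, but it is consistent with the rows that report all seven values. A minimal sketch that recomputes it for three rows copied from the table (the helper name `average_map` is illustrative and not part of this repo):

```python
# mAP (%) at thresholds 0.1 ... 0.7, copied from the table above.
rows = {
    "STPN": [52.0, 44.7, 35.5, 25.8, 16.9, 9.9, 4.3],
    "UM": [67.5, 61.2, 52.3, 43.4, 33.7, 22.9, 12.1],
    "FAC-Net (Ours)": [67.6, 62.1, 52.6, 44.3, 33.4, 22.5, 12.7],
}


def average_map(values):
    """Arithmetic mean over the thresholds, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


for name, values in rows.items():
    print(f"{name}: {average_map(values)}")
```

The recomputed values (27.0, 41.9 and 42.2) match the AVG column for these rows.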
Prepare the THUMOS'14 dataset.
Place the features and annotations inside a `dataset/Thumos14reduced/` folder.
You can easily train the model by running the provided script.
Refer to `train_options.py`. Set the `dataset-root` argument to the path of your `dataset` folder.
Run the command below.
$ python train_main.py --run-type 0 --model-id 1 # rgb stream
$ python train_main.py --run-type 1 --model-id 2 # flow stream
Make sure you use a different `model-id` for the RGB and optical flow streams.
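If you launch both streams from a script, the distinct `model-id` requirement can be checked before anything runs. The sketch below only assembles the two training commands shown above as argument lists. The helper `build_train_commands` is hypothetical and not part of this repo, and only the flags already listed in this README are used.

```python
import shlex


def build_train_commands(rgb_model_id, flow_model_id):
    """Return (rgb_cmd, flow_cmd) argument lists for the two training runs.

    Hypothetical helper: it mirrors the commands in this README and refuses
    to reuse one model id for both streams.
    """
    if rgb_model_id == flow_model_id:
        raise ValueError("RGB and flow streams need different model ids")
    base = ["python", "train_main.py"]
    rgb_cmd = base + ["--run-type", "0", "--model-id", str(rgb_model_id)]
    flow_cmd = base + ["--run-type", "1", "--model-id", str(flow_model_id)]
    return rgb_cmd, flow_cmd


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in build_train_commands(1, 2):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repo root to start the corresponding run.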
Models are saved in `./ckpt/dataset_name/model_id/`.
The trained model can be found here. Rename the file to `xxx.pkl` (e.g., `100.pkl`) and put it into `./ckpt/dataset_name/model_id/`. You can then evaluate the model by following the two-stream evaluation process.
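The rename-and-place step can also be scripted. This is a minimal sketch under stated assumptions: `install_checkpoint` is a hypothetical helper, not part of this repo, and `dataset_name` is kept as a parameter because this README only gives it as a placeholder.

```python
import shutil
from pathlib import Path


def install_checkpoint(downloaded, dataset_name, model_id, epoch, ckpt_root="./ckpt"):
    """Copy a downloaded checkpoint to <ckpt_root>/<dataset_name>/<model_id>/<epoch>.pkl.

    Hypothetical helper: the directory layout follows the path given in this
    README; the target folder is created if it does not exist yet.
    """
    target_dir = Path(ckpt_root) / dataset_name / str(model_id)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{epoch}.pkl"
    shutil.copyfile(downloaded, target)
    return target
```

For example, `install_checkpoint("downloaded_model.pkl", "dataset_name", 1, 100)` would create `ckpt/dataset_name/1/100.pkl`, where both the source file name and the dataset folder name are placeholders.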
$ python train_main.py --pretrained --run-type 2 --model-id 1 --load-epoch 100 # rgb stream
$ python train_main.py --pretrained --run-type 3 --model-id 2 --load-epoch 100 # flow stream
`load-epoch` refers to the epoch of the best model. The best model does not always occur at epoch 100; check the log in the same folder as the saved models to find the best epoch and set `load-epoch` accordingly.
Make sure the `model-id` you set matches the `model-id` used during training.
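Before choosing a `load-epoch`, it can help to see which epochs actually have a saved checkpoint for a given `model-id`. The sketch below assumes checkpoints are stored as `<epoch>.pkl` in the model folder, which is what the `100.pkl` example suggests but is not confirmed here. `available_epochs` is a hypothetical helper, and it does not read the training log, so the best epoch still has to be taken from the log.

```python
from pathlib import Path


def available_epochs(dataset_name, model_id, ckpt_root="./ckpt"):
    """Return the sorted epoch numbers that have a <epoch>.pkl file saved.

    Assumes the <ckpt_root>/<dataset_name>/<model_id>/ layout from this
    README; files whose names are not plain integers are ignored.
    """
    model_dir = Path(ckpt_root) / dataset_name / str(model_id)
    if not model_dir.is_dir():
        return []
    epochs = [int(p.stem) for p in model_dir.glob("*.pkl") if p.stem.isdigit()]
    return sorted(epochs)
```

An empty list means there is no folder or no checkpoint for that `model-id`, which usually points to a mismatch with the id used during training.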
$ python test_main.py --rgb-model-id 1 --flow-model-id 2 --rgb-load-epoch 100 --flow-load-epoch 100
We referenced the repos below for the code.
If you find this code useful, please cite our paper.
@InProceedings{Huang_2021_ICCV,
author = {Huang, Linjiang and Wang, Liang and Li, Hongsheng},
title = {Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {8002-8011}
}
If you have any questions or comments, please contact the first author of the paper, Linjiang Huang (ljhuang524@gmail.com).