Inference code for the FrameExit paper, presented at CVPR 2021 (oral).
Amir Ghodrati\*<sup>1</sup>, Babak Ehteshami Bejnordi\*<sup>1</sup>, Amirhossein Habibian<sup>1</sup>, "FrameExit: Conditional Early Exiting for Efficient Video Recognition", CVPR 2021 [arxiv].

\* Equal contribution

<sup>1</sup> Qualcomm AI Research (Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.)
If you find our work useful for your research, please cite:
```
@inproceedings{ghodrati2021,
  title={FrameExit: Conditional Early Exiting for Efficient Video Recognition},
  author={Ghodrati, Amir and Bejnordi, Babak Ehteshami and Habibian, Amirhossein},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
```
This repository has been tested on Ubuntu 16.04 with CUDA 10. Clone this repository and follow these steps:
```bash
conda create -n frameexit python=3.6
conda activate frameexit
conda install pytorch=1.3.1 torchvision=0.4.2 pillow pyyaml
pip install fire==0.3.1 pytorch-ignite==0.3.0
```
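Optionally, the pinned versions can be verified from Python before running inference (a minimal sanity check, not part of the repository):

```python
# Verify the pinned package versions and that the CUDA 10 build is visible.
import torch
import torchvision
import ignite  # pytorch-ignite imports as "ignite"

print(torch.__version__)          # expected: 1.3.1
print(torchvision.__version__)    # expected: 0.4.2
print(ignite.__version__)         # expected: 0.3.0
print(torch.cuda.is_available())  # True if the CUDA setup is visible to PyTorch
```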
Our pretrained models can be downloaded from here and should be placed inside the directory `resources/checkpoints`:

- `activitynet_checkpoint_25gmac.pth`
- `minikinetics_checkpoint_19.7gmac.pth`
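As a quick sanity check that a download completed, a checkpoint can be opened with `torch.load` (a minimal sketch; it assumes nothing about the checkpoint's internal key names):

```python
# Minimal sanity check: open a downloaded checkpoint on CPU.
# Assumes only that the file is a standard PyTorch checkpoint.
import torch

ckpt = torch.load("resources/checkpoints/activitynet_checkpoint_25gmac.pth",
                  map_location="cpu")
if isinstance(ckpt, dict):
    # Print a few top-level keys without instantiating any model.
    for key in list(ckpt)[:10]:
        print(key)
```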
To get help:
```bash
python3.6 inference.py config/activitynet_inference_2d.yml --help
```
To get the results on ActivityNet 1.3, run the following command:

```bash
python3.6 inference.py config/activitynet_inference_2d.yml --data.path_frame <path/to/activitynet/frames> --checkpoint.init <path/to/activitynet/model>
```
For Mini-Kinetics:
```bash
python3.6 inference.py config/minikinetics_inference_2d.yml --data.path_frame <path/to/minikinetics/frames> --checkpoint.init <path/to/minikinetics/model>
```
where `<path/to/.../frames>` is the path to the extracted frames and `<path/to/.../model>` points to the model file.
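The dotted flags (e.g. `--data.path_frame`) override entries of the nested YAML config. The sketch below is a hypothetical illustration of that override pattern, assuming a plain nested dict config; the repository's actual `inference.py` may implement it differently:

```python
# Hypothetical sketch: map a dotted key such as "data.path_frame" onto a
# nested YAML config dict. apply_override is an illustrative helper, not
# part of this repository.
import yaml

def apply_override(cfg: dict, dotted_key: str, value):
    """Walk the nested dict along the dotted path and set the leaf value."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value

cfg = yaml.safe_load("data:\n  path_frame: /default/frames\n")
apply_override(cfg, "data.path_frame", "/data/activitynet/frames")
print(cfg)  # {'data': {'path_frame': '/data/activitynet/frames'}}
```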
After running the above commands, the results should look like:
| Name | mAP | Top-1 Acc | GFLOPs |
|---|---|---|---|
| ActivityNet | 0.7612 | - | 24.7 |
| Mini-Kinetics | - | 0.7331 | 19.7 |
Please note that the results may differ slightly, as some videos in the validation set may no longer be available.
In our paper, in addition to ResNet-50, we report results using the EfficientNet-B3 and X3D-S backbones (left table). We also report results on the HVU dataset (right table).
The accuracy vs. efficiency curves on ActivityNet are shown below:
To reproduce the curves for FrameExit (in black), we use the following data points:
```python
Gflops_resnet50_FrameExit = [17.96, 19.89, 23.78, 26.56, 29.09, 41.2]
map_resnet50_FrameExit = [72.72, 73.90, 75.37, 76.05, 76.41, 77.30]
Gflops_efficientnet_FrameExit = [10.03, 10.98, 14.85, 18.00]
map_efficientnet_FrameExit = [78.57, 79.41, 80.85, 81.18]
```
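These points can be re-plotted with matplotlib (not a repository dependency; the sketch below is purely illustrative and reuses the lists above):

```python
# Minimal sketch: plot the FrameExit accuracy-vs-efficiency curves on
# ActivityNet from the data points listed above.
import matplotlib.pyplot as plt

Gflops_resnet50_FrameExit = [17.96, 19.89, 23.78, 26.56, 29.09, 41.2]
map_resnet50_FrameExit = [72.72, 73.90, 75.37, 76.05, 76.41, 77.30]
Gflops_efficientnet_FrameExit = [10.03, 10.98, 14.85, 18.00]
map_efficientnet_FrameExit = [78.57, 79.41, 80.85, 81.18]

plt.plot(Gflops_resnet50_FrameExit, map_resnet50_FrameExit,
         "k-o", label="FrameExit (ResNet-50)")
plt.plot(Gflops_efficientnet_FrameExit, map_efficientnet_FrameExit,
         "k--s", label="FrameExit (EfficientNet-B3)")
plt.xlabel("GFLOPs")
plt.ylabel("mAP (%)")
plt.title("Accuracy vs. efficiency on ActivityNet")
plt.legend()
plt.show()
```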