cypw / PyTorch-MFNet

MIT License
252 stars 56 forks source link

Multi-Fiber Networks for Video Recognition

This repository contains the code and trained models of:

Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng. "Multi-Fiber Networks for Video Recognition" (PDF).

Implementation

We use MXNet \@92053bd for image classification and PyTorch 0.4.0a0\@a83c240 for video classification.

Normalization

The inputs are substrated by mean RGB = [ 124, 117, 104 ], and then multiplied by 0.0167.

Usage

Train motion from scratch:

python train_kinetics.py

Fine-tune with pre-trained model:

python train_ucf101.py

or

python train_hmdb51.py

Evaluate the trained model:

cd test
# the default setting is to test trained model on ucf-101 (split1)
python evaluate_video.py

Results

Image Recognition (ImageNet-1k)

Single Model, Single Crop Validation Accuracy:

Model Params FLOPs Top-1 Top-5 MXNet Model
ResNet-18 (reproduced) 11.7 M 1.8 G 71.4 % 90.2 % GoogleDrive
ResNet-18 (MF embedded) 9.6 M 1.6 G 74.3 % 92.1 % GoogleDrive
MF-Net (N=16) 5.8 M 861 M 74.6 % 92.0 % GoogleDrive

Video Recognition (UCF-101, HMDB51, Kinetics)

Model Params Target Dataset Top-1
MF-Net (3D) 8.0 M Kinetics 72.8 %
MF-Net (3D) 8.0 M UCF-101 96.0 %*
MF-Net (3D) 8.0 M HMDB51 74.6 %*

* accuracy averaged on slip1, slip2, and slip3.

Trained Models

Model Target Dataset PyTorch Model
MF-Net (2D) ImageNet-1k GoogleDrive
MF-Net (3D) Kinetics GoogleDrive
MF-Net (3D) UCF-101 (split1) GoogleDrive
MF-Net (3D) HMDB51 (split1) GoogleDrive

Other Resources

ImageNet-1k Trainig/Validation List:

ImageNet-1k category name mapping table:

Kinetics Dataset:

UCF-101 Dataset:

HMDB51 Dataset:

FAQ

Do I need to convert the raw videos to specific format?

How can I make the training faster?

Citation

If you use our code/model in your work or find it is helpful, please cite the paper:

@inproceedings{chen2018multifiber,
  title={Multi-Fiber networks for Video Recognition},
  author={Chen, Yunpeng and Kalantidis, Yannis and Li, Jianshu and Yan, Shuicheng and Feng, Jiashi},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2018}
}