deepinx/enhanced-ssh-mxnet

The MXNet Enhanced SSH (ESSH) for Face Detection and Alignment

The Single Stage Headless (SSH) face detector was introduced in an ICCV 2017 paper. This repository includes code for training and evaluating the Enhanced SSH (ESSH) face detector, which adds localization of five semantic facial landmarks to the original SSH method and also improves detection accuracy. You can use ESSH for face detection and 2D-5P (five-point) face alignment.

Pre-trained models can be downloaded from BaiduCloud or GoogleDrive.

Evaluation on WIDER FACE:

| Model | Easy-Set | Medium-Set | Hard-Set |
| --- | --- | --- | --- |
| Original Caffe SSH | 0.93123 | 0.92106 | 0.84582 |
| Insightface SSH Model | 0.93489 | 0.92281 | 0.84525 |
| ESSH-VGG16 Model | 0.94228 | 0.93207 | 0.87105 |
| ESSH-Resnet50 Model | 0.96100 | 0.95122 | 0.89610 |

Note: More accurate pre-trained models will be released soon.

Environment

This repository has been tested under the following environment:

Python 2.7
Ubuntu 18.04
Mxnet-cu90 (==1.3.0)
Cython 0.29.6
MATLAB R2016b

Installation

Prepare the environment.
Clone the repository.
Run make to build the necessary C++ libraries.

Testing

Download the pre-trained model and place it in ./model/.
You can use python test.py to test the pre-trained models.
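
Before running the test script, it can help to confirm that the downloaded files are where MXNet will look for them. MXNet checkpoints saved under a prefix normally consist of `<prefix>-symbol.json` plus one or more `<prefix>-NNNN.params` files. The helper below is a minimal sketch that lists the epochs found for a prefix; the function name `find_checkpoints` and the example prefix `model/essh` are illustrative and not part of this repository.

```python
import glob
import os
import re


def find_checkpoints(prefix):
    """Return the sorted epoch numbers of MXNet checkpoints saved under `prefix`.

    Assumes the standard MXNet naming: <prefix>-symbol.json and
    <prefix>-NNNN.params, where NNNN is the zero-padded epoch.
    """
    if not os.path.isfile(prefix + "-symbol.json"):
        return []
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.params$")
    epochs = []
    for path in glob.glob(prefix + "-*.params"):
        match = pattern.match(path)
        if match:
            epochs.append(int(match.group(1)))
    return sorted(epochs)


if __name__ == "__main__":
    # "model/essh" is a hypothetical prefix; use the one matching your download.
    print(find_checkpoints(os.path.join("model", "essh")))
```

An empty list means either the symbol file or the parameter files are missing under that prefix.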

Training

First, you should train an original SSH model on the WIDER dataset.
- Download the WIDER FACE training images from BaiduCloud or GoogleDrive and the face annotations from the dataset website. Decompress these files into the data/widerface directory.
- Download the MXNet VGG16 ImageNet pre-trained model from here and put it under the model directory.
- Edit config.py and run python train.py, or use the following command to train your SSH model.
```
python train.py --network ssh --prefix model/sshb --dataset widerface --gpu 0 --pretrained model/vgg16 --lr 0.004 --lr_step 30,40,50
```
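
The `--lr` and `--lr_step` flags describe a step learning-rate schedule. As a rough illustration, the sketch below computes the learning rate for a given epoch, assuming the steps are epoch indices and the rate is multiplied by 0.1 at each step. Both the decay factor and the epoch interpretation are assumptions here, so check config.py and train.py for the actual behavior.

```python
def step_lr(base_lr, lr_steps, epoch, factor=0.1):
    """Learning rate at `epoch` under a step schedule.

    `factor` (0.1) is an assumed decay ratio, not read from this repository.
    """
    # Count how many step boundaries have already been reached.
    drops = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * (factor ** drops)


if __name__ == "__main__":
    # Same values as the command above: --lr 0.004 --lr_step 30,40,50
    steps = [int(s) for s in "30,40,50".split(",")]
    for epoch in (0, 30, 40, 50):
        print(epoch, step_lr(0.004, steps, epoch))
```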
Then use the SSH model trained above as the pre-trained model for training the final ESSH model on the CelebA dataset.
- Download the CelebA dataset from BaiduCloud or GoogleDrive and decompress it into the data/celeba directory.
- Download our re-annotated bounding box labels from BaiduCloud or GoogleDrive and replace Anno/list_bbox_celeba.txt with this file. Our bounding box annotations are more accurate than the original labels, so be sure to download the file and replace the original.
- Edit config.py and run python train.py, or use the following command to train the ESSH model.
```
python train.py --network essh --prefix model/e2e --dataset celeba --gpu 0 --pretrained model/sshb --lr 0.004 --lr_step 10,15
```
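
After replacing Anno/list_bbox_celeba.txt, a quick parse can confirm the file is intact. The sketch below assumes the replacement file keeps the layout of the original CelebA annotation: an image count on the first line, a column header (`image_id x_1 y_1 width height`) on the second, then one whitespace-separated row per image with integer values. The function name `parse_celeba_bboxes` is hypothetical, and the sample rows are made up for illustration.

```python
def parse_celeba_bboxes(lines):
    """Parse lines laid out like CelebA's Anno/list_bbox_celeba.txt.

    Assumed layout: line 1 = image count, line 2 = column header,
    then rows of "image_id x y width height" with integer values.
    Returns {image_id: (x1, y1, x2, y2)}.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    count = int(rows[0][0])
    boxes = {}
    for image_id, x, y, w, h in rows[2:]:
        x, y, w, h = int(x), int(y), int(w), int(h)
        # Convert (x, y, width, height) to corner coordinates.
        boxes[image_id] = (x, y, x + w, y + h)
    if len(boxes) != count:
        raise ValueError("expected %d rows, found %d" % (count, len(boxes)))
    return boxes


if __name__ == "__main__":
    # Made-up sample rows, not real annotations.
    sample = [
        "2",
        "image_id x_1 y_1 width height",
        "a.jpg 10 20 30 40",
        "b.jpg 5 5 100 120",
    ]
    print(parse_celeba_bboxes(sample))
```

To check the real file, pass `open("data/celeba/Anno/list_bbox_celeba.txt")` instead of the sample list, adjusting the path to wherever the dataset was decompressed.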

Evaluation

The evaluation is based on the official WIDER evaluation tool, which requires MATLAB. Download the validation images and the annotations (if not already downloaded for training) from the WIDER dataset website. To evaluate pre-trained models on the validation set of the WIDER dataset, run python test_on_wider.py to obtain the performance on the “easy”, “medium”, and “hard” subsets. Some examples are given below.

Evaluate the SSH model on the validation set of the WIDER dataset without an image pyramid.

python test_on_wider.py --dataset widerface --method_name SSH --prefix model/sshb --gpu 0 --output ./output --thresh 0.05

Evaluate the ESSH model on the validation set of the WIDER dataset with an image pyramid.

python test_on_wider.py --dataset widerface --method_name ESSH-Pyramid --prefix model/essh --gpu 0 --output ./output --pyramid --thresh 0.05
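
For readers unfamiliar with image pyramids: the detector is run on several rescaled copies of each image, and the detections are mapped back to the original resolution and merged. The sketch below is a generic illustration of that idea in pure Python, not this repository's implementation. The function names, the NMS overlap value of 0.3, and the reading of `--thresh` as a minimum detection score are all assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_pyramid_detections(per_scale, thresh=0.05, nms_iou=0.3):
    """Merge detections obtained at several image scales.

    per_scale: {scale: [(x1, y1, x2, y2, score), ...]} in scaled-image pixels.
    thresh is assumed to play the role of --thresh; nms_iou is an assumed value.
    Returns boxes in original-image pixels, highest score first.
    """
    candidates = []
    for scale, dets in per_scale.items():
        s = float(scale)
        for x1, y1, x2, y2, score in dets:
            if score >= thresh:
                # Undo the resize so all boxes share original coordinates.
                candidates.append((x1 / s, y1 / s, x2 / s, y2 / s, score))
    candidates.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for det in candidates:
        # Greedy NMS: keep a box only if it does not overlap a better one.
        if all(iou(det[:4], k[:4]) <= nms_iou for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    # Made-up detections at scales 1.0 and 2.0.
    detections = {
        1.0: [(10, 10, 50, 50, 0.9), (200, 200, 240, 240, 0.03)],
        2.0: [(22, 22, 102, 102, 0.8), (400, 400, 480, 480, 0.6)],
    }
    for box in merge_pyramid_detections(detections):
        print(box)
```

A low score threshold such as 0.05 keeps many weak detections, which is typical when computing precision-recall curves rather than producing final detections for display.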

Results

Results of face detection and 2D-5P face alignment (inferred with the ESSH-Resnet50 model) are shown below.

License

MIT LICENSE

Reference

@inproceedings{Najibi2017SSH,
  title={SSH: Single Stage Headless Face Detector},
  author={Najibi, Mahyar and Samangouei, Pouya and Chellappa, Rama and Davis, Larry S.},
  booktitle={IEEE International Conference on Computer Vision},
  year={2017},
}

@inproceedings{yang2016wider,
  title={WIDER FACE: A Face Detection Benchmark},
  author={Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
}

@inproceedings{liu2015faceattributes,
  title={Deep Learning Face Attributes in the Wild},
  author={Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle={Proceedings of International Conference on Computer Vision (ICCV)},
  month={December},
  year={2015},
}

Acknowledgment

The code is adapted from an initial fork of the SSH and mxnet-SSH repositories.