glefundes / mobile-face-gaze

Lightweight gaze estimation with PyTorch.
GNU General Public License v2.0

Mobile FaceGaze


PyTorch implementation of a modified version of the MPIIFaceGaze architecture, presented in the paper "It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation", for CNN-based gaze estimation in the wild.

Table of Contents

  1. Introduction.
  2. Implementation Details.
    1. Quick Start.
  3. Installation.
  4. Usage.
    1. Running the Demo.
    2. Training from scratch.
  5. References.
  6. TODO.

Introduction:

This is a lightweight version of the MPIIFaceGaze CNN architecture for gaze estimation in the wild. The original code and weights were made available for the Caffe framework, so I decided to reimplement it in PyTorch. As a bonus, I made changes so that the model would be smaller without suffering from too much loss of performance.

Quick Start:

Check out the Step-by-Step tutorial notebook for a clear view of each step in the processing pipeline! :)

Implementation Details:

Reimplementing the architecture as proposed in the original paper gave me models of a whopping 700MB+! This was just not feasible for me, so I set out to modify the architecture in the hope of making it lightweight enough without a huge impact on performance. The table below lists the changes made to bring the model down to its final size of 17.7MB.

            Backbone       Size of feat. vector   Input resolution   # of Params.
Original    AlexNet        4096                   448x448            196.60M
This Repo   MobileNetV2*   512                    112x112            4.39M

*I also changed MobileNetV2's final convolution layer's output to 256.
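As a rough sanity check on the sizes quoted above, the parameter counts in the table can be converted to storage size. This is only a back-of-the-envelope sketch: it assumes every parameter is stored as a 32-bit float and ignores file-format overhead and non-parameter buffers, which may explain the small gap to the reported 17.7MB. The helper name `fp32_size_mb` is made up for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def fp32_size_mb(num_params):
    """Approximate on-disk size in MB (10**6 bytes) for a float32 model."""
    return num_params * BYTES_PER_FLOAT32 / 1e6


# Parameter counts taken from the table above.
original_mb = fp32_size_mb(196.60e6)   # roughly 786 MB, consistent with "700MB+"
this_repo_mb = fp32_size_mb(4.39e6)    # roughly 17.6 MB, close to the reported 17.7MB
reduction = 196.60e6 / 4.39e6          # roughly a 45x drop in parameter count

print(f"original: {original_mb:.1f} MB, this repo: {this_repo_mb:.2f} MB, "
      f"reduction: {reduction:.1f}x")
```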

Quantitative evaluation by leave-one-out cross-validation (LOOCV) on the MPIIFaceGaze dataset showed an increase of no more than 1 degree in mean angular error compared to my reimplementation of the original. Qualitative evaluation on images in the wild and on webcam footage shows satisfactory results.
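For readers unfamiliar with how an error in degrees is obtained from gaze predictions, here is a minimal sketch of one common approach: convert each (pitch, yaw) pair to a 3D unit vector and measure the angle between prediction and ground truth. The sign convention in `gaze_vector` and all function names are assumptions for illustration and are not taken from this repository's evaluation code.

```python
import math


def gaze_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a 3D unit gaze vector.

    The sign convention is an assumption; check it against your data.
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(pred, target):
    """Angle in degrees between two gaze directions given as (pitch, yaw)."""
    a, b = gaze_vector(*pred), gaze_vector(*target)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

With this convention, a prediction that is off by 10 degrees in yaw only and another that is exact average out to a mean error of 5 degrees.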

Installation:

Code was tested on Python 3.5, but should work with later releases.

Clone the repository, then:

$ pip install -r requirements.txt

All project dependencies should be taken care of for you. For GPU support, please refer to PyTorch's installation guide.

Usage:

Running the demo:

The repository includes a packaged demo that uses webcam images to detect faces and infer gaze angles. The code has been tested on both CPU and GPU devices. If you cannot get it to run on a GPU, please double-check that your environment has the correct cuDNN, CUDA and PyTorch packages.
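A quick way to double-check the environment is to ask PyTorch what it can see. The snippet below is a small diagnostic sketch and not part of this repository; the function name `gpu_report` is made up for illustration.

```python
def gpu_report():
    """Summarize what the installed PyTorch build reports about GPU support."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),  # False on CPU-only setups
        "cuda_build": torch.version.cuda,             # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),      # None if cuDNN is unavailable
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False while a GPU is present, the installed PyTorch build and the CUDA driver are the usual places to look.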

The demo can be run out-of-the-box with:

$ python3 cam_demo.py

Faces are detected by a modified version of TropComplique's implementation of MTCNN (I added GPU support and repackaged the detection function into the FaceDetector class).

Afterwards, facial regions are normalized for scale and rotation using the eye keypoints as reference for the slope. The normalization function was adapted from the imutils package.
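To illustrate the idea of using the eye keypoints as the reference for the slope, here is a minimal sketch of how such an alignment transform can be built. It is not the repository's normalization function: the function names, the 112-pixel output size, and the `eye_row` and `eye_span` fractions are illustrative assumptions.

```python
import math


def eye_alignment_matrix(eye_a, eye_b, out_size=112, eye_row=0.35, eye_span=0.30):
    """Build a 2x3 affine matrix that levels the eye line and rescales the face.

    eye_a / eye_b are (x, y) pixel positions of the eyes appearing on the left
    and right of the image. All default values are illustrative assumptions.
    """
    (ax, ay), (bx, by) = eye_a, eye_b
    dx, dy = bx - ax, by - ay
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("eye keypoints must be distinct")
    angle = math.atan2(dy, dx)            # slope of the line through the eyes
    scale = (eye_span * out_size) / dist  # desired eye distance / measured one
    cos_a, sin_a = scale * math.cos(angle), scale * math.sin(angle)
    # Rotate and scale, then move the eye midpoint to its target position.
    cx, cy = (ax + bx) / 2.0, (ay + by) / 2.0
    tx = out_size / 2.0 - (cos_a * cx + sin_a * cy)
    ty = eye_row * out_size - (-sin_a * cx + cos_a * cy)
    return [[cos_a, sin_a, tx], [-sin_a, cos_a, ty]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )
```

After the transform both eyes land on the same row, a fixed distance apart. The matrix uses the 2x3 layout that OpenCV's `cv2.warpAffine` expects once it is converted to a float NumPy array.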

The normalized faces are then passed through the gaze estimation network. The inferred gaze angles are drawn on screen with OpenCV helper functions.
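Drawing a gaze direction comes down to projecting the predicted angles onto the image plane and drawing a line from the face. The sketch below computes only the arrow endpoint; the sign convention, the function name `gaze_arrow_endpoint`, and the default length are assumptions and may differ from the helpers used in the demo.

```python
import math


def gaze_arrow_endpoint(origin, pitch, yaw, length=100.0):
    """Return the pixel where a gaze arrow starting at `origin` should end.

    pitch and yaw are in radians. The sign convention is an assumption:
    zero pitch and yaw mean the subject looks straight at the camera,
    so the arrow collapses onto its origin.
    """
    ox, oy = origin
    dx = -length * math.cos(pitch) * math.sin(yaw)
    dy = -length * math.sin(pitch)
    return (int(round(ox + dx)), int(round(oy + dy)))


# Example: a face centered at (50, 50) with the gaze turned fully sideways.
start = (50, 50)
end = gaze_arrow_endpoint(start, pitch=0.0, yaw=-math.pi / 2)
print(start, end)
```

The two points can then be handed to a drawing call such as OpenCV's `cv2.arrowedLine(image, start, end, color, thickness)`.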

[Image: twins]

On higher resolutions the FPS can drop quite a bit. The MTCNN is the obvious bottleneck here. If you wish to replace it with some other more powerful face detector, it really shouldn't be a problem. All you need to do is make sure the face normalization function is adapted to work with the new method's output format.

Training from scratch:

Weights pre-trained on a GTX Titan X are located in the models/weights folder. If for some reason you wish to train the network yourself, here's what you need to do:

References:

@inproceedings{zhang2017s,
  title={It's written all over your face: Full-face appearance-based gaze estimation},
  author={Zhang, Xucong and Sugano, Yusuke and Fritz, Mario and Bulling, Andreas},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
  pages={51--60},
  year={2017}
}

TODO: