
TDGAN: Two-branch Disentangled Generative Adversarial Network

This is the PyTorch implementation of the TDGAN model for facial expression recognition (FER), presented in the following paper:

Xie, Siyue, Haifeng Hu, and Yizhen Chen. "Facial Expression Recognition with Two-branch Disentangled Generative Adversarial Network." IEEE Transactions on Circuits and Systems for Video Technology (2020).

Abstract: Facial Expression Recognition (FER) is a challenging task in computer vision, as features extracted from expressional images are usually entangled with other facial attributes, e.g., poses or appearance variations, which are adverse to FER. To achieve a better FER performance, we propose a model named Two-branch Disentangled Generative Adversarial Network (TDGAN) for discriminative expression representation learning. Different from previous methods, TDGAN learns to disentangle expressional information from other unrelated facial attributes. To this end, we build the framework with two independent branches, which are specific to facial and expressional information processing respectively. Correspondingly, two discriminators are introduced to conduct identity and expression classification. By adversarial learning, TDGAN is able to transfer an expression to a given face. It simultaneously learns, for each expression image, a discriminative representation that is disentangled from other facial attributes and is therefore more effective for the FER task. In addition, a self-supervised mechanism is proposed to improve representation learning, which enhances the power of disentangling. Quantitative and qualitative results on both in-the-lab and in-the-wild datasets demonstrate that TDGAN is competitive with state-of-the-art methods.

Note: The face image in the above pipeline diagram was provided by my friend Dr. Shiyuan Li. He once said he had long wished to see his face appear in a public publication, so I used his picture as an illustrative example in this paper. (Of course, I obtained his permission before placing the image.) He still complained, though, that the image I chose was far from satisfactory, as he does not look very handsome in it.

The paper is available here.

Cite This Work

```bibtex
@article{xie2020facial,
  title={Facial expression recognition with two-branch disentangled generative adversarial network},
  author={Xie, Siyue and Hu, Haifeng and Chen, Yizhen},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  volume={31},
  number={6},
  pages={2359--2371},
  year={2020},
  publisher={IEEE}
}
```

Quick Start: one evaluation step with a pretrained model

  1. Download the model file from:
    • Google Drive: click here
    • Baidu Wangpan: click here (auth code: uyii)
  2. Extract the pretrained model and place it under the directory ./Dataset/examples.
  3. Run a quick demo with the command python main.py. This performs one evaluation step and generates an image named generated_example.png under ./Dataset/examples.

Requirements

Model Description

The framework of the model: [figure: TDGAN framework]

TDGAN learns image representations through expression transfer. The inputs are two images: a face image with an identity label and an expression image with an expression label. The goal of the generator is to transfer the expression from the expression image onto the face image. TDGAN therefore includes two separate branches that extract the corresponding image representations, followed by a deconvolution-based decoder that generates the image. To make sure that the generated image meets our expectations, TDGAN uses two discriminators to evaluate it. One is a face discriminator, which performs identity classification (so that TDGAN knows whether the facial appearance in the generated image matches that of the input face image). The other is an expression discriminator, which performs expression classification (so that TDGAN knows whether the expression in the generated image matches that of the input expression image). Note that the expected identity information can only be extracted from the input face image, while the expected expression information can only be extracted from the input expression image. Therefore, through adversarial training, the face branch tends to extract only identity-related (facial appearance) features, while the expression branch tends to extract only expression-related features. In other words, the expression branch is induced to disentangle expressional features from all other features, and we can use these expression-specific features for the expression classification task.
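
To make the two-branch data flow concrete, here is a minimal PyTorch sketch of the generator. All layer sizes and module names (ConvEncoder, DeconvDecoder, etc.) are illustrative assumptions for this README, not the exact architecture in models.py:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Illustrative branch encoder: 128x128x1 image -> feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),    # 128 -> 64
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # 64 -> 32
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 32 -> 16
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class DeconvDecoder(nn.Module):
    """Illustrative decoder: concatenated branch features -> 128x128x1 image."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(2 * feat_dim, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 32 -> 64
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),    # 64 -> 128
            nn.Tanh(),
        )

    def forward(self, f_face, f_expr):
        h = self.fc(torch.cat([f_face, f_expr], dim=1))
        return self.net(h.view(-1, 128, 16, 16))

# The generator: two independent branches feeding one decoder.
face_enc, expr_enc, dec = ConvEncoder(), ConvEncoder(), DeconvDecoder()
face_img = torch.randn(4, 1, 128, 128)  # carries the target identity
expr_img = torch.randn(4, 1, 128, 128)  # carries the target expression
fake = dec(face_enc(face_img), expr_enc(expr_img))
print(fake.shape)  # torch.Size([4, 1, 128, 128])
```

The two discriminators (not sketched here) score the generated image for identity and expression respectively; those adversarial signals are what push each branch to carry only its own attribute.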

Code Descriptions

Directory Tree

```
TDGAN
│   LoadData.py
│   main.py
│   models.py
│   README.md
│   trainer.py
│   util.py
└───Dataset
    ├───CASIA_WebFace
    │   │   casia_data_examples.npz
    │   └───img
    │       ├───0000045
    │       └───0000099
    ├───examples
    │   │   expression.jpg
    │   │   face.jpg
    └───RAF
        │   RAF_examples.npz
        └───img
```

Directory and File Descriptions

Datasets

Experiments are conducted on three in-the-lab datasets (CK+, TFEID, RaFD) and two in-the-wild datasets (BAUM-2i, RAF-DB).

All of the datasets are publicly accessible via the following links:

Preprocessing and Setup

In the preprocessing stage, faces in the input images (both face and expression images) are first detected by MTCNN and then resized to 128x128x1 (grayscale). Data augmentation (random cropping and horizontal flipping) is applied during training. For a fair comparison with other methods, we use subject-independent 10-fold cross-validation on CK+, TFEID, RaFD, and BAUM-2i; on RAF-DB, TDGAN is trained and evaluated on the predefined training and testing sets.
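
As a rough illustration of this pipeline, the sketch below uses the MTCNN implementation from facenet-pytorch (an assumption; the paper only states that MTCNN is used) together with standard torchvision transforms for the augmentation:

```python
# A minimal preprocessing sketch. facenet-pytorch's MTCNN is assumed here
# purely for illustration; the paper only states that MTCNN detects the faces.
from facenet_pytorch import MTCNN
from PIL import Image
from torchvision import transforms

# 1) Detect the face and crop/resize it to 128x128.
mtcnn = MTCNN(image_size=128, post_process=False)
img = Image.open('Dataset/examples/face.jpg')
face = mtcnn(img)  # (3, 128, 128) float tensor in [0, 255], or None if no face found

# 2) Training-time augmentation: random cropping and horizontal flipping
#    on the single-channel (grayscale) 128x128x1 input.
augment = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # 3 channels -> 1
    transforms.RandomCrop(128, padding=4),        # random cropping
    transforms.RandomHorizontalFlip(),            # horizontal flipping
])
if face is not None:
    face = augment(face / 255.0)  # (1, 128, 128), ready for the encoders
```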

Results on Expression Classification Task (main task)

| Dataset | Accuracy (%) |
|---------|--------------|
| CK+     | 97.53        |
| TFEID   | 97.20        |
| RaFD    | 99.32        |
| BAUM-2i | 65.76        |
| RAF-DB  | 81.91        |

Visualization Results

The purpose of TDGAN is to recognize facial expressions. However, since we use a GAN framework for disentangling, TDGAN can also generate some interesting images as a by-product. By inspecting the generated images, we can also judge to what extent different attributes are disentangled from each other. Here we conduct two kinds of visualization experiments, i.e., expression transferring (interpolation) and face interpolation.

Note that since our main task is expression recognition, we do not optimize for the quality of the generated images. Also, the following images are cherry-picked; the model is not guaranteed to always transfer a given expression to another face successfully.

Expression Transferring / Interpolation

Given a face image, we can modify its expression according to whatever expression images are given. We can also fix the face image and interpolate between two expression images to observe how the expression in the generated face gradually changes from one to the other. This shows that TDGAN disentangles expressional features from facial attributes, as it can modify the expression while leaving the facial appearance untouched.
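
A minimal sketch of how such an interpolation can be produced, reusing the illustrative face_enc, expr_enc, and dec modules from the sketch in the Model Description section (again an assumption, not the exact code in this repo):

```python
import torch

# Sketch only: face_enc, expr_enc, and dec are the illustrative modules from
# the Model Description sketch, assumed trained; batch size 1 throughout.
@torch.no_grad()
def interpolate_expression(face_img, expr_a, expr_b, steps=8):
    """Keep the face fixed; morph the expression from expr_a to expr_b."""
    f_face = face_enc(face_img)                    # identity features (fixed)
    f_a, f_b = expr_enc(expr_a), expr_enc(expr_b)  # two expression features
    frames = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        f_expr = (1 - alpha) * f_a + alpha * f_b   # linear blend in feature space
        frames.append(dec(f_face, f_expr))         # decode one intermediate frame
    return torch.cat(frames)                       # (steps, 1, 128, 128)
```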

Given a person's face image, we change his expression from fear to anger: [image]

The animation showing the transition process: [animation]

Given a person's face image, we change his expression from neutral to happiness: [image]

The animation showing the transition process: [animation]

Face Interpolation

Given two face images, we can interpolate between them to observe the transition from one face to the other. This also shows that TDGAN is able to disentangle facial-appearance features from expression features, as the facial appearance changes while the expression stays untouched.
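
Face interpolation is the symmetric counterpart of the expression interpolation sketched above: blend the two face features and keep the expression feature fixed (same illustrative modules, same caveats):

```python
import torch

# Sketch only, reusing the illustrative face_enc, expr_enc, and dec from above.
@torch.no_grad()
def interpolate_face(face_a, face_b, expr_img, steps=8):
    """Morph the identity from face_a to face_b while the expression stays put."""
    f_expr = expr_enc(expr_img)                    # expression features (fixed)
    f_a, f_b = face_enc(face_a), face_enc(face_b)  # two identity features
    frames = [dec((1 - alpha) * f_a + alpha * f_b, f_expr)
              for alpha in torch.linspace(0.0, 1.0, steps)]
    return torch.cat(frames)                       # (steps, 1, 128, 128)
```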

Given an expression image (Happiness) and two persons' face images, we change the facial appearance from one to the other while keeping the expression unchanged: [image]

The animation showing the transition process: [animation]

Given an expression image (Neutral) and two persons' face images, we change the facial appearance from one to the other while keeping the expression unchanged: [image]

The animation showing the transition process: [animation]