a514514772 / Real-Time-Facial-Expression-Recognition-with-DeepLearning

A real-time facial expression recognition system with webcam streaming and CNN
MIT License
257 stars, 113 forks

detection not very precise (mostly sad) #4

Closed: ewagner70 closed this issue 6 years ago

ewagner70 commented 6 years ago

I'm using the TensorFlow backend and I've converted your weights with the following code:

import sys
sys.path.append("../")  # make the repository's model package importable

import tensorflow as tf
from keras import backend as K
from keras.utils.conv_utils import convert_kernel

import model.myVGG as vgg

# Build the network and load the original weights.
model = vgg.VGG_16('my_model_weights_83.h5')
model.load_weights('my_model_weights_83.h5')

# Convert the kernels of every convolutional layer for the TensorFlow backend.
conv_layer_names = ['Convolution1D', 'Convolution2D', 'Convolution3D', 'AtrousConvolution2D']
ops = []
for layer in model.layers:
    if layer.__class__.__name__ in conv_layer_names:
        original_w = K.get_value(layer.W)
        converted_w = convert_kernel(original_w)
        ops.append(tf.assign(layer.W, converted_w).op)
K.get_session().run(ops)  # apply all assignments in one session call

model.save_weights('my_model_weights_83_tf.h5')

It runs and it shows me the predictions, but they're mostly wrong (always sad, angry and very rarely happy, etc.).

Then I downloaded FER2013 and ran fer2013datagen.py and model_training.py with the TensorFlow backend, and the results improved somewhat.

Any hints on how to improve the training and the predictions? Or is there maybe a better-trained model to download?
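For context on the weight conversion above, here is a minimal NumPy sketch of what a kernel flip of this kind does. It assumes the usual explanation: Theano's convolution rotates the filter by 180 degrees (true convolution) while TensorFlow computes cross-correlation, so the spatial axes of each kernel have to be reversed when moving weights between backends. The helper names (`flip_spatial_axes`, `correlate2d_valid`, `convolve2d_valid`) are hypothetical and written for illustration only; they are not part of Keras or of this project.

```python
import numpy as np

def flip_spatial_axes(kernel):
    """Reverse every axis except the last two (input and output channels).

    Assumes a kernel laid out as (rows, cols, in_channels, out_channels).
    """
    kernel = np.asarray(kernel)
    slices = [slice(None, None, -1)] * (kernel.ndim - 2) + [slice(None)] * 2
    return kernel[tuple(slices)].copy()

def correlate2d_valid(image, kernel):
    """Plain 2-D cross-correlation in 'valid' mode (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    """True 2-D convolution: cross-correlation with the kernel rotated 180 degrees."""
    return correlate2d_valid(image, kernel[::-1, ::-1])

rng = np.random.default_rng(0)
image = rng.random((5, 5))
kernel = rng.random((3, 3))

# Wrap the 2-D kernel as (rows, cols, 1, 1), flip it, then unwrap it again.
flipped = flip_spatial_axes(kernel[:, :, None, None])[:, :, 0, 0]

# Cross-correlating with the flipped kernel reproduces the true convolution.
same = np.allclose(convolve2d_valid(image, kernel), correlate2d_valid(image, flipped))
print(same)
```

If a conversion loop never matches any layer (for example because the layer class names differ between Keras versions), the weights are saved unflipped, so it may be worth checking that `ops` is non-empty before running the session.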

a514514772 commented 6 years ago

Hi @ewagner70,

The prediction model is implemented in a basic way and the method is even out of date, so the results should not be expected to be very good.

To improve the prediction accuracy, here are my suggestions:

  1. Try deeper models. In this project, I only take the first few layers of VGG16 and train them from scratch, which risks overfitting or poor generalization. There are more modern nets such as DenseNet or ResNet. You can use a pretrained one as the "feature extractor" and retrain only the last one or two layers.

  2. Use more training data. You can either apply data augmentation or add more, and more varied, training datasets. This helps improve the generalizability of the prediction model.

  3. Make the testing environment more similar to the training one. The model may perform poorly at test time because the training and testing environments differ. For example, differences in viewing angle, contrast, the area of the screen occupied by the face, or even ethnicity can confuse the model.

  4. Look at modern methods. If my memory serves me well, this field is already well studied. You can find lots of excellent and insightful papers online.
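Suggestions 2 and 3 could be sketched roughly as follows. This is only an illustration in NumPy, not code from this project: the function names (`random_shift`, `augment`, `normalize_contrast`), the shift range, and the random stand-in image are all assumptions. The only detail taken from the dataset is that FER2013 faces are 48x48 grayscale images.

```python
import numpy as np

def random_shift(image, max_shift, rng):
    """Shift the image by up to max_shift pixels along each axis, padding with zeros."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Part of the source image that stays inside the frame after the shift.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

def augment(image, rng, max_shift=4):
    """Return a randomly mirrored and shifted copy of a 2-D grayscale face crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal mirror keeps the expression plausible
    return random_shift(image, max_shift, rng)

def normalize_contrast(image, eps=1e-8):
    """Rescale pixels to zero mean and unit variance to reduce lighting differences."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)

rng = np.random.default_rng(0)
# Random stand-in for one 48x48 FER2013 face; real data would be loaded from the dataset.
face = rng.integers(0, 256, size=(48, 48)).astype(np.uint8)

# Eight augmented and normalized variants of the same face.
batch = np.stack([normalize_contrast(augment(face, rng)) for _ in range(8)])
print(batch.shape)  # (8, 48, 48)
```

The same `normalize_contrast` step would need to be applied to webcam crops at prediction time; otherwise the training and testing inputs would differ in exactly the way suggestion 3 warns about.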

Best Regards, Hui-Po Wang