Emotion-Recognition

The goal of this project is to recognize human emotions from face images using learning algorithms such as Support Vector Machines.

Author

LinkedIn: Thierry Khamphousone

Getting Started

Setup

> git clone https://github.com/Yulypso/Emotion-Recognition.git
> cd Emotion-Recognition

# for macOS/Linux
> python3 -m venv .venv
> source .venv/bin/activate

# for Windows
> py -3 -m venv .venv
> .venv\Scripts\activate

# install the requirements
> pip3 install -r requirements.txt

Check Dependency Graph


Note: In Visual Studio Code, don't forget to select the correct Python interpreter.

[CMD + SHIFT + P] > Python: Select Interpreter > Python 3.9.0 64-bit ('.venv') [./.venv/bin/python]


Trainset and testset images

  1. Download the trainset and testset images: here
  2. Move the downloaded images to their respective folders:
    • Dataset/trainset/[downloaded images]
    • Dataset/testset/[downloaded images]


Run the code

> cd project/Emotion-Recognition

# feature extraction
> python3 feature_extraction.py

# model training
> python3 training.py

# class prediction
> python3 eval.py

Stop the code

# don't forget to deactivate the virtual environment (.venv)
> deactivate




Introduction

This project concerns the recognition of emotions by computer, a machine learning topic with useful applications. In the commercial field, analyzing a customer's emotions could help a company improve the services it offers. In education, emotion recognition makes it possible to identify the pupils and students in a class who have not understood a concept taught by the teacher, so that they can then be referred to additional help.

Recognition of emotions can be achieved through facial and vocal expressions, as well as through body language. The work carried out here focuses only on the analysis of facial expressions, using several images of different people each expressing an emotion. These images are grouped together in a database.

An emotion is defined as a "sudden turmoil, transient agitation caused by a keen feeling of fear, surprise, joy, etc." (Larousse). For the project, these feelings are classified into seven categories, namely "joy", "anger", "disgust", "sadness", "fear", "surprise" and "neutral emotion".

The analysis of images or video streams generally follows a pipeline in a machine learning approach:

  1. Image collection
  2. Face detection and feature point placement
  3. Feature extraction
  4. Feature preprocessing (image processing)
  5. Model training
  6. Classification and prediction


Data analysis

The first step of the project is to analyze the data: face images together with points of interest (landmarks) stored in a CSV file. I therefore started by developing a display of the images with their characteristic points. (Figure 1)


Figure 1 - Images of faces representing an emotion with the points of interest represented by white dots. (a) joy, (b) neutral emotion
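
Below is a minimal sketch of this display step. The CSV layout is an assumption (an image column plus alternating x/y landmark columns); the actual column names in trainset.csv may differ.

```python
import cv2
import matplotlib.pyplot as plt
import pandas as pd

# Assumed layout: one row per image, file name in an 'image' column,
# landmark coordinates in columns x0, y0, x1, y1, ...
df = pd.read_csv("Dataset/trainset.csv")
row = df.iloc[0]

img = cv2.imread(f"Dataset/trainset/{row['image']}", cv2.IMREAD_GRAYSCALE)

xs = row.filter(regex=r"^x\d+$").astype(float)
ys = row.filter(regex=r"^y\d+$").astype(float)

plt.imshow(img, cmap="gray")
plt.scatter(xs, ys, s=8, c="white")  # white dots, as in Figure 1
plt.axis("off")
plt.show()
```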


I noticed that some images were not suitable for machine learning, either because the feature points did not match the face or because a strand of hair could interfere with the preprocessing of the extracted images/features. (Figure 2)


Figure 2 - Face images not suitable for machine learning


Feature descriptions

After filtering out the images that were not fit for analysis, I added them to a list of excluded images within my program so that they would not be taken into account when extracting features.
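
A minimal sketch of this filtering, with hypothetical file names standing in for the real excluded images:

```python
import pandas as pd

# Hypothetical exclusion list; the actual file names differ.
EXCLUDED_IMAGES = {"img_042.png", "img_118.png"}

df = pd.read_csv("Dataset/trainset.csv")
kept = df[~df["image"].isin(EXCLUDED_IMAGES)]  # rows kept for feature extraction
```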

The filtered database has 705 images, or a sample size of 705.

In order to retrieve the characteristics of the emotions on the faces, we can ask ourselves the question "Where are the most prominent characteristic features of an emotion on a face?".

I defined 11 areas of the face that seemed to me to be the most prominent, i.e. those showing the most variation across emotions. (Figure 3)


Figure 3 - Representation of the 11 areas of interest chosen for feature extraction

The classes carry the following labels:

  • 0: neutral emotion
  • 1: anger
  • 3: disgust
  • 4: fear
  • 5: joy
  • 6: sadness
  • 7: surprise

Due to a lack of data, we will not work on emotion (2).


Some explanations:

⇒ The areas around the eyes, the shape of the mouth and the areas around the nose make it possible to recognize joy.

⇒ The area between the eyebrows makes it possible to detect anger.

⇒ The whites of the eyes, the shape of the nose and that of the mouth make it possible to detect surprise.

⇒ The shape of the eyebrows, the mouth and the eyes, as well as the area between the eyebrows, make it possible to detect sadness.

Each of the 11 areas will be cropped, preprocessed, resized and extracted as "features" for machine learning. (Figure 4)


Figure 4 - Image resizing size chart


Each feature is an image that we will "flatten" to obtain a one-dimensional array.

The feature size is 26508 (flattened) for an individual in the sample.
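
A minimal sketch of the flattening step; the region sizes below are placeholders (the real resize targets are listed in Figure 4 and total 26508 values):

```python
import numpy as np

# Placeholder crops: 11 regions with arbitrary example sizes.
regions = [np.zeros((24, 48)) for _ in range(11)]

# Flatten each 2-D region to 1-D and concatenate them into a single
# feature vector for one individual.
feature_vector = np.concatenate([r.ravel() for r in regions])
print(feature_vector.shape)  # (12672,) with these placeholder sizes
```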


Extraction method

The positions of the automatically generated characteristic points depend on the position of the face in the photo as well as on its shape.

On a neutral face centered in the image, the characteristic points look like Figure 5.


Figure 5 - Characteristic points (landmarks) of the face


Figure 6 - Table of coordinates and sizes of the areas of the face to be extracted


The coordinates of the starting point, the length and the height of the cropped images are grouped in the table. (Figure 6)

Since coordinates and distances depend on feature points and not on fixed pixel values in the image, we can easily retrieve facial components regardless of their position in the image. (Figure 7)
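
A minimal sketch of landmark-relative cropping; the anchor index, offsets and sizes are illustrative stand-ins for the values in Figure 6:

```python
def crop_region(img, landmarks, anchor, dx, dy, width, height):
    """Crop a region whose top-left corner is offset from a landmark.

    anchor is a landmark index; dx, dy, width and height would come
    from the table in Figure 6 (the call below uses made-up values).
    """
    x0 = int(landmarks[anchor][0] + dx)
    y0 = int(landmarks[anchor][1] + dy)
    return img[y0:y0 + height, x0:x0 + width]

# e.g. mouth = crop_region(img, landmarks, anchor=48, dx=-5, dy=-10,
#                          width=60, height=30)  # hypothetical values
```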


Figure 7 - Facial components, (a) left eyebrow, (b) between eyebrows, (c) right eyebrow, (d) left eye side, (e) left eye, (f) right eye, (g) right eye side, (h) nose left side, (i) nose, (j) nose right side, (k) mouth


Preprocessing - Segmentation

Each of the extracted images was transformed at the segmentation stage; a generic illustration is sketched after the list below.

Processing steps:

Left eyebrow and right eyebrow:

Between the eyebrows:

Left eye and Right eye:

Left eye side and Right eye side:

Nose:

Nose on the left side and Nose on the right side:

Mouth:
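
The exact operations applied to each region are not reproduced here. As a generic illustration only, a common segmentation step is grayscale conversion followed by Otsu thresholding:

```python
import cv2

def segment(crop_bgr):
    """Generic segmentation step: grayscale conversion + Otsu thresholding.

    Illustrative only; the project applies its own per-region processing.
    """
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```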

The feature extraction was performed for the training images using the trainset.csv worksheet, and for the test images using the testset.csv worksheet.

The data corresponding to the features were saved in the features_train.csv and features_test.csv files, respectively.
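
A minimal sketch of this save step, assuming one row per individual plus a "label" column (the real file layout may differ):

```python
import numpy as np
import pandas as pd

# Placeholder data standing in for the 705 extracted feature vectors.
feature_vectors = [np.zeros(26508) for _ in range(705)]
labels = np.zeros(705, dtype=int)

features = pd.DataFrame(np.vstack(feature_vectors))
features["label"] = labels  # column name is an assumption
features.to_csv("features_train.csv", index=False)
```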


Presentation of the chosen model

K-Nearest-Neighbors

In order to choose a model, I tried several approaches, including building a model with the K-Nearest Neighbors algorithm first.


Figure 8 - Graph representing the recognition rate as a function of the number of neighbors k


We can see that with the k-NN method, we obtain a maximum recognition rate of 0.79, i.e. 79%, for a number of neighbors k equal to 7.

By predicting on data with the trained model, we obtain an accuracy of 81%; the associated confusion matrix is shown in Figure 9. (Training base: 70% of the data, test base: 30% of the data)
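
A minimal sketch reproducing this experiment with scikit-learn, assuming features_train.csv holds one flattened feature vector per row plus a "label" column:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("features_train.csv")
X, y = df.drop(columns=["label"]), df["label"]

# 70% training / 30% test split, as in the report
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))  # recognition rate per k (cf. Figure 8)
```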


Figure 9 - Confusion matrix for the k-NN algorithm: "neutral emotion" (0), "anger" (1), "disgust" (3), "fear" (4), "joy" (5), "sadness" (6), "surprise" (7)


Support Vector Machine

The Support Vector Machine is the classification algorithm I chose to train my emotion recognition model.

The principle of the SVM is to separate the data by drawing a decision boundary such that the distance to the different classes is maximal: we seek the greatest margin. (Figure 10)

This assumes that the data is linearly separable, which is rarely the case. SVMs address this with kernel functions, which implicitly project the features into a higher-dimensional vector space where the data can become linearly separable; for this project I chose the linear kernel.

Moreover, drawing the decision boundary with the greatest margin between the classes helps the model generalize and makes it better when making predictions.


Figure 10 - Decision boundaries for the Support Vector Machine algorithm: (a), (b), (c) possible decision boundaries; (d) SVM decision boundary such that the margin between classes is maximal


In the figure I produced, the black line corresponds to the decision boundary drawn by the SVM algorithm; it allows the model to generalize, unlike lines (b) and (c), which are boundaries "very close" to the data.

By predicting on data with the trained SVM model, we obtain an accuracy of 96.6%; the associated confusion matrix is shown in Figure 11. (Training base: 70% of the data, test base: 30% of the data)
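
A minimal sketch of this step, reusing the X_train/X_test split from the k-NN sketch above (the linear kernel follows the report; everything else is an assumption):

```python
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

svm = SVC(kernel="linear").fit(X_train, y_train)
print(svm.score(X_test, y_test))         # ~0.966 reported

y_pred = svm.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # cf. Figure 11
```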


Figure 11 - Confusion matrix for the SVM algorithm: "neutral emotion" (0), "anger" (1), "disgust" (3), "fear" (4), "joy" (5), "sadness" (6), "surprise" (7)


At 96.6% accuracy, having 7 misclassified images out of 222 seems like a good result to me.

We can read in the confusion matrix that the model classified all the emotions perfectly, except for 7 images that were misclassified as "neutral emotion" (false positives for that class).

This could be explained by the fact that people do not express their emotions in the same way on the face.


Results obtained

After training the chosen model with the Support Vector Machine algorithm, we can try to classify the test images, which do not have a label. These images are unknown to the model.

The test base consists of 126 images representing the faces of different people, each expressing an emotion. We had previously extracted the features of these test images (features_test.csv) at the same time as those of the training images (features_train.csv).

By predicting the classes for each of the 126 images, we obtain the results in the file predictions.csv, comprising one column and 126 rows, i.e. one row per image.
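
A minimal sketch of this final prediction step, using the trained svm model from the previous sketch (file layout assumptions as before):

```python
import pandas as pd

# 126 unlabeled test images, one flattened feature vector per row
X_unknown = pd.read_csv("features_test.csv")
predictions = svm.predict(X_unknown)

# one column, one row per image, as described above
pd.DataFrame(predictions, columns=["label"]).to_csv("predictions.csv", index=False)
```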


Figure 12 - Class predictions for the 126 images of the test base (predictions.csv)


Figure 13 - Images of the test base without labels (a) image 121, (b) image 122, (c) image 124, (d) image 126


Conclusion

Emotion recognition from images was achieved by extracting specific areas of the face where we can observe variations according to the expressed emotion. I obtained an accuracy of 96.6% with the model trained using the Support Vector Machine algorithm, which I consider a good score.


Bibliography

Andrew Ng: Machine Learning by Stanford University https://www.coursera.org/learn/machine-learning/home/welcome

Scikit learn: sklearn.svm.SVC https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

OpenCV: Image processing https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_table_of_contents_imgproc/py_table_of_contents_imgproc.html

Zdzisław Kowalczuk and Piotr Chudziak: Identification of Emotions Based on Human Facial Expressions Using a Color-Space Approach

Khadija Lekdioui: Recognition of emotional states by visual analysis of the face and machine learning