Emotion-Recognition

The goal of this project is to recognize human emotions from face images using learning algorithms such as Support Vector Machines.

Author

LinkedIn: Thierry Khamphousone

Getting Started

Setup

> git clone https://github.com/Yulypso/Emotion-Recognition.git
> cd Emotion-Recognition

# for macOS/Linux
> python3 -m venv .venv
> source .venv/bin/activate

# for Windows
> py -3 -m venv .venv
> .venv\Scripts\activate

# install the requirements
> pip3 install -r requirements.txt

Check Dependency Graph


Note: In Visual Studio Code, don't forget to select the correct Python interpreter.

[CMD + SHIFT + P] > Python: Select Interpreter > Python 3.9.0 64-bit ('.venv') [./.venv/bin/python]


Trainset and testset images

  1. Download the trainset and testset images: here
  2. Move the downloaded images to their respective folders:
    • Dataset/trainset/[downloaded images]
    • Dataset/testset/[downloaded images]


Run the code

> cd project/Emotion-Recognition

# feature extraction
> python3 feature_extraction.py

# model training
> python3 training.py

# class prediction
> python3 eval.py

Stop the code

# don't forget to deactivate the virtual environment (.venv)
> deactivate




Introduction

This project concerns the recognition of emotions by computer, a machine learning topic with useful applications. In the commercial field, analyzing a customer's emotions could help a company improve the services it offers. In education, emotion recognition makes it possible to identify the pupils and students in a class who have not understood a concept taught by the teacher, so that they can then be referred to additional help.

Recognition of emotions can be achieved through facial and vocal expressions, as well as through body language. The work carried out here focuses only on the analysis of facial expressions, using several images of different people each expressing an emotion. These images are grouped together in a database.

An emotion is defined as a "sudden turmoil, transient agitation caused by a keen feeling of fear, surprise, joy, etc." (Larousse). For the project, these feelings are classified into seven categories, namely "joy", "anger", "disgust", "sadness", "fear", "surprise" and "neutral emotion".

The analysis of images or video streams generally follows a pipeline in a machine learning approach:

  1. Image collection
  2. Face detection and feature point placement
  3. Feature extraction
  4. Feature preprocessing (image processing)
  5. Model training
  6. Classification and prediction


Data analysis

The first step of the project is to analyze the data: face images together with points of interest (landmarks) stored in a CSV file. I therefore started by developing a display of the images with their characteristic points. (Figure 1)


Figure 1 - Images of faces representing an emotion with the points of interest represented by white dots. (a) joy, (b) neutral emotion
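
Below is a minimal sketch of this display step. The CSV layout is an assumption (an image column plus alternating x/y landmark columns); the actual column names in trainset.csv may differ.

```python
import cv2
import matplotlib.pyplot as plt
import pandas as pd

# Assumed layout: one row per image, file name in an 'image' column,
# landmark coordinates in columns x0, y0, x1, y1, ...
df = pd.read_csv("Dataset/trainset.csv")
row = df.iloc[0]

img = cv2.imread(f"Dataset/trainset/{row['image']}", cv2.IMREAD_GRAYSCALE)

xs = row.filter(regex=r"^x\d+$").astype(float)
ys = row.filter(regex=r"^y\d+$").astype(float)

plt.imshow(img, cmap="gray")
plt.scatter(xs, ys, s=8, c="white")  # white dots, as in Figure 1
plt.axis("off")
plt.show()
```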


I noticed that some images were not suitable for machine learning, either because the feature points did not match the face or because a strand of hair could interfere with the preprocessing of the extracted images/features. (Figure 2)


Figure 2 - Face images not suitable for machine learning


Feature descriptions

After filtering out the images that were not fit for analysis, I added them to a list of excluded images within my program so that they would not be taken into account when extracting features.
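
A minimal sketch of this filtering, with hypothetical file names standing in for the real excluded images:

```python
import pandas as pd

# Hypothetical exclusion list; the actual file names differ.
EXCLUDED_IMAGES = {"img_042.png", "img_118.png"}

df = pd.read_csv("Dataset/trainset.csv")
kept = df[~df["image"].isin(EXCLUDED_IMAGES)]  # rows kept for feature extraction
```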

The filtered database has 705 images, or a sample size of 705.

In order to retrieve the characteristics of the emotions on the faces, we can ask ourselves the question "Where are the most prominent characteristic features of an emotion on a face?".

I defined 11 areas of the face that seemed to me to be the most prominent, i.e. those showing the most variation across emotions. (Figure 3)


Figure 3 - Representation of the 11 areas of interest chosen for feature extraction

The classes carry the following labels:

  • 0: neutral emotion
  • 1: anger
  • 3: disgust
  • 4: fear
  • 5: joy
  • 6: sadness
  • 7: surprise

Due to a lack of data, we will not work on emotion (2).


Some explanations:

⇒ The areas around the eyes, the shape of the mouth and the areas around the nose make it possible to recognize joy.

⇒ The area between the eyebrows makes it possible to detect anger.

⇒ The whites of the eyes, the shape of the nose and that of the mouth make it possible to detect surprise.

⇒ The shape of the eyebrows, the mouth and the eyes, as well as the area between the eyebrows, make it possible to detect sadness.

Each of the 11 areas will be cropped, preprocessed, resized and extracted as "features" for machine learning. (Figure 4)


Figure 4 - Image resizing size chart


Each feature is an image that we will "flatten" to obtain a one-dimensional array.

The feature size is 26508 (flattened) for an individual in the sample.
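
A minimal sketch of the flattening step; the region sizes below are placeholders (the real resize targets are listed in Figure 4 and total 26508 values):

```python
import numpy as np

# Placeholder crops: 11 regions with arbitrary example sizes.
regions = [np.zeros((24, 48)) for _ in range(11)]

# Flatten each 2-D region to 1-D and concatenate them into a single
# feature vector for one individual.
feature_vector = np.concatenate([r.ravel() for r in regions])
print(feature_vector.shape)  # (12672,) with these placeholder sizes
```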


Extraction method

The positions of the automatically generated characteristic points depend on the position of the face in the photo as well as on its shape.

On a neutral face centered in the image, the characteristic points look like Figure 5.


Figure 5 - Characteristic points (landmarks) of the face


Figure 6 - Table of coordinates and sizes of the areas of the face to be extracted


The coordinates of the starting point, the length and the height of the cropped images are grouped in the table. (Figure 6)

Since coordinates and distances depend on feature points and not on fixed pixel values in the image, we can easily retrieve facial components regardless of their position in the image. (Figure 7)
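
A minimal sketch of landmark-relative cropping; the anchor index, offsets and sizes are illustrative stand-ins for the values in Figure 6:

```python
def crop_region(img, landmarks, anchor, dx, dy, width, height):
    """Crop a region whose top-left corner is offset from a landmark.

    anchor is a landmark index; dx, dy, width and height would come
    from the table in Figure 6 (the call below uses made-up values).
    """
    x0 = int(landmarks[anchor][0] + dx)
    y0 = int(landmarks[anchor][1] + dy)
    return img[y0:y0 + height, x0:x0 + width]

# e.g. mouth = crop_region(img, landmarks, anchor=48, dx=-5, dy=-10,
#                          width=60, height=30)  # hypothetical values
```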


Figure 7 - Facial components, (a) left eyebrow, (b) between eyebrows, (c) right eyebrow, (d) left eye side, (e) left eye, (f) right eye, (g) right eye side, (h) nose left side, (i) nose, (j) nose right side, (k) mouth


Preprocessing - Segmentation

Each of the extracted images was transformed at the segmentation stage; a generic illustration is sketched after the list below.

Processing steps:

Left eyebrow and right eyebrow:

Between the eyebrows:

Left eye and Right eye:

Left eye side and Right eye side:

Nose:

Nose on the left side and Nose on the right side:

Mouth:
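
The exact operations applied to each region are not reproduced here. As a generic illustration only, a common segmentation step is grayscale conversion followed by Otsu thresholding:

```python
import cv2

def segment(crop_bgr):
    """Generic segmentation step: grayscale conversion + Otsu thresholding.

    Illustrative only; the project applies its own per-region processing.
    """
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```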

The feature extraction was performed for the training images using the trainset.csv worksheet, and for the test images using the testset.csv worksheet.

The data corresponding to the features were saved in the features_train.csv and features_test.csv files, respectively.
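
A minimal sketch of this save step, assuming one row per individual plus a "label" column (the real file layout may differ):

```python
import numpy as np
import pandas as pd

# Placeholder data standing in for the 705 extracted feature vectors.
feature_vectors = [np.zeros(26508) for _ in range(705)]
labels = np.zeros(705, dtype=int)

features = pd.DataFrame(np.vstack(feature_vectors))
features["label"] = labels  # column name is an assumption
features.to_csv("features_train.csv", index=False)
```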


Presentation of the chosen model

K-Nearest-Neighbors

In order to choose a model, I tried several approaches, including building a model with the K-Nearest Neighbors algorithm first.


Figure 8 - Graph representing the recognition rate as a function of the number of neighbors k


We can see that with the k-NN method, we obtain a maximum recognition rate of 0.79, i.e. 79%, for a number of neighbors k equal to 7.

By predicting on data with the trained model, we obtain an accuracy of 81%; the associated confusion matrix is shown in Figure 9. (Training base: 70% of the data, test base: 30% of the data)
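
A minimal sketch reproducing this experiment with scikit-learn, assuming features_train.csv holds one flattened feature vector per row plus a "label" column:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("features_train.csv")
X, y = df.drop(columns=["label"]), df["label"]

# 70% training / 30% test split, as in the report
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))  # recognition rate per k (cf. Figure 8)
```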


Figure 9 - Confusion matrix for the k-NN algorithm: "neutral emotion" (0), "anger" (1), "disgust" (3), "fear" (4), "joy" (5), "sadness" (6), "surprise" (7)


Support Vector Machine

The Support Vector Machine is the classification algorithm I chose to train my emotion recognition model.

The principle of the SVM is to separate the data by drawing a decision boundary such that the distance to the different classes is maximal: we seek the greatest margin. (Figure 10)

This assumes that the data is linearly separable, which is rarely the case. SVMs address this with kernel functions, which implicitly project the features into a higher-dimensional vector space where the data can become linearly separable; for this project I chose the linear kernel.

Moreover, drawing the decision boundary with the greatest margin between the classes helps the model generalize and makes it better when making predictions.


Figure 10 - Decision boundaries for the Support Vector Machine algorithm: (a), (b), (c) possible decision boundaries; (d) SVM decision boundary such that the margin between classes is maximal


In the figure I produced, the black line corresponds to the decision boundary drawn by the SVM algorithm; it allows the model to generalize, unlike lines (b) and (c), which are boundaries "very close" to the data.

By predicting on data with the trained SVM model, we obtain an accuracy of 96.6%; the associated confusion matrix is shown in Figure 11. (Training base: 70% of the data, test base: 30% of the data)
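
A minimal sketch of this step, reusing the X_train/X_test split from the k-NN sketch above (the linear kernel follows the report; everything else is an assumption):

```python
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

svm = SVC(kernel="linear").fit(X_train, y_train)
print(svm.score(X_test, y_test))         # ~0.966 reported

y_pred = svm.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # cf. Figure 11
```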


Figure 11 - Confusion matrix for the SVM algorithm: "neutral emotion" (0), "anger" (1), "disgust" (3), "fear" (4), "joy" (5), "sadness" (6), "surprise" (7)


At 96.6% accuracy, having 7 misclassified images out of 222 seems like a good result to me.

We can read in the confusion matrix that the model classified all the emotions perfectly, except for 7 images that were misclassified as "neutral emotion" (false positives for that class).

This could be explained by the fact that people do not express their emotions in the same way on the face.


Results obtained

After training the chosen model with the Support Vector Machine algorithm, we can try to classify the test images, which do not have a label. These images are unknown to the model.

The test base consists of 126 images representing the faces of different people, each expressing an emotion. We had previously extracted the features of these test images (features_test.csv) at the same time as those of the training images (features_train.csv).

By predicting the classes for each of the 126 images, we obtain the results in the file predictions.csv, comprising one column and 126 rows, i.e. one row per image.
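
A minimal sketch of this final prediction step, using the trained svm model from the previous sketch (file layout assumptions as before):

```python
import pandas as pd

# 126 unlabeled test images, one flattened feature vector per row
X_unknown = pd.read_csv("features_test.csv")
predictions = svm.predict(X_unknown)

# one column, one row per image, as described above
pd.DataFrame(predictions, columns=["label"]).to_csv("predictions.csv", index=False)
```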


Figure 12 - Class predictions for the 126 images of the test base (predictions.csv)


Figure 13 - Images of the test base without labels (a) image 121, (b) image 122, (c) image 124, (d) image 126


Conclusion

Emotion recognition from images was achieved by extracting specific areas of the face where we can observe variations according to the expressed emotion. I obtained an accuracy of 96.6% with the model trained using the Support Vector Machine algorithm, which I consider a good score.


Bibliography

Andrew Ng: Machine Learning by Stanford University https://www.coursera.org/learn/machine-learning/home/welcome

Scikit learn: sklearn.svm.SVC https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

OpenCV: Image processing https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_table_of_contents_imgproc/py_table_of_contents_imgproc.html

Zdzisław Kowalczuk and Piotr Chudziak: Identification of Emotions Based on Human Facial Expressions Using a Color-Space Approach

Khadija Lekdioui: Recognition of emotional states by visual analysis of the face and machine learning