TadasBaltrusaitis / OpenFace

OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

Output Quality (Gaze Direction Underestimation, Default Face Measures) #969

Open LinaJunc opened 3 years ago

LinaJunc commented 3 years ago

Hello Tadas,

We are using OpenFace to analyze the gaze directions of persons recorded with a standard video recorder. During the analysis, two questions regarding the quality of the output came up.

First, when looking at the representation of the gaze vectors in the output video, we noticed that extreme gazes are nearly always underestimated. For example, a gaze to the far left of the camera is interpreted as a gaze slightly to the left of the camera. The rough directions mostly seem correct, but the gaze angle is often too small. What could be the reasons for this issue? Could the video's resolution or the number of frames per second have an effect?

Second, we were wondering which kind of face you used to calculate the default measures of an "average" face. Does the analysis work equally well for people of different sex, age and ethnicity?

Thank you in advance!

TadasBaltrusaitis commented 3 years ago

Hi Lina,

First of all, I would just like to warn you that gaze is a really difficult thing to predict from unconstrained webcams without per-user calibration. The accuracy drops further when the person is not facing the camera (due in part to the eye region becoming smaller in the image).

Because OpenFace operates in an uncalibrated manner (in terms of both the camera and the person), it is best at detecting changes in relative gaze rather than absolute gaze. If you had a way for the person to look at particular targets before data recording, you might be able to mitigate that a bit.

I don't know the exact reason for the underestimation, especially without having seen the exact videos, but it could be a combination of factors such as image resolution, data quality, glasses, illumination, etc. As I mentioned before, if you have a way to calibrate and work out the amount of underestimation, you might be able to map from predicted values to actual gaze (that mapping would likely be very specific to your recording setup).
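As an illustration of that mapping, here is a minimal sketch: it assumes you recorded a short fixation clip per known target and ran FeatureExtraction on each. The file names and target angles are hypothetical; gaze_angle_x / gaze_angle_y are OpenFace's standard gaze angle outputs, in radians.

```python
# A minimal per-setup calibration sketch (not part of OpenFace itself).
# File names and target angles below are hypothetical.
import numpy as np
import pandas as pd

def mean_gaze_angles(csv_path):
    """Average OpenFace gaze angles over one fixation clip."""
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()  # OpenFace pads CSV headers with spaces
    ok = df["success"] == 1              # keep only successfully tracked frames
    return df.loc[ok, ["gaze_angle_x", "gaze_angle_y"]].mean().to_numpy()

# Predicted angles per calibration target vs. the true angles of those
# targets (known from your screen/camera geometry).
predicted = np.array([mean_gaze_angles(p) for p in
                      ["target_left.csv", "target_right.csv",
                       "target_up.csv", "target_down.csv"]])
true = np.array([[-0.40, 0.00], [0.40, 0.00],
                 [0.00, -0.25], [0.00, 0.25]])

# Fit an independent linear map per axis: true = a * predicted + b.
# This is the simplest way to correct a systematic underestimation;
# the coefficients are specific to one recording setup.
coeffs = [np.polyfit(predicted[:, i], true[:, i], 1) for i in range(2)]

def calibrate(gaze_xy):
    """Map a (gaze_angle_x, gaze_angle_y) prediction to calibrated angles."""
    return np.array([np.polyval(coeffs[i], gaze_xy[i]) for i in range(2)])
```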

> Second, we were wondering which kind of face you used to calculate the default measures of an "average" face. Does the analysis work equally well for people of different sex, age and ethnicity?

The average face is computed from a video of a particular subject each time, so it should not be affected too much by sex, age, and ethnicity, although I have not done any studies to verify that.

Thanks, Tadas

ashoorie commented 3 years ago

Thanks @TadasBaltrusaitis for this interesting work. On the same note about the accuracy of eye gaze estimation, I am getting pretty good results for estimating eye gaze in the x direction, but not reliable results in the y direction. The figures below show the gaze angle estimates when looking at the four corners of the monitor (the orange line is the desired result and the blue dots are what OpenFace predicts). The camera parameters are calibrated using OpenCV. Here, I only show the best results I got, and there is still some inaccuracy in the y direction.

[figure: gaze_angle_x and gaze_angle_y estimates against the desired corner targets]

Here is another example:

[figure: a second set of gaze angle estimates against the desired corner targets]

Again, here we have good accuracy in the x direction, but very poor accuracy in the y direction. I also plotted the same data points on the xy plane (each point is the average of all data points for gaze_angle_x and gaze_angle_y while looking at a specific corner). As you can see, one of the corners is unrecognizable.

[figure: per-corner averages plotted as gaze_angle_x vs. gaze_angle_y]
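For reference, a minimal sketch of how such per-corner averages can be computed from OpenFace's CSV output (one hypothetical file per corner fixation; gaze_angle_x / gaze_angle_y are the standard gaze angle columns, in radians):

```python
# Per-corner averaging of OpenFace gaze angles; file names are hypothetical.
import pandas as pd

corners = {"top_left": "tl.csv", "top_right": "tr.csv",
           "bottom_left": "bl.csv", "bottom_right": "br.csv"}

for name, path in corners.items():
    df = pd.read_csv(path)
    df.columns = df.columns.str.strip()   # OpenFace pads CSV headers with spaces
    ok = df["success"] == 1               # drop frames where tracking failed
    print(f"{name}: gaze_angle_x={df.loc[ok, 'gaze_angle_x'].mean():.3f}, "
          f"gaze_angle_y={df.loc[ok, 'gaze_angle_y'].mean():.3f}")
```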

I know gaze estimation using only a webcam has limitations, but the good results in the x direction made me think there might be a way to get better results in the y direction. Or maybe I am missing something in how I use OpenFace?

Thank you!

TadasBaltrusaitis commented 3 years ago

Great analysis; you are indeed right that accuracy in x is higher than in y. This is quite typical of gaze estimation systems. The reason is that there are simply fewer iris pixels to work with for y-axis estimation, and they tend to be occluded by the upper and lower eyelids. Furthermore, the dynamic range of eye gaze in y is lower in general, so the errors become more apparent. There's no easy solution to this problem, unfortunately.
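To put a number on that per-axis gap, one simple option is to compare the RMSE of predicted versus target angles separately for x and y; a minimal sketch with purely hypothetical values:

```python
# Hypothetical numbers only: per-axis RMSE of predicted vs. target gaze
# angles (radians), to quantify the x/y accuracy gap described above.
import numpy as np

predicted = np.array([[0.31, 0.05], [-0.28, 0.04],
                      [0.02, -0.11], [0.01, 0.09]])
targets = np.array([[0.40, 0.00], [-0.40, 0.00],
                    [0.00, -0.25], [0.00, 0.25]])

rmse = np.sqrt(((predicted - targets) ** 2).mean(axis=0))
print(f"RMSE x: {rmse[0]:.3f} rad, RMSE y: {rmse[1]:.3f} rad")
```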