TadasBaltrusaitis / OpenFace

OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

Inaccurate pupil (and gaze) estimations #958

Open · AndyIMac opened this issue 3 years ago

AndyIMac commented 3 years ago

Describe the bug
Pupil detection is not aligned with where the pupil actually is, meaning the gaze estimates can be wildly off.

Hi Tadas, please see the attached screenshot for an illustration of the issue. I was trying to work out for myself how gaze is calculated, so I made four separate recordings of similar movements. In the top two recordings, I translated the camera from side to side (first to my right, then my left), which is equivalent to moving my head in the opposite direction. In the bottom two, I rotated my head (yaw) from side to side (first turning to my right, then my left). In the left images, I was looking straight ahead with respect to my head; in the right images, I was fixated on the camera. The plots in the middle are of the x-gaze angles, with arrows showing where the screenshots were taken.

I had expected these plots to form two pairs. If the gaze angle were wrt the camera, the right plots would be flat at 0. If the gaze were wrt the world, the top-left and bottom-right plots would be 0. However, all of these plots look the same. On closer inspection of the videos, it's clear that the eye-gaze lasers are tracking with the movement of my head, because the circles tracking pupil position are not in the correct place. This is most clearly visible in the bottom-right image.

Any feedback on this would be much appreciated.

Image of problem
[attachment: gaze data and images]
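For reference, the middle plots were generated along these lines (a minimal sketch, assuming the standard FeatureExtraction CSV columns; the file name is hypothetical):

```python
# Plot the horizontal gaze angle over a recording, from OpenFace's CSV output.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("processed/recording.csv")    # hypothetical output path
df.columns = df.columns.str.strip()            # some versions pad column names with spaces

plt.plot(df["timestamp"], df["gaze_angle_x"])  # gaze_angle_x is in radians
plt.xlabel("time (s)")
plt.ylabel("gaze_angle_x (rad)")
plt.show()
```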

paulSerre commented 3 years ago

"If the gaze angle was wrt the camera, the right plots would be flat at 0"

I don't know why you're saying that, since you're moving your head. I mean, if you look at the same point while moving your head, the angles can't be the same, can they?

A diagram to make this clearer:

[image]

What you said would be true if gaze were tracked wrt head pose, which is not the case.
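To put a rough number on it: an eye 10 cm to the side of the camera and 50 cm away, fixating the camera itself, still gives a camera-frame gaze angle of about 11°. A toy sketch (the atan2 convention here is my assumption about how gaze_angle_x is computed; worth checking against the GazeAnalyser source):

```python
import numpy as np

# Eye position in camera coordinates (metres), z pointing away from the camera.
eye_pos = np.array([0.10, 0.0, 0.50])

# Unit gaze vector from the eye towards the camera origin (fixating the camera).
gaze_dir = -eye_pos / np.linalg.norm(eye_pos)

# Assumed angle convention: atan2 of x against -z (check the source).
gaze_angle_x = np.arctan2(gaze_dir[0], -gaze_dir[2])
print(np.degrees(gaze_angle_x))  # ~ -11.3 degrees, not 0
```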

AndyIMac commented 3 years ago

Thanks for the reply, paulSerre. That's essentially what I was trying to figure out.

"If the gaze angle was wrt the camera, the right plots would be flat at 0. If the gaze was wrt the world, top left and bottom right would be 0."

I can think of three ways the reference frame for the gaze angle could be defined:

1) fixed to the head (i.e. gaze = 0 means looking in the direction of the nose);
2) fixed to the camera (i.e. gaze = 0 means the eyes are pointed towards the camera);
3) aligned with the world (i.e. gaze = 0 means looking along a vector parallel to the camera's optical axis).

From the documentation, I believed scenario 3 was the most likely. In that case, both the top-left and bottom-right plots above would have a gaze of 0. If it were scenario 2, then both of the right images would be 0. If it were scenario 1 (which I thought was least likely), then the left images would be 0.
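One way to separate scenario 1 from 2/3 empirically is to rotate the camera-frame gaze vectors into the head frame and see which trace flattens. A sketch, assuming the pose_R* columns compose as R = Rx · Ry · Rz (my reading of the output-format wiki; the order is worth double-checking) and a hypothetical file name:

```python
import numpy as np
import pandas as pd

def head_rotation(rx, ry, rz):
    """Head rotation matrix from pose_Rx/Ry/Rz, assuming R = Rx @ Ry @ Rz."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

df = pd.read_csv("processed/recording.csv")
df.columns = df.columns.str.strip()

head_frame_gaze = []
for _, row in df.iterrows():
    R = head_rotation(row["pose_Rx"], row["pose_Ry"], row["pose_Rz"])
    g_cam = np.array([row["gaze_0_x"], row["gaze_0_y"], row["gaze_0_z"]])  # one eye, camera frame
    head_frame_gaze.append(R.T @ g_cam)  # undo the head rotation: gaze wrt the head (scenario 1)
head_frame_gaze = np.array(head_frame_gaze)
```

In the left recordings (looking straight ahead wrt the head), this head-frame trace should sit near 0 if everything upstream is working.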

The issue is that none of these plots shows a flat line. If you look particularly at the images in the bottom right, you can clearly see that the pupils are not being correctly identified. Regardless of how the angles are defined, it's clear from those images that something is off: the estimates of the pupil locations are being dragged along with the head rotations. This means the gaze angles aren't correct here no matter which reference frame they're using.
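A quick way to quantify that "dragged along" effect: in the bottom-right recording (head yaw while fixating the camera), gaze_angle_x should stay roughly flat, so a strong correlation with pose_Ry would mean the gaze estimate is riding on the head rotation. A sketch with a hypothetical file name:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("processed/yaw_fixating_camera.csv")
df.columns = df.columns.str.strip()

# Under held fixation, gaze_angle_x should be ~constant while pose_Ry sweeps;
# a correlation near +/-1 says the pupil/gaze estimates follow the head.
r = np.corrcoef(df["gaze_angle_x"], df["pose_Ry"])[0, 1]
print(f"corr(gaze_angle_x, pose_Ry) = {r:.2f}")
```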

paulSerre commented 3 years ago

I just realized that this is the x-axis, so the angle should indeed be zero. Did you try pausing at each pose? With continuous movement, I wonder if there is a small offset. For video, OpenFace relies on the previous frame's facial landmarks (otherwise the fps would be terrible), so those landmarks often take some time to stabilize.
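If landmark stabilization is the suspect, one cheap check is to drop low-confidence frames and see whether the offset persists during held poses. A sketch, assuming the usual success/confidence columns and a hypothetical file name:

```python
import pandas as pd

df = pd.read_csv("processed/recording.csv")
df.columns = df.columns.str.strip()

# Keep only frames where the tracker reports a successful, confident fit.
good = df[(df["success"] == 1) & (df["confidence"] > 0.9)]
print(f"kept {len(good)}/{len(df)} frames")
print(good["gaze_angle_x"].describe())  # does the spread shrink once unstable frames are gone?
```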

AndyIMac commented 3 years ago

We're actually having similar problems with longer videos where only the eyes move, so pausing probably won't change much. I should have said that this was processed with the offline version. It's my understanding that the offline version works frame by frame, treating each frame as an independent still image. We're restricted to using Windows computers, but we didn't see much difference between the offline output and the online output using Colab. (We're also not allowed to use Colab for processing our real participant data, hence all the testing!)

jsf2167 commented 2 years ago

Also wondering about this, and getting similar results from a camera placed very close to the face.