google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

How to get the 2D gaze point on the screen with MediaPipe? #4215

Closed. akatav closed this issue 1 year ago

akatav commented 1 year ago

I'm very new to eye tracking and MediaPipe. Thanks for this great piece of work. Could someone tell me how to get the 2D coordinates of the gaze point on the computer screen? I'm not sure whether this is supported out of the box (OOTB) or whether another package such as FaceAnalyzer must be used alongside it.

ayushgdev commented 1 year ago

Hello @akatav. MediaPipe Iris tracking can be part of the solution, but gaze point estimation is not supported OOTB. You will need the following components to build it:

  1. The intersection between the normal of the eye R-roll and the screen surface gives the point of gaze, so you need to find the normal from the eye.
  2. Since the two eyes are a certain distance apart, if the subject focuses on the left, the left eye will roll less than the right eye, and vice versa. The normal of the plane of eye R-roll therefore varies with eye movement, and you will need to account for the change in roll.
  3. The distance between the eyes needs to be captured. This can be done with MediaPipe, since the Iris tracking solution gives the landmark coordinates of the irises and also outputs depth (distance from the camera).
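A minimal sketch of step 1, treating the gaze as a ray from the eye and the screen as a plane. The coordinate frame is an assumption for illustration (millimetres, camera at the origin, screen in the plane z = 0, user at positive z), and `gaze_point_on_screen` and `midpoint` are hypothetical helper names, not MediaPipe APIs. Estimating the eye position and gaze direction themselves is left out.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def gaze_point_on_screen(
    eye_center: Sequence[float],
    gaze_dir: Sequence[float],
    plane_point: Sequence[float] = (0.0, 0.0, 0.0),
    plane_normal: Sequence[float] = (0.0, 0.0, 1.0),
) -> Optional[Vec3]:
    """Intersect the gaze ray with the screen plane.

    Returns None when the gaze is parallel to the screen or points away from it.
    """
    denom = dot(gaze_dir, plane_normal)
    if abs(denom) < 1e-9:
        return None  # gaze runs parallel to the screen plane
    diff = [p - e for p, e in zip(plane_point, eye_center)]
    t = dot(diff, plane_normal) / denom
    if t < 0:
        return None  # the screen is behind the eye along this direction
    return (
        eye_center[0] + t * gaze_dir[0],
        eye_center[1] + t * gaze_dir[1],
        eye_center[2] + t * gaze_dir[2],
    )


def midpoint(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """Average the left-eye and right-eye hits into a single gaze point."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)
```

Because the two eyes generally give slightly different normals (step 2), one simple option is to intersect each eye's ray separately and average the two hits with `midpoint`. The result is still in millimetres on the screen plane and has to be converted to pixels using the physical screen size and the camera's position relative to the screen.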

Usually a binocular camera is required to gauge the distance of the eyes from the camera. However, since the common use case for such a gaze point estimation solution is in the wild with mobile/desktop screens, you will need to use a monocular camera system. You might want to read research papers on eye tracking with binocular and monocular systems. You can start with this one on tracking with a mobile device (not an endorsed recommendation). Further, since you can use MediaPipe, the requirement for a binocular system goes away: the distance from the camera can be estimated with an error of less than 10%.
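One common way to get distance from a single camera is a pinhole-camera similar-triangles estimate based on the apparent size of the iris. The sketch below assumes a roughly constant physical iris diameter of about 11.7 mm, a known focal length in pixels, and iris edge landmarks already converted from normalized coordinates to pixels. The function names are hypothetical, and this is not necessarily how MediaPipe computes its own depth output.

```python
import math
from typing import Sequence

# Assumption: average horizontal diameter of the human iris, in millimetres.
IRIS_DIAMETER_MM = 11.7


def iris_diameter_px(left_edge: Sequence[float], right_edge: Sequence[float]) -> float:
    """Apparent iris diameter in pixels from two opposite iris-edge landmarks."""
    return math.hypot(right_edge[0] - left_edge[0], right_edge[1] - left_edge[1])


def estimate_eye_distance_mm(
    iris_px: float,
    focal_length_px: float,
    iris_mm: float = IRIS_DIAMETER_MM,
) -> float:
    """Similar triangles: distance / iris_mm == focal_length_px / iris_px."""
    if iris_px <= 0:
        raise ValueError("iris diameter in pixels must be positive")
    return focal_length_px * iris_mm / iris_px
```

For example, `estimate_eye_distance_mm(iris_diameter_px((100.0, 200.0), (119.5, 200.0)), focal_length_px=1000.0)` gives roughly 600 mm. The focal length has to come from camera calibration or the device's camera metadata; a wrong value scales the estimate proportionally.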

akatav commented 1 year ago

Thank you @ayushgdev for the really helpful and informative answer.

Okay, I will try it out following the steps you have given. Thanks again. If I do get it working, perhaps I will submit a patch here for your review :)

thanks so much!

ayushgdev commented 1 year ago

Sure @akatav. If you are satisfied with the resolution, can we close the issue?

akatav commented 1 year ago

OK, @ayushgdev. I think WebGazer (Brown University) also does a good job, using some form of ridge regression. Another question I am seeking an answer to is whether calibration is always necessary. I am also interested in how to account for head movements and adjust the gaze points accordingly.
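To make the calibration idea concrete, here is a minimal sketch of ridge regression mapping eye features to one screen coordinate. It only illustrates the general technique; the features (a 2D iris offset), the helper names, and the regularization strength are assumptions, and WebGazer's actual features and pipeline are different.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    lam: float = 1e-6,
) -> List[float]:
    """Solve (X^T X + lam * I) w = X^T y, with a leading bias column.

    For simplicity the bias term is penalized as well.
    """
    rows = [[1.0] + list(f) for f in features]
    n = len(rows[0])
    xtx = [
        [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(xtx, xty)


def predict(weights: Sequence[float], feature: Sequence[float]) -> float:
    """Predicted screen coordinate for one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature))
```

In use, the user would look at a handful of known on-screen targets while the features are recorded, and one model would be fitted for screen x and another for screen y. A mapping fitted this way is tied to the head pose during calibration, which is one reason head movement needs separate handling.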

ayushgdev commented 1 year ago

@akatav As mentioned in the points in my last comment:

  1. The intersection between the normal of the eye R-roll and the screen surface gives the point of gaze. So you need to find the normal from the eye.

Taking the normal of the iris automatically takes care of head movements to a large extent. There can be three scenarios:

  1. Only the irises roll to move the gaze, as when reading an article. In this case, the normal shifts automatically. MediaPipe can help track the iris movements, but the app you are developing needs to take care of calculating the normal.
  2. Only the head moves, but the subject keeps looking at the same word in the article (like looking sideways). In this case, the iris keeps pointing at the same spot, so the normal, and therefore the gaze point, stays the same.
  3. The head moves and the eyes move with it, always pointing to the front of the face. In this situation, at the extremes, the subject would not be looking at the screen at all. MediaPipe won't detect any iris, and this can be handled by checking whether any landmarks are returned.
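The three scenarios can be sketched by composing the head rotation with the eye rotation inside the head. This is a simplified illustration that assumes yaw-only rotation about the vertical axis and ignores the small shift in eye position when the head turns. All function names are hypothetical. The landmark check assumes a results field that is `None` or empty when no face is found, which, as far as I know, is how `multi_face_landmarks` behaves in the legacy Face Mesh Python solution.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_yaw(v: Sequence[float], yaw_rad: float) -> Vec3:
    """Rotate a 3D vector about the vertical (y) axis."""
    x, y, z = v
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x + s * z, y, -s * x + c * z)


def gaze_in_camera_frame(
    head_yaw_rad: float,
    eye_yaw_in_head_rad: float,
    forward: Sequence[float] = (0.0, 0.0, -1.0),
) -> Vec3:
    """Gaze direction seen by the camera: eye rotation first, then head rotation."""
    in_head = rotate_yaw(forward, eye_yaw_in_head_rad)
    return rotate_yaw(in_head, head_yaw_rad)


def face_detected(multi_face_landmarks: Optional[Sequence[object]]) -> bool:
    """Scenario 3 guard: skip gaze estimation when no landmarks were returned."""
    return bool(multi_face_landmarks)


# Scenario 1: the head is still and only the eyes roll, so the gaze shifts.
reading = gaze_in_camera_frame(0.0, math.radians(10))

# Scenario 2: the head turns one way and the eyes counter-rotate by the same
# angle, so the gaze direction in the camera frame is unchanged.
fixating = gaze_in_camera_frame(math.radians(20), math.radians(-20))

# Scenario 3: the eyes follow the head, so the gaze swings away with it.
looking_away = gaze_in_camera_frame(math.radians(60), 0.0)
```

Working in the camera frame like this is what lets the iris normal absorb head movement: the quantity intersected with the screen is the combined direction, not the eye's rotation relative to the head.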

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
