Tobias-Fischer / rt_gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments
http://www.imperial.ac.uk/personal-robotics

Calibrating output gaze to screen coordinates #98

Closed ShreshthSaxena closed 3 years ago

ShreshthSaxena commented 3 years ago

Hi, I'm trying to map the gaze vector saved by estimate_gaze_standalone.py to image/screen coordinates. Is this something already available? If not, can you help with the steps needed to achieve this?

Tobias-Fischer commented 3 years ago

Hi,

No, this is not implemented in the code. You need to know the geometric relationship between the camera and the screen to do this, or use some kind of calibration. @ahmed-alhindawi played with this before - do you have any tips? There are some other great repositories out there that might be better suited for this use case.
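
As a rough sketch of the geometric route, assuming the camera-to-screen extrinsics (R_cs, t_cs) are already known from an offline calibration - all names and values below are illustrative and not part of RT-GENE:

```python
# Hedged geometric sketch (not RT-GENE code): given camera-to-screen extrinsics,
# intersect the gaze ray with the screen plane to get 2D screen coordinates.
import numpy as np

def gaze_to_screen(eye_pos_cam, gaze_dir_cam, R_cs, t_cs, px_per_mm):
    """eye_pos_cam: 3D eye position in the camera frame (mm).
    gaze_dir_cam: unit gaze direction in the camera frame.
    R_cs, t_cs: rotation/translation mapping camera coordinates to a screen frame
                in which the screen surface is the plane z == 0.
    px_per_mm: conversion from screen millimetres to pixels."""
    # Express the gaze ray in the screen frame.
    origin = R_cs @ eye_pos_cam + t_cs
    direction = R_cs @ gaze_dir_cam
    # Intersect the ray with the plane z = 0.
    s = -origin[2] / direction[2]
    hit = origin + s * direction
    return hit[:2] * px_per_mm

# Toy example: screen frame aligned with the camera frame, offset by 100 mm.
R_cs = np.eye(3)
t_cs = np.array([0.0, 100.0, 0.0])
gaze = np.array([0.0, 0.1, -0.99])
print(gaze_to_screen(np.array([0.0, 0.0, 600.0]),
                     gaze / np.linalg.norm(gaze),
                     R_cs, t_cs, px_per_mm=3.78))
```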

ShreshthSaxena commented 3 years ago

Thanks for the response. Can you point me to some of these repositories?

ahmed-alhindawi commented 3 years ago

The way Tobii/Pupil Labs perform the conversion is through an online optimisation process of predefined non-linear functions at the calibration stage. A good starting point would be something like:

screen_x = ax*theta + bx*theta^2 + cx
screen_y = ay*phi + by*phi^2 + cy

The calibration procedure would then give you pairs of known screen_x/screen_y targets and the corresponding eye angles (phi and theta). Finding the appropriate coefficients from those pairs is then fairly straightforward (e.g. with scipy.optimize), but I'm afraid I don't know of a specific repository.
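
As a minimal sketch of that fitting step, assuming calibration data recorded while the user fixated known screen targets - the array names and values below are placeholders, not part of RT-GENE:

```python
# Hedged sketch: fit the quadratic screen mapping above from calibration data.
import numpy as np
from scipy.optimize import curve_fit

def screen_x_model(theta, a, b, c):
    # screen_x = a*theta + b*theta^2 + c
    return a * theta + b * theta**2 + c

def screen_y_model(phi, a, b, c):
    # screen_y = a*phi + b*phi^2 + c
    return a * phi + b * phi**2 + c

# Calibration data: eye angles recorded while fixating known screen points.
calib_theta = np.array([-0.20, -0.10, 0.00, 0.10, 0.20])   # horizontal angle (rad)
calib_phi   = np.array([-0.15, -0.05, 0.00, 0.05, 0.15])   # vertical angle (rad)
calib_x     = np.array([100, 500, 960, 1400, 1820])        # known target x (px)
calib_y     = np.array([980, 700, 540, 380, 120])          # known target y (px)

(ax, bx, cx), _ = curve_fit(screen_x_model, calib_theta, calib_x)
(ay, by, cy), _ = curve_fit(screen_y_model, calib_phi, calib_y)

# Predict screen coordinates for a new gaze estimate (theta, phi).
theta_new, phi_new = 0.05, -0.02
print(screen_x_model(theta_new, ax, bx, cx),
      screen_y_model(phi_new, ay, by, cy))
```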

I hope that helps.

1cookspe commented 11 months ago

@ahmed-alhindawi Thank you for the information! In this case, would phi and theta be the angles in the normalized space, or in the camera space? My concern is that keeping the angles in the normalized space and then transferring them to the screen space may remove information regarding the head pose of the user in the camera space. Any insight you have is greatly appreciated!

ahmed-alhindawi commented 11 months ago

Sorry for the delay.

Having a completely unconstrained head pose relative to a screen is an area of active research. You could try an approach like this one (https://ieeexplore.ieee.org/abstract/document/9746437 - with the caveat that it is my own paper, so I'm slightly biased).

However, if you are happy to constrain the head pose, then a calibration technique like the one I described above would work well. To answer your question directly: if the head pose is constrained (e.g. with a chin rest, or when there is no parallax between the head and the scene, such as a scene camera mounted on the forehead), then phi and theta would be in the person's frame of reference.