erkil1452 / gaze360

Code for the Gaze360: Physically Unconstrained Gaze Estimation in the Wild Dataset
http://gaze360.csail.mit.edu

About eye and camera coordinate system in dataset #30

Closed TeresaKumo closed 2 years ago

TeresaKumo commented 2 years ago

@erkil1452 Hi, thanks for your great work, but I have some problems when trying to use your dataset.

  1. In https://github.com/erkil1452/gaze360/tree/master/dataset, it says:

     gaze_dir = M * (target_pos3d - person_eyes3d), where M depends on a normal direction between the eyes and the camera.

     Is M the conversion matrix M = SR, as in "Revisiting Data Normalization for Appearance-Based Gaze Estimation" and "Learning-by-Synthesis for Appearance-Based 3D Gaze Estimation"?

  2. If 1 is true, when I use a raw image from the dataset, do I still need to do any data normalization?
  3. How can I get the camera parameters of the dataset, so that I can compute the M matrix?
  4. In your paper I also found an illuminating part, "Estimating attention in a supermarket". If possible, could you please tell me how to convert the gaze vector to a point on the shelf?

Thanks a lot!

erkil1452 commented 2 years ago

Hi Teresa,

1+2) We do not do any face normalization. You can use the images and gaze labels directly. The matrix M just rotates the coordinates so that, regardless of where the subject was standing, the gaze will always be [0,0,-1] if they look into the camera. If we used the global Ladybug coordinate system directly, the direction towards the camera would be constantly changing.

3) You do not need camera parameters to compute M. M is only a function of person_eyes3d. It is a rotation matrix (around the X and Y axes) that puts person_eyes3d on the positive z axis => M @ person_eyes3d = |person_eyes3d| * (0, 0, 1). The Y axis of the rotated coordinate system stays in the ZX plane (that means there is no roll = z rotation happening).

4) Assuming you use the Ladybug coordinate system (or, in your case, probably just a general camera coordinate system), you can first convert gaze_dir back into the original coordinates using M^{-1} @ gaze_dir. Then you have a ray defined as person_eyes3d + t * (M^{-1} @ gaze_dir), and you can cast it into your scene to find an intersection. In our case we represent the shelf as a simple vertical flat plane orthogonal to the camera view.
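If it helps, the ray-casting step in 4) can be sketched as a standard ray-plane intersection. This is a minimal sketch with placeholder numbers, not values from the paper; the function name and the shelf plane are assumptions for illustration:

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_normal, plane_d):
    # Ray: p(t) = origin + t * direction; plane: dot(n, p) = d
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = (plane_d - np.dot(plane_normal, origin)) / denom
    if t < 0:
        return None  # intersection lies behind the eyes
    return origin + t * direction

# Hypothetical example: eyes 2 m in front of the camera, gaze cast back
# past the camera onto a vertical plane z = -1 (orthogonal to the view).
person_eyes3d = np.array([0.0, 0.0, 2.0])
gaze_world = np.array([0.0, 0.0, -1.0])  # in practice: M^{-1} @ gaze_dir
hit = intersect_ray_plane(person_eyes3d, gaze_world,
                          np.array([0.0, 0.0, 1.0]), -1.0)
# hit is the 3D point where the gaze ray meets the shelf plane
```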

TeresaKumo commented 2 years ago

Thanks for your explanation! I will try it on my camera.

Ahmednull commented 2 years ago

What do you mean by @ in your answer?

erkil1452 commented 2 years ago

@Ahmedsolimannull Matrix multiplication.

Ahmednull commented 2 years ago

Could you please give me a direct equation to calculate M from person_eyes3d and target_3d? I tried to calculate it, but I failed.

erkil1452 commented 2 years ago
```python
def getGazeDirection(...):
    # Gaze direction in the Ladybug global coordinate system
    gazeDirLB = target3D - eyes3D
    gazeDirLB /= np.linalg.norm(gazeDirLB)

    # This is the direction from the camera to the eyes in camera coordinates.
    # Only approximate for Ladybug but should work for normal single-sensor cameras.
    dirEyes = eyes3D / np.linalg.norm(eyes3D)

    # Convert to the local camera coordinate system
    gazeCS = self.getLadybugToEyeMatrix(dirEyes)
    gazeDir = np.matmul(gazeCS, gazeDirLB)
    gazeDir /= np.linalg.norm(gazeDir)  # not really necessary
    return gazeDir

def getLadybugToEyeMatrix(self, dirEyes):
    # Define left? hand coordinate system in the eye plane orthogonal to the camera ray
    upVector = np.array([0, 0, 1], np.float32)
    zAxis = dirEyes.flatten()
    xAxis = np.cross(upVector, zAxis)
    xAxis /= np.linalg.norm(xAxis)
    yAxis = np.cross(zAxis, xAxis)
    yAxis /= np.linalg.norm(yAxis)  # not really necessary
    gazeCS = np.stack([xAxis, yAxis, zAxis], axis=0)
    return gazeCS
```

getLadybugToEyeMatrix returns the matrix M.
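As a quick sanity check, restating the getLadybugToEyeMatrix logic as a standalone function (with a hypothetical eye position) shows that the returned M indeed puts person_eyes3d on the positive z axis:

```python
import numpy as np

def ladybug_to_eye_matrix(dir_eyes):
    # Same construction as getLadybugToEyeMatrix above, without the class.
    up = np.array([0.0, 0.0, 1.0])
    z_axis = dir_eyes.flatten()
    x_axis = np.cross(up, z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    return np.stack([x_axis, y_axis, z_axis], axis=0)

eyes3d = np.array([1.0, -0.5, 2.0])  # hypothetical eye position
M = ladybug_to_eye_matrix(eyes3d / np.linalg.norm(eyes3d))
rotated = M @ eyes3d
# rotated is approximately (0, 0, |eyes3d|): the eyes end up on the +z axis
```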

Ahmednull commented 2 years ago
Thank you for your help.

Ahmednull commented 2 years ago

I read this sentence in the paper: "We express the gaze in the observing camera's Cartesian eye coordinate system E = [Ex, Ey, Ez]. E is defined so that the origin is pe, and Ez has the same direction as gL."

And in the dataset description: "M depends on a normal direction between eyes and the camera", which means that Ez is in the direction from the camera to the eyes. Could you explain this, please?

Thanks for your patience

erkil1452 commented 2 years ago

Yes, the x/y/zAxis in the code above are Ex, Ey and Ez. Ez is dirEyes, i.e. the direction from the camera to the eyes. The matrix M (or E) has the effect of rotating the coordinate system so that it aligns the axes as described.
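Putting the two statements together numerically: if Ez points from the camera to the eyes, then a subject looking straight at the camera gets the gaze label (0, 0, -1) in the eye coordinate system. A small sketch with a hypothetical eye position (the numbers are placeholders, not dataset values):

```python
import numpy as np

# Hypothetical setup: eyes in front of the camera, looking at the camera
# (i.e. the gaze target is the camera at the origin).
eyes3d = np.array([0.5, 1.0, 3.0])
target3d = np.zeros(3)

dir_eyes = eyes3d / np.linalg.norm(eyes3d)  # this is Ez
up = np.array([0.0, 0.0, 1.0])
x_axis = np.cross(up, dir_eyes)             # Ex
x_axis /= np.linalg.norm(x_axis)
y_axis = np.cross(dir_eyes, x_axis)         # Ey
M = np.stack([x_axis, y_axis, dir_eyes], axis=0)

gaze = target3d - eyes3d
gaze /= np.linalg.norm(gaze)
rotated_gaze = M @ gaze
# rotated_gaze is approximately (0, 0, -1): looking into the camera
```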

Ahmednull commented 2 years ago

Thank you for your answer.