erkil1452 / gaze360

Code for the Gaze360: Physically Unconstrained Gaze Estimation in the Wild Dataset
http://gaze360.csail.mit.edu

About eye and camera coordinate system in dataset #30

Closed TeresaKumo closed 2 years ago

TeresaKumo commented 2 years ago

@erkil1452 Hi, thanks for your great work, but I have some problems when trying to use your dataset.

  1. In https://github.com/erkil1452/gaze360/tree/master/dataset, it says:

     gaze_dir = M * (target_pos3d - person_eyes3d), where M depends on a normal direction between the eyes and the camera.

     Is M the conversion matrix M = SR, as in "Revisiting Data Normalization for Appearance-Based Gaze Estimation" and "Learning-by-Synthesis for Appearance-Based 3D Gaze Estimation"?

  2. If 1 is true, when I use a raw image from the dataset, do I still need to do any data normalization?
  3. How can I get the camera parameters of the dataset, so that I can compute the M matrix?
  4. In your paper I also found an illuminating part, "Estimating attention in a supermarket". If possible, could you please tell me how to convert the gaze vector to a point on the shelf?

Thanks a lot!

erkil1452 commented 2 years ago

Hi Teresa,

1+2) We do not do any face normalization. You can use the images and gaze labels directly. The matrix M just rotates the coordinates so that, regardless of where the subject was standing, the gaze will always be [0,0,-1] if they look into the camera. If we used the global Ladybug coordinate system directly, the direction towards the camera would be constantly changing.

3) You do not need camera parameters to compute M. M is only a function of person_eyes3d. It is a rotation matrix (around the X and Y axes) that puts person_eyes3d on the positive z axis => M @ person_eyes3d = |person_eyes3d| * (0, 0, 1). The Y axis of the rotated coordinate system stays in the ZX plane (that means there is no roll = z rotation happening).

4) Assuming you use the Ladybug coordinate system (or, in your case, probably just a general camera coordinate system), you can first convert gaze_dir back into the original coordinates using M^{-1} @ gaze_dir. Then you have a ray defined as person_eyes3d + t * (M^{-1} @ gaze_dir), and you can cast it into your scene to find an intersection. In our case we represent the shelf as a simple vertical flat plane orthogonal to the camera view.
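If it helps, the ray-casting step in 4) can be sketched as a standard ray-plane intersection. This is a minimal sketch with placeholder numbers, not values from the paper; the function name and the shelf plane are assumptions for illustration:

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_normal, plane_d):
    # Ray: p(t) = origin + t * direction; plane: dot(n, p) = d
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = (plane_d - np.dot(plane_normal, origin)) / denom
    if t < 0:
        return None  # intersection lies behind the eyes
    return origin + t * direction

# Hypothetical example: eyes 2 m in front of the camera, gaze cast back
# past the camera onto a vertical plane z = -1 (orthogonal to the view).
person_eyes3d = np.array([0.0, 0.0, 2.0])
gaze_world = np.array([0.0, 0.0, -1.0])  # in practice: M^{-1} @ gaze_dir
hit = intersect_ray_plane(person_eyes3d, gaze_world,
                          np.array([0.0, 0.0, 1.0]), -1.0)
# hit is the 3D point where the gaze ray meets the shelf plane
```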

TeresaKumo commented 2 years ago

Thanks for your explanation! I will try it on my camera.

Ahmednull commented 2 years ago

What do you mean by @ in your answer?

erkil1452 commented 2 years ago

@Ahmedsolimannull Matrix multiplication.

Ahmednull commented 2 years ago

Could you please give me a direct equation to calculate M from person_eyes3d and target_3d? I tried to calculate it, but I failed.

erkil1452 commented 2 years ago
```python
def getGazeDirection(...):
    # Gaze direction in the Ladybug global coordinate system
    gazeDirLB = target3D - eyes3D
    gazeDirLB /= np.linalg.norm(gazeDirLB)

    # This is the direction from the camera to the eyes in camera coordinates.
    # Only approximate for Ladybug but should work for normal single-sensor cameras.
    dirEyes = eyes3D / np.linalg.norm(eyes3D)

    # Convert to the local camera coordinate system
    gazeCS = self.getLadybugToEyeMatrix(dirEyes)
    gazeDir = np.matmul(gazeCS, gazeDirLB)
    gazeDir /= np.linalg.norm(gazeDir)  # not really necessary
    return gazeDir

def getLadybugToEyeMatrix(self, dirEyes):
    # Define left? hand coordinate system in the eye plane orthogonal to the camera ray
    upVector = np.array([0, 0, 1], np.float32)
    zAxis = dirEyes.flatten()
    xAxis = np.cross(upVector, zAxis)
    xAxis /= np.linalg.norm(xAxis)
    yAxis = np.cross(zAxis, xAxis)
    yAxis /= np.linalg.norm(yAxis)  # not really necessary
    gazeCS = np.stack([xAxis, yAxis, zAxis], axis=0)
    return gazeCS
```

getLadybugToEyeMatrix returns the matrix M.
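As a quick sanity check, restating the getLadybugToEyeMatrix logic as a standalone function (with a hypothetical eye position) shows that the returned M indeed puts person_eyes3d on the positive z axis:

```python
import numpy as np

def ladybug_to_eye_matrix(dir_eyes):
    # Same construction as getLadybugToEyeMatrix above, without the class.
    up = np.array([0.0, 0.0, 1.0])
    z_axis = dir_eyes.flatten()
    x_axis = np.cross(up, z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    return np.stack([x_axis, y_axis, z_axis], axis=0)

eyes3d = np.array([1.0, -0.5, 2.0])  # hypothetical eye position
M = ladybug_to_eye_matrix(eyes3d / np.linalg.norm(eyes3d))
rotated = M @ eyes3d
# rotated is approximately (0, 0, |eyes3d|): the eyes end up on the +z axis
```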

Ahmednull commented 2 years ago
Thank you for your help.

Ahmednull commented 2 years ago

I read this sentence in the paper: "We express the gaze in the observing camera's Cartesian eye coordinate system E = [Ex, Ey, Ez]. E is defined so that the origin is pe, and Ez has the same direction as gL."

And in the dataset description: "M depends on a normal direction between eyes and the camera", which means that Ez is in the direction from the camera to the eyes. Could you explain this, please?

Thanks for your patience

erkil1452 commented 2 years ago

Yes, the x/y/zAxis in the code above are Ex, Ey and Ez. Ez is dirEyes, i.e. the direction from the camera to the eyes. The matrix M (or E) has the effect of rotating the coordinate system so that it aligns the axes as described.
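Putting the two statements together numerically: if Ez points from the camera to the eyes, then a subject looking straight at the camera gets the gaze label (0, 0, -1) in the eye coordinate system. A small sketch with a hypothetical eye position (the numbers are placeholders, not dataset values):

```python
import numpy as np

# Hypothetical setup: eyes in front of the camera, looking at the camera
# (i.e. the gaze target is the camera at the origin).
eyes3d = np.array([0.5, 1.0, 3.0])
target3d = np.zeros(3)

dir_eyes = eyes3d / np.linalg.norm(eyes3d)  # this is Ez
up = np.array([0.0, 0.0, 1.0])
x_axis = np.cross(up, dir_eyes)             # Ex
x_axis /= np.linalg.norm(x_axis)
y_axis = np.cross(dir_eyes, x_axis)         # Ey
M = np.stack([x_axis, y_axis, dir_eyes], axis=0)

gaze = target3d - eyes3d
gaze /= np.linalg.norm(gaze)
rotated_gaze = M @ gaze
# rotated_gaze is approximately (0, 0, -1): looking into the camera
```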

Ahmednull commented 2 years ago

Thank you for your answer.