Closed TeresaKumo closed 2 years ago
Hi Teresa,
1+2) We do not do any face normalization. You can use the images and gaze labels directly. The matrix M just rotates the coordinates so that regardless of where the subject was standing, the gaze will always be [0,0,-1] if they look into the camera. If we used the global Ladybug coordinate system directly, the direction towards camera would be constantly changing.
3) You do not need camera parameters to compute M. M is only a function of person_eyes3d. It is a rotation matrix (around the X and Y axes) that puts person_eyes3d on the positive Z axis => M @ person_eyes3d = |person_eyes3d| * (0, 0, 1). The Y axis of the rotated coordinate system stays in the ZX plane (meaning there is no roll, i.e. no rotation around Z).
4) Assuming you use the Ladybug coordinate system (or, in your case, probably just a general camera coordinate system), you can first convert the gaze_dir back into the original coordinates using M^{-1} @ gaze_dir. Then you have a ray defined as person_eyes3d + t * (M^{-1} @ gaze_dir) and you can cast it into your scene to find an intersection. In our case we represent the shelf as a simple vertical flat plane orthogonal to the camera view.
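A minimal sketch of that last step, assuming a camera coordinate system in which the plane sits at a fixed depth `z = plane_z` (the function and variable names here are illustrative, not from the dataset code):

```python
import numpy as np

def intersect_gaze_with_plane(M, gaze_dir, person_eyes3d, plane_z):
    """Cast the de-normalized gaze ray into a plane orthogonal to the camera view.

    Assumes the plane is z = plane_z in the original (camera/Ladybug) coordinates.
    """
    # Rotate the predicted gaze back into the original coordinates: M^{-1} @ gaze_dir.
    ray_dir = np.linalg.inv(M) @ gaze_dir
    # Ray: p(t) = person_eyes3d + t * ray_dir; solve p(t)[2] == plane_z for t.
    t = (plane_z - person_eyes3d[2]) / ray_dir[2]
    return person_eyes3d + t * ray_dir

# Toy usage: identity M, subject 2 m from the camera, looking straight at it.
hit = intersect_gaze_with_plane(np.eye(3),
                                np.array([0., 0., -1.]),
                                np.array([0., 0., 2.]),
                                plane_z=0.0)
print(hit)  # -> [0. 0. 0.]
```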
Thanks for your explanation! I will try it on my camera.
What do you mean by @ in your answer?
@Ahmedsoliman Matrix multiplication.
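For context: in Python 3.5+ (PEP 465), `@` is the matrix-multiplication operator and is equivalent to `np.matmul` for NumPy arrays. A tiny illustration:

```python
import numpy as np

M = np.array([[0., -1.],
              [1.,  0.]])   # 90-degree rotation in 2D
v = np.array([1., 0.])

w = M @ v                   # same as np.matmul(M, v)
print(w)                    # -> [0. 1.]
```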
Could you please give me a direct equation to calculate M from person_eyes3d and target_3d? I tried to calculate it, but I failed.
import numpy as np

def getGazeDirection(self, eyes3D, target3D):
    # Gaze direction in the Ladybug global coordinate system
    gazeDirLB = target3D - eyes3D
    gazeDirLB /= np.linalg.norm(gazeDirLB)
    # This is the direction from camera to eyes in camera coordinates.
    # Only approximate for Ladybug but should work for normal single-sensor cameras.
    dirEyes = eyes3D / np.linalg.norm(eyes3D)
    # Convert to the local camera coordinate system
    gazeCS = self.getLadybugToEyeMatrix(dirEyes)
    gazeDir = np.matmul(gazeCS, gazeDirLB)
    gazeDir /= np.linalg.norm(gazeDir)  # not really necessary
    return gazeDir

def getLadybugToEyeMatrix(self, dirEyes):
    # Define a coordinate system in the eye plane orthogonal to the camera ray
    upVector = np.array([0, 0, 1], np.float32)
    zAxis = dirEyes.flatten()
    xAxis = np.cross(upVector, zAxis)
    xAxis /= np.linalg.norm(xAxis)
    yAxis = np.cross(zAxis, xAxis)
    yAxis /= np.linalg.norm(yAxis)  # not really necessary
    gazeCS = np.stack([xAxis, yAxis, zAxis], axis=0)
    return gazeCS
getLadybugToEyeMatrix returns the matrix M.
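To see that this construction has the stated property M @ person_eyes3d = |person_eyes3d| * (0, 0, 1), here is a standalone re-statement of the same rows-from-cross-products recipe with a quick numeric check (the eye position is a made-up example; the construction assumes dirEyes is not parallel to the up vector):

```python
import numpy as np

def ladybug_to_eye_matrix(dir_eyes):
    """Rows are Ex, Ey, Ez; Ez points from the camera to the eyes."""
    up = np.array([0.0, 0.0, 1.0])
    z_axis = dir_eyes / np.linalg.norm(dir_eyes)
    x_axis = np.cross(up, z_axis)          # assumes dir_eyes is not parallel to up
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)      # stays in the ZX plane of the old frame
    return np.stack([x_axis, y_axis, z_axis], axis=0)

eyes3d = np.array([0.5, -0.3, 2.0])        # arbitrary example eye position
M = ladybug_to_eye_matrix(eyes3d)
rotated = M @ eyes3d
print(rotated)                             # approximately [0, 0, |eyes3d|]
```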
Thank you for your help.
I read this sentence in the paper: "We express the gaze in the observing camera's Cartesian eye coordinate system E = [Ex, Ey, Ez]. E is defined so that the origin is pe, Ez has the same direction as gL."
And in the dataset description: "M depends on a normal direction between eyes and the camera", which means that Ez is in the direction from the camera to the eyes. Could you explain these, please?
Thanks for your patience
Yes, the x/y/zAxis in the code above are Ex, Ey and Ez. Ez is dirEyes, i.e. the direction from the camera to the eyes. The matrix M (or E) rotates the coordinate system so that it aligns the axes as described.
Thank you for your answer.
@erkil1452 Hi, thanks for your great work, but when I try to use your dataset, I have some problems.
Thanks a lot!