rogeriochaves opened 3 years ago
Did you calibrate your camera to obtain its intrinsic parameters and more importantly, the extrinsic parameters (rotation, translation) between your camera and your monitor as described https://github.com/NVlabs/few_shot_gaze/blob/master/demo/README.md, step 2b?
Thanks for replying! No, to be honest I skipped this step at first, but I have now tried to follow it through again. I couldn't get the newer Ver. 2 to work, neither the Matlab version (I'm using Octave) nor the C version. I could get the Ver. 1 Matlab version to work (I think?), but I don't know what to do with the outputs:
```
Average reprojection error by TNM : 0.169550 pixel.

==== Parameters by TNM ====
R =
  -0.914077  -0.030128   0.404420
  -0.020261   0.999384   0.028656
  -0.405034   0.018000  -0.914124
T =
  285.23
  100.70
  159.83
n1 =
   0.333519
   0.048892
  -0.941475
n2 =
   0.484883
   0.098207
  -0.869048
n3 =
   0.047996
   0.011300
  -0.998784
d1 = 351.78
d2 = 318.01
d3 = 377.73
points =
  285.234  100.701  159.834
  125.271   97.155   88.953
  282.221  200.639  161.634
  285.234  100.701  159.834
points =
   84.2177   71.2328  727.2730    1.0000
  -84.5563   66.3955  681.2629    1.0000
   79.7462  170.9574  733.1905    1.0000
   84.2177   71.2328  727.2730    1.0000
points =
  -32.1676   36.4152  728.7079    1.0000
 -176.3115   36.0734  629.4738    1.0000
  -41.7646  135.0200  742.3086    1.0000
  -32.1676   36.4152  728.7079    1.0000
points =
  262.8758   95.4368  625.1049    1.0000
   96.8575   90.4655  680.2247    1.0000
  259.9411  195.3936  625.2806    1.0000
  262.8758   95.4368  625.1049    1.0000
points =
  184.726   85.967  443.553
   20.357   81.775  385.108
   16.615  181.607  388.967
  180.984  185.798  447.412
  184.726   85.967  443.553
points =
  126.533   68.558  444.271
  -25.520   66.614  359.213
  -31.825  165.886  366.914
  120.228  167.830  451.971
  126.533   68.558  444.271
points =
  274.055   98.069  392.469
  111.064   93.810  384.589
  108.090  193.758  385.577
  271.081  198.016  393.457
  274.055   98.069  392.469
```
How should I tweak monitor.py based on this? Can you give me an example? It's a lot of numbers and the documentation is not clear.
FYI, I'm using a MacBook Pro webcam, if that makes things simpler.
I am not 100% sure, but I imagine that you need to update https://github.com/NVlabs/few_shot_gaze/blob/master/demo/monitor.py#L28 (and its inverse) based on these values that you determined:
```
T =
  285.23
  100.70
  159.83
```
I imagine that this is the translation from the screen to camera coordinate system in millimeters.
So for example, you could probably define (yet again, I'm not 100% sure):
```python
def monitor_to_camera(self, x_pixel, y_pixel):
    x_cam_mm = 285.23 + ((int(self.w_pixels / 2) - x_pixel) / self.w_pixels) * self.w_mm
    y_cam_mm = 100.7 + (y_pixel / self.h_pixels) * self.h_mm
    z_cam_mm = 159.83
    return x_cam_mm, y_cam_mm, z_cam_mm
```
and a corresponding camera_to_monitor
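To make that concrete, here is a self-contained sketch of both directions (again, not verified; the screen dimensions in `__init__` are made-up placeholders, and `camera_to_monitor` simply inverts the offsets used above):

```python
class Monitor:
    # Hypothetical stand-in for the demo's monitor.py; the pixel and
    # millimeter dimensions below are placeholders for illustration only.
    def __init__(self):
        self.w_pixels, self.h_pixels = 1680, 1050
        self.w_mm, self.h_mm = 330.0, 206.0

    def monitor_to_camera(self, x_pixel, y_pixel):
        # Offsets 285.23 / 100.7 / 159.83 come from the T vector above.
        x_cam_mm = 285.23 + ((int(self.w_pixels / 2) - x_pixel) / self.w_pixels) * self.w_mm
        y_cam_mm = 100.7 + (y_pixel / self.h_pixels) * self.h_mm
        z_cam_mm = 159.83
        return x_cam_mm, y_cam_mm, z_cam_mm

    def camera_to_monitor(self, x_cam_mm, y_cam_mm):
        # Undo the millimeter offsets and map back to pixel coordinates.
        x_pixel = int(self.w_pixels / 2) - ((x_cam_mm - 285.23) / self.w_mm) * self.w_pixels
        y_pixel = ((y_cam_mm - 100.7) / self.h_mm) * self.h_pixels
        return x_pixel, y_pixel
```

A quick sanity check is that the two functions should round-trip: converting a pixel to millimeters and back should return the original pixel.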
Not exactly. The code in the following places:
https://github.com/NVlabs/few_shot_gaze/blob/2b0ea42ecba456ede03a60c11a94dd62a45dc287/demo/frame_processor.py#L178
https://github.com/NVlabs/few_shot_gaze/blob/2b0ea42ecba456ede03a60c11a94dd62a45dc287/demo/frame_processor.py#L218
https://github.com/NVlabs/few_shot_gaze/blob/2b0ea42ecba456ede03a60c11a94dd62a45dc287/demo/monitor.py#L28
https://github.com/NVlabs/few_shot_gaze/blob/2b0ea42ecba456ede03a60c11a94dd62a45dc287/demo/monitor.py#L38
assumes that the z axis of the camera and the z axis of the monitor are parallel and that there is no translation in the z direction, i.e. z=0. However, from the R and T given by @rogeriochaves, it can be seen that neither assumption holds. In order to correctly apply the calibration results, you need to:
- apply a full coordinate transformation in monitor.py, using not only the translation vector T but also the rotation matrix R;
- change the way the PoR is calculated: instead of assuming z=0, find the intersection between the gaze vector and the monitor plane (usually the xy plane of the monitor frame).
BTW, the R and T given by the calibration process actually describe the relationship between the chessboard pattern displayed on the monitor and the camera. This may not equal the relationship between the monitor and the camera, so you need to find the relationship between the chessboard pattern and the monitor as well.
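The two steps described in that reply could be sketched roughly as follows (a sketch only: the function name is made up, and it assumes R and T map monitor coordinates into camera coordinates as `p_cam = R @ p_mon + T`; the chessboard-to-monitor offset still has to be handled separately):

```python
import numpy as np

def intersect_gaze_with_monitor(origin_cam, direction_cam, R, T):
    """Find where a gaze ray (given in camera coordinates) hits the
    monitor plane, taken to be the xy-plane of the monitor frame (z_mon = 0)."""
    R = np.asarray(R, dtype=float)
    T = np.asarray(T, dtype=float).reshape(3)
    # Full transformation into the monitor frame: p_mon = R.T @ (p_cam - T).
    o_mon = R.T @ (np.asarray(origin_cam, dtype=float) - T)
    d_mon = R.T @ np.asarray(direction_cam, dtype=float)
    # Solve o_mon.z + t * d_mon.z = 0 for t, then evaluate the ray there.
    t = -o_mon[2] / d_mon[2]
    return o_mon + t * d_mon  # z component is 0 by construction
```

With R = identity and T = 0 this reduces to the current z=0 shortcut, which is one way to see what the shortcut is assuming.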
So is the TNM monitor calibration needed for a default laptop webcam (where the assumptions of z=0 and Δy = 10 mm fit)? I've got the model to run, but I'm wondering if there's some way to improve accuracy further through calibration.
Every laptop hardware configuration is different, but the assumption of z=0 should be OK to use. However, you need to at least measure Δy and Δx with a ruler if you really don't want to do the calibration. (Δy is the distance between the camera and the upper edge of the monitor; Δx is the distance between the camera and the left edge of the monitor, usually equal to monitor width / 2.)
That said, a good calibration won't help improve accuracy in this case. I believe the accuracy is limited by the image resolution. I did an experiment, and it turned out that you almost cannot recognize the eye movement in images taken for two target points that are less than 2 cm apart on the screen. Increasing the image resolution might be a solution, but this would also increase the complexity of the neural network, and you would need to build a high-resolution training dataset as well. So I think this remains an open problem.
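For the default-laptop case, the z=0 setup reduces to plugging the two ruler measurements into a monitor.py-style conversion. A sketch under those assumptions (the function signature and sign conventions are illustrative, not the demo's actual code; dx_mm and dy_mm stand for your measured Δx and Δy):

```python
def monitor_to_camera(x_pixel, y_pixel, w_pixels, h_pixels,
                      w_mm, h_mm, dx_mm, dy_mm):
    # Hypothetical z=0 conversion. The camera is the origin of the
    # coordinate system; dx_mm is the camera-to-left-edge distance
    # (often monitor width / 2) and dy_mm the camera-to-upper-edge distance.
    x_cam_mm = (x_pixel / w_pixels) * w_mm - dx_mm
    y_cam_mm = (y_pixel / h_pixels) * h_mm + dy_mm
    return x_cam_mm, y_cam_mm, 0.0
```

With dx_mm = w_mm / 2, a pixel at the horizontal center of the screen maps to x = 0 in camera coordinates, which matches the usual camera-above-screen-center layout.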
Hello there!
For some reason the predicted PoR is way off screen. To try to debug it, I ran the person calibration again on an already trained network, then saved the `gaze_n_vector` variable used during training and the `g_cnn` variable used during prediction in `frame_processor.py`. Plotting them separately shows a clear error; plotting them together shows they differ by a constant factor (plots omitted).
If I fit a linear regression I get a coefficient of almost exactly 0.1 for both, and applying that correction gives a prediction that makes much more sense.
Why is that? Is some part of the calculation missing during prediction in `frame_processor.py`? Why is the PoR always 10x bigger?
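For anyone wanting to reproduce the workaround described above: the ~0.1 coefficient can be recovered with a one-parameter least-squares fit between the saved training-time and prediction-time values (a sketch; `fit_scale` is a made-up helper, and `reference` / `predicted` stand for the flattened `gaze_n_vector` and `g_cnn` arrays):

```python
import numpy as np

def fit_scale(reference, predicted):
    """Least-squares scale s minimizing ||predicted - s * reference||."""
    reference = np.asarray(reference, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    # Closed-form solution of the 1-D least-squares problem.
    return float(predicted @ reference / (reference @ reference))
```

This only recovers a constant correction factor; it doesn't explain where the factor comes from in the pipeline.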