erkil1452 / gaze360

Code for the Gaze360: Physically Unconstrained Gaze Estimation in the Wild Dataset
http://gaze360.csail.mit.edu

some question about camera calibration #36

Closed jxncyym closed 2 years ago

jxncyym commented 2 years ago

@erkil1452 Hello, I'm sorry to trouble you again. In issue https://github.com/erkil1452/gaze360/issues/34 you told me: "For the ball I can imagine using multiple calibrated cameras (see multiview stereo camera calibration) and triangulating the ball position." I found a website, https://sites.google.com/site/prclibo/toolbox, which provides code for calibrating multiple cameras, and the images used for the calibration are also given on the site. I don't know whether that code can be used to calibrate the cameras as you described, or whether I need to change the images that are fed into it. My RGB cameras should be pinhole cameras, and in my scenario two cameras will be used: one placed in front of the driver's left side and the other in front of his right side. The pictures taken by the two cameras have some overlap, so do only two images need to be input to the code?

Another question: triangulation can be used to compute the 3D coordinate of the ball, and I think it can also be used to compute the 3D coordinate of the eye, so I wouldn't need the method used in MPIIGaze to get the eye's 3D coordinate ("For getting 3D position of the person you can either rely on face scale as a cue (MPIIGaze uses that I believe) or you can use additional depth camera (eg Kinect Azure)."). I don't know whether I'm right about that. OpenCV has an implementation of triangulation, cv::triangulatePoints, but that function can only compute the 3D coordinate of a point that is matched in both images. I can't guarantee that the centers of the ball or the centers of the eyes will be matched between the two images, so I also can't guarantee that I can compute the 3D coordinates of the ball and the eyes. Is that right?

erkil1452 commented 2 years ago

"I can't promise the centers of the ball or centers the eye will be matched in two images"

Yes, this is a problem. You need at least two views to determine a 3D location, so you need to set up your cameras accordingly. Alternatively, you can use an RGBD camera (e.g. Kinect Fusion) and avoid most of these issues (but then you deal with its own quirks, such as light sensitivity, small FOV, limited range, ...).
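A minimal sketch of the triangulation step with OpenCV's cv2.triangulatePoints, assuming both cameras are already calibrated and the same point (e.g. the ball center) has been located in both views. The intrinsics, relative pose, and pixel coordinates below are placeholder values for illustration only, not numbers from the dataset:

```python
import cv2
import numpy as np

# Placeholder calibration results for the two cameras: intrinsics K1, K2 and
# the pose [R|t] of camera 2 relative to camera 1 (assumed values).
K1 = np.array([[800.0, 0.0, 320.0],
               [0.0, 800.0, 240.0],
               [0.0,   0.0,   1.0]])
K2 = K1.copy()
R = np.eye(3)                        # rotation of camera 2 w.r.t. camera 1
t = np.array([[0.3], [0.0], [0.0]])  # translation, e.g. a 30 cm baseline

# Projection matrices: camera 1 is the world origin, camera 2 is offset by [R|t].
P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K2 @ np.hstack([R, t])

# Matched 2D observations of the same point in both views, shape (2, N)
# as expected by cv2.triangulatePoints.
pts1 = np.array([[350.0], [260.0]])
pts2 = np.array([[300.0], [255.0]])

# Triangulate: returns homogeneous 4xN coordinates in the camera-1 frame.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).ravel()
print("3D point (camera-1 coordinates):", X)
```

The key practical requirement is exactly the one raised above: the 2D points passed in must refer to the same physical point in both views, so the ball/eye center has to be detected (or marked) consistently in each camera before this step.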

For the calibration, I am not an expert in this. I usually use OpenCV to calibrate cameras with a checkerboard pattern. This has two phases. First, you calibrate the intrinsics (focal length, distortion coefficients, etc.). You do this for each camera separately, and you need many images of the same checkerboard in different poses. Then you calibrate the poses of the cameras. For that you can use just a few images, and you can manually mark corresponding points in both views, so it does not have to be a checkerboard. The more varied the points, the better, hence a bigger viewport overlap is more robust.

Alternatively, you can do everything at once. I know that COLMAP does that, but it usually assumes you have many cameras; I am not sure you can do much with only two. So your best chance is to do it step by step and follow a tutorial online. There are many for OpenCV.
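A rough OpenCV (Python) sketch of those two phases, intrinsics from checkerboard images and then the relative pose from manually marked correspondences. The folder name, board size, and point files are hypothetical, and it assumes both cameras share the same intrinsics when estimating the essential matrix; treat it as a starting point, not a complete pipeline:

```python
import glob
import cv2
import numpy as np

# --- Phase 1: intrinsics per camera, from many checkerboard images ---
# Assumed board: 9x6 inner corners, 25 mm squares; adjust to your pattern.
pattern = (9, 6)
square = 0.025
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts = [], []
for path in glob.glob("cam_left/*.png"):   # hypothetical folder of calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# Repeat this for the second camera with its own image set.
ret, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)

# --- Phase 2: relative pose between the two cameras from matched points ---
# pts_left / pts_right: manually marked corresponding points (Nx2 float arrays),
# stored in placeholder files here.
pts_left = np.load("pts_left.npy")
pts_right = np.load("pts_right.npy")
E, _ = cv2.findEssentialMat(pts_left, pts_right, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts_left, pts_right, K)
# R, t give camera 2's pose relative to camera 1. Note that t is only known up
# to scale; fix the scale with a known distance, e.g. the measured baseline.
```

With K (and distortion) for each camera plus R and t between them, you can build the projection matrices used in the triangulation sketch above.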