RainerKuemmerle / g2o

g2o: A General Framework for Graph Optimization

Small FoV advice #353

Open steplee opened 5 years ago

steplee commented 5 years ago

Hi all,

I am trying to do visual odometry for an aircraft with a downward-looking camera with ~42° FOV. As a quick summary: I am optimizing an SE3 pose using the 'sba' vertices and edges. All 3D points are determined by projecting keypoints through the image plane onto a ground plane (not multi-view triangulation; this is a valid assumption given that the videos are taken at high altitude). A sketch of that initialisation follows below.
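Concretely, the initialisation is just a ray/ground-plane intersection. A minimal sketch of what I mean (assuming a pinhole model, a camera-to-world transform T_wc, and flat ground at z = 0 in the world frame; the helper name is mine, not g2o API):

```cpp
#include <Eigen/Core>
#include <Eigen/Geometry>

// Back-project a pixel onto a horizontal ground plane (illustrative helper).
// Assumes a pinhole camera (fx, fy, cx, cy), a world frame with the ground
// at z = 0, and T_wc mapping camera coordinates to world coordinates.
Eigen::Vector3d backprojectToGround(const Eigen::Vector2d& px, double fx,
                                    double fy, double cx, double cy,
                                    const Eigen::Isometry3d& T_wc) {
  // Viewing ray through the pixel, expressed in the camera frame.
  Eigen::Vector3d dir_c((px.x() - cx) / fx, (px.y() - cy) / fy, 1.0);
  // Rotate the ray into the world frame; its origin is the camera centre.
  Eigen::Vector3d dir_w = T_wc.linear() * dir_c;
  Eigen::Vector3d origin = T_wc.translation();
  // Intersect origin + t * dir_w with the plane z = 0 (only meaningful when
  // the ray actually points at the ground, i.e. dir_w.z() is nonzero).
  double t = -origin.z() / dir_w.z();
  return origin + t * dir_w;
}
```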

When I set the intrinsics correctly based on the camera parameters, my tracking shows very little lateral translation and instead rotates toward the direction of camera motion, eventually spinning a full 360°. I know this is partly due to the naive triangulation I am doing, which compounds errors, but when I deliberately set the intrinsics lower than their true values, this rotation behaviour disappears and I get the expected stable translation. Of course this comes at the expense of the true pose, because the depth is no longer correct.

My workaround is to put a prior on the SE3 pose with a non-zero information matrix on the rotational component, forcing it to stay downward-looking. This is not ideal because I need to balance the strength of the prior, and even then it is still unstable.
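For reference, the prior is a unary edge on the pose vertex. Stock g2o ships EdgeSE3Prior for VertexSE3, but for the sba VertexSE3Expmap I use a custom edge roughly like the following sketch (the class name and conventions are mine, and worth double-checking against your g2o version):

```cpp
#include <g2o/core/base_unary_edge.h>
#include <g2o/types/sba/types_six_dof_expmap.h>

// Illustrative unary pose prior on a VertexSE3Expmap; not a stock g2o type.
class EdgeSE3ExpmapPrior
    : public g2o::BaseUnaryEdge<6, g2o::SE3Quat, g2o::VertexSE3Expmap> {
 public:
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW

  void computeError() override {
    const auto* v = static_cast<const g2o::VertexSE3Expmap*>(_vertices[0]);
    // Minimal-coordinate difference between the prior (_measurement) and the
    // current estimate; SE3Quat::log() puts rotation in the first three
    // components and translation in the last three.
    _error = (_measurement.inverse() * v->estimate()).log();
  }

  bool read(std::istream&) override { return false; }
  bool write(std::ostream&) const override { return false; }
};
```

The "downward-looking" part then comes from the information matrix: the 3x3 rotational block (top-left, given the log() convention above) is set non-zero while the translational block stays zero, so only the orientation is constrained.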

I have also played with and modified SVO, and I see the same behaviour there: tracking with the true intrinsics diverges, but lowering the focal length at least recovers the expected translation.

Could it be that my camera has imperceptible distortion that gets amplified at the higher (true) intrinsics, or something like that? Is there some well-formulated way to balance the information components of rotation/translation based on focal length? Something else?

sjulier commented 5 years ago

What I've done is to add a penalty term which explicitly penalises objects that fall behind the image plane. Using the notation from OpenCV (https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html), the error vector is:

```
error[0] = m_u - u
error[1] = m_v - v
error[2] = exp(10 * (0.5 - z_pred))
```

where (m_u, m_v) are the measured pixel coordinates, (u, v) are the predicted pixel coordinates, and z_pred is the predicted depth of the feature in the camera-fixed coordinate frame. In my case, z_pred is expressed in metres.

The effect of error[2] is essentially zero until z_pred drops below 0.5; it then ramps up very sharply. I found this provided a continuous way to prevent the optimiser from turning the camera around (which flips all the points upside down but, in some cases, seems to produce a lower-error solution).
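In g2o terms this just means widening the reprojection edge's error from 2 to 3 dimensions. A sketch of what I mean, with the intrinsics stored directly on the edge for brevity (the stock EdgeProjectXYZ2UV reads them from a CameraParameters object instead; the class name here is illustrative):

```cpp
#include <cmath>
#include <g2o/core/base_binary_edge.h>
#include <g2o/types/sba/types_sba.h>
#include <g2o/types/sba/types_six_dof_expmap.h>

// Reprojection edge with an extra barrier term on the predicted depth.
// Illustrative, not a stock g2o type.
class EdgeProjectWithDepthPenalty
    : public g2o::BaseBinaryEdge<3, Eigen::Vector2d, g2o::VertexSBAPointXYZ,
                                 g2o::VertexSE3Expmap> {
 public:
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW

  double fx = 1.0, fy = 1.0, cx = 0.0, cy = 0.0;  // pinhole intrinsics

  void computeError() override {
    const auto* point =
        static_cast<const g2o::VertexSBAPointXYZ*>(_vertices[0]);
    const auto* pose = static_cast<const g2o::VertexSE3Expmap*>(_vertices[1]);
    // Feature position in the camera-fixed frame.
    Eigen::Vector3d p = pose->estimate().map(point->estimate());
    double u = fx * p.x() / p.z() + cx;
    double v = fy * p.y() / p.z() + cy;
    _error[0] = _measurement[0] - u;  // pixel residual, u
    _error[1] = _measurement[1] - v;  // pixel residual, v
    // Barrier term: essentially zero while p.z() > 0.5 m, then ramps up
    // sharply as the point approaches or crosses the image plane.
    _error[2] = std::exp(10.0 * (0.5 - p.z()));
  }

  bool read(std::istream&) override { return false; }
  bool write(std::ostream&) const override { return false; }
};
```

Note the information matrix then becomes 3x3, with the third diagonal entry weighting the barrier term against the pixel residuals.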

I have also spoken to a colleague of mine who does photogrammetry. He writes his own custom bundle adjustment solvers, and he suggested the work here:

https://pdfs.semanticscholar.org/b8eb/e0764c809b88dd5b87af7e31fbea9d601710.pdf

I have to admit I haven't tried it yet, and I suspect some of this may be redundant with the matrix decompositions already used inside g2o. However, I haven't done a thorough study of it.