ethz-asl / kalibr

The Kalibr visual-inertial calibration toolbox
Other
4.18k stars 1.37k forks source link

Extrinsic Matrices for 3+ cameras #660

Open Esilocke opened 6 months ago

Esilocke commented 6 months ago

Hi, I am trying to calibrate a set of 3 cameras using Kalibr but I don't understand how the t_cn_cnm1 works.

Here is my output from the yaml:

cam0:
  cam_overlaps: [2]
  camera_model: pinhole
  distortion_coeffs: [0.08281109680087015, -0.18398758284732086, 0.005051133958382585, 0.008640260762534649]
  distortion_model: radtan
  intrinsics: [600.9179442618838, 599.2578387754106, 341.60463277099274, 240.44821507600693]
  resolution: [640, 480]
  rostopic: /camera1/camera/color/image_raw
cam1:
  T_cn_cnm1:
  - [-0.7414133359661033, 0.05853003394345349, -0.6684912118930222, 0.8883474427844454]
  - [-0.2840167333857112, 0.8751932181958348, 0.3916265133784187, -0.8209689603305266]
  - [0.6079808881934803, 0.48021981003557634, -0.6322564144719063, 2.9847293349813726]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [2]
  camera_model: pinhole
  distortion_coeffs: [0.07602073028981758, -0.1603628223680752, 0.007191248340315871, -0.0014278084685302008]
  distortion_model: radtan
  intrinsics: [588.505837757169, 586.457269642278, 333.6385186873257, 247.9353654769998]
  resolution: [640, 480]
  rostopic: /camera2/camera/color/image_raw
cam2:
  T_cn_cnm1:
  - [-0.3274874647344716, -0.35079256500893524, 0.8773235074795588, -1.5046245875845434]
  - [0.3490605026933797, 0.8179228902762989, 0.45733894544583203, -0.7879559451298809]
  - [-0.8780140806964479, 0.45601175631392726, -0.14541166460175797, 2.1602821465549202]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [0, 1]
  camera_model: pinhole
  distortion_coeffs: [0.09318340757433892, -0.23029164273790026, 0.0001545848848756635, -0.0008750487004188804]
  distortion_model: radtan
  intrinsics: [607.4367039915398, 607.1537122370588, 318.5112711269077, 242.77602440822233]
  resolution: [640, 480]
  rostopic: /camera3/camera/color/image_raw

From the wiki, T_cn_cnm1 is supposed to represent the transform with respect to the last camera in the chain. For a 2-camera setup this is intuitive to understand but I am a little confused in this particular scenario.

Cam 1 does not overlap cam 0 thus what is this T_cn_cnm1 for cam 1? If it is to the last camera in the cam_overlaps, then for cam 2, the cam_overlaps is [0, 1], thus does the T_cn_cnm1 for cam 2 represent the transformation matrix of cam 2 to cam 1 coordinates? In this case what is the transform of my cam 0, since the cam 1 <-> cam 2 matrices are circular with each other.

goldbattle commented 6 months ago

I believe it doesn't relate to the overlaps, but instead is directly the cam0 -> cam1, and then cam1 -> cam2 sequential order. I believe the .txt file should allow you to check this. You can multiple them together to check they get to identity for the cam0.

Camchain example:

cam0:
  cam_overlaps: [1, 2]
  camera_model: pinhole
  distortion_coeffs: [-0.3044477343700008, 0.06831190064580549, 0.0016385158708163779,
    -0.0008662676729043089]
  distortion_model: radtan
  intrinsics: [542.4693569385074, 542.5943654005414, 524.0797458796575, 374.9032424633794]
  resolution: [1024, 768]
  rostopic: /bb2/left/image_mono
cam1:
  T_cn_cnm1:
  - [0.9996636322595407, -0.0023396338089671878, 0.025829217009016997, -0.11851975874888755]
  - [0.0023291690482534857, 0.9999971927692536, 0.0004352300012703644, -9.924984662223608e-08]
  - [-0.02583016277927054, -0.00037492299114023263, 0.9996662753757097, -0.0002488380896967549]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [0, 2]
  camera_model: pinhole
  distortion_coeffs: [-0.3172271683629877, 0.0796622417650886, 0.0016817111445300336,
    0.0003884697800337926]
  distortion_model: radtan
  intrinsics: [542.1195972116055, 541.3994254954196, 512.4177276233356, 373.65549791432295]
  resolution: [1024, 768]
  rostopic: /bb2/right/image_mono
cam2:
  T_cn_cnm1:
  - [0.9992669676356307, 0.020129473755802423, -0.0325627345072751, 0.01773871315621785]
  - [-0.01905646952339181, 0.9992758637360629, 0.032933252554748896, 0.05903083063685694]
  - [0.0332020836933597, -0.03228858065702636, 0.9989269588901679, -0.024070661488148848]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [0, 1]
  camera_model: pinhole
  distortion_coeffs: [0.033130172812662916, -0.3296327708784002, 1.6123381603976683,
    -2.924050525810986]
  distortion_model: equidistant
  intrinsics: [787.0594937497326, 785.6673160989883, 426.17865094797287, 351.9821746436108]
  resolution: [808, 608]
  rostopic: /bkfy/image_mono

And the result.txt

Camera-system parameters:
    cam0 (/bb2/left/image_mono):
     type: <class 'aslam_cv.libaslam_cv_python.DistortedPinholeCameraGeometry'>
     distortion: [-0.30444773  0.0683119   0.00163852 -0.00086627] +- [ 0.00034358  0.00023751  0.0000591   0.00005848]
     projection: [ 542.46935694  542.5943654   524.07974588  374.90324246] +- [ 0.14351632  0.13509232  0.26855242  0.26294191]
     reprojection error: [-0.000090, -0.000011] +- [2.218856, 1.355786]

    cam1 (/bb2/right/image_mono):
     type: <class 'aslam_cv.libaslam_cv_python.DistortedPinholeCameraGeometry'>
     distortion: [-0.31722717  0.07966224  0.00168171  0.00038847] +- [ 0.00038583  0.00029257  0.00005692  0.00005716]
     projection: [ 542.11959721  541.3994255   512.41772762  373.65549791] +- [ 0.1469293   0.14012416  0.26526817  0.26407153]
     reprojection error: [0.000088, -0.000024] +- [2.199116, 1.318062]

    cam2 (/bkfy/image_mono):
     type: <class 'aslam_cv.libaslam_cv_python.EquidistantDistortedPinholeCameraGeometry'>
     distortion: [ 0.03313017 -0.32963277  1.61233816 -2.92405053] +- [ 0.00476357  0.0295189   0.06283652  0.11189452]
     projection: [ 787.05949375  785.6673161   426.17865095  351.98217464] +- [ 0.20393965  0.20041723  0.59358168  0.56565581]
     reprojection error: [-0.000884, 0.000017] +- [3.896984, 2.296730]

    baseline T_1_0:
     q: [ 0.00020256 -0.01291593 -0.0011673   0.99991588] +- [ 0.00058683  0.00062264  0.00007151]
     t: [-0.11851976 -0.0000001  -0.00024884] +- [ 0.0000535   0.0000535   0.00016074]

    baseline T_2_1:
     q: [ 0.01631062  0.01644641  0.00979959  0.99968367] +- [ 0.00079731  0.00089289  0.00008208]
     t: [ 0.01773871  0.05903083 -0.02407066] +- [ 0.00006255  0.00005788  0.00019487]
Esilocke commented 6 months ago

Hi, thank you for the response. I checked the multiplication and found that it was not very satisfactory:

array([[ 0.85017879,  0.00311549, -0.52648486,  0.51364746],
       [ 0.05465371,  0.99405782,  0.09413832, -0.23722033],
       [ 0.52364968, -0.10880876,  0.8449566 ,  0.32570204],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Meanwhile yours gives a very good result:

array([[ 0.99983302,  0.01695077, -0.00682733, -0.10154685],
       [-0.0167145 ,  0.99930589,  0.03329208,  0.05906141],
       [ 0.00738692, -0.0331724 ,  0.99942235, -0.02479179],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

I wanted to combine the pointclouds together but the results seemed really off, hence I thought that the camchain matrices were being parsed incorrectly, however it seems that the result is still quite wrong, should I take this as a bad calibration (a bad convergence)?

image

The two pointcloud groups are supposed to be overlapping together but there is a blank void in the middle, and the multiplication matrix is quite far from the identity matrix.

goldbattle commented 6 months ago

Please pose your result txt and pdf report. How many connections do you have between each camera, and how do the reprojection errors look? How are you recovering this pointcloud? Stereo matching? Why would it be the middle here isn't clear to me.

Esilocke commented 6 months ago

Hi, please find the files attached. I do not have the txt file any more but the contents should be in the PDF's first page. 300923_morning-report-cam.pdf

This is a 3-camera Realsense d435 setup where the cameras are pointed at each other in a roughly 60-degree angle, hence the actual capture scene is quite small. Based on the reprojection errors of around +- 2px I had assumed that this was small enough hence the calibration was good.

I am projecting the pointcloud from the RGBD image using the intrinsic parameters from the yaml files, but using the pointcloud directly is also a viable option since we had saved the depth/registered_points directly. We just want to align these pointclouds so that they make sense.

goldbattle commented 5 months ago

It looks like the camera1 to camera2 link has ~4k observations, while the camera2 to camera0 has 41k. Maybe this could be a problem, I am not sure. The 3d picture doesn't make much sense to me as the z-axis of all sensors should be pointing in a common direction correct?

You can see that the cam0 has a weird spike with the majority of at very high azimuthal angle also, not sure what to think about this: image

Additionally, the reprojection errors don't look that gaussian so it might be worth trying to recollect and varying the board position in the image frame a bit more (use the -bag-freq to downsample your bag it looks like many observations are close to each other) image

Esilocke commented 5 months ago

Thanks for your suggestions! The observation differences is mainly because of the camera placement, because the placement itself (+ the resolution presumably) caused the algorithm to fail to detect the chessboard between two of the cameras (likely 1->2) unless I was standing in one very specific angle, or that the intrinsic step itself couldn't be completed with a lot of NaNs without sticking the board right in front of the camera.

The 3D system does look really weird now that you mention it, though I could possibly think it would be ok if the Y and Z axis were swapped on that outlier camera. Are these results because of the kalibr algorithm itself, because I am quite sure we were using the same camera coordinate systems throughout all 3 cameras.

daewoode commented 5 months ago

why is the transformation called T_cn_cnm1 when it's the transformation from cam0 to cam1? T_cn_cnm1 = Transformation_cam-n_to_cam-n-minus-1? so isn't it the transformation from cam1 to cam0 in the right handed coordinate system?

goldbattle commented 3 months ago

why is the transformation called T_cn_cnm1 when it's the transformation from cam0 to cam1? T_cn_cnm1 = Transformation_cam-n_to_cam-n-minus-1? so isn't it the transformation from cam1 to cam0 in the right handed coordinate system?

This is the naming convention used. T_1_0 is the T_0_to_1 rotation. https://github.com/ethz-asl/minkindr/wiki/Common-Transformation-Conventions