amy-tabb / calico

code for: Calibration of Asynchronous Camera Networks: CALICO
MIT License

Calico gives large reprojection error for my 'Camera Network' dataset #6

Closed TabassumNova closed 9 months ago

TabassumNova commented 1 year ago

Hi, I am using the Docker image for calibration. We have six 20 MP cameras arranged on two sides of a metal rectangular prism (a similar setting to the Net-1/Net-2 datasets). For calibration, I am using a cube that has different ChArUco patterns on its 5 sides. Attached is my dataset along with the output I got from CALICO (https://1drv.ms/f/s!At7UW7zuEoCCiqJPH_tKyUTKYJvF2w?e=rWStOt). I am getting a very large reprojection error and I am not sure why; the image quality looks good to me. I am quite new to computer vision. Please help!

amy-tabb commented 1 year ago

Hello,

Thanks for all of the files! Would you be able to upload the output image files as well? If not, check that the patterns are identified correctly in the output. There was a bug in OpenCV that has since been corrected, but now that I think about it, I should check that the Docker image is up to date.

Thanks for writing! This kind of information about failures is really useful to me.

A

TabassumNova commented 1 year ago

Hi, thank you so much for your quick reply. I have uploaded the output data folder at the same link (https://1drv.ms/f/s!At7UW7zuEoCCiqJPH_tKyUTKYJvF2w?e=lgAK5j). I think the detection is working fine. FYI, I have tried the Docker image to calibrate the published "Net-2-base" dataset. It works fine in that case and gives a very small reprojection error.

Oguked commented 1 year ago

I have the same issue. Feature detection works fine, but the reprojection error is huge and the extrinsic parameters are way off.

amy-tabb commented 1 year ago

Hello @TabassumNova ,

I played around with this dataset; from my view, the problem is that you do not have views of the calibration object with different orientations. The only change is in translation. This results in poor initial poses, which are then refined. In particular, pattern 3 has this problem in the current dataset.

Example from cam 1 from your dataset:

Screenshot from 2023-04-21 13-29-35

Example from the first camera from one of my datasets (Net-2):

Screenshot from 2023-04-21 13-33-52

And the images for internal calibration, same camera:

Screenshot from 2023-04-21 13-34-33
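If you want to check this numerically before running the full pipeline, here is a minimal sketch of my own (not part of CALICO) using the classic `cv2.aruco` API (OpenCV <= 4.6); the board parameters are placeholders, and it assumes you already have rough intrinsics `K` and `dist` for the camera. If the pairwise angles it reports are all near zero, the poses differ only by translation.

```python
import cv2
import numpy as np

# Placeholder board definition -- match your own dictionary and geometry.
dictionary = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)
board = cv2.aruco.CharucoBoard_create(5, 5, 0.04, 0.03, dictionary)

def board_rvec(image_path, K, dist):
    """Estimate the ChArUco board pose in one image; return its rotation vector."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None or len(ids) == 0:
        return None
    n, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, gray, board)
    if not n or n < 6:
        return None
    ok, rvec, tvec = cv2.aruco.estimatePoseCharucoBoard(
        ch_corners, ch_ids, board, K, dist, None, None)
    return rvec if ok else None

def pairwise_rotation_angles(rvecs):
    """Relative rotation angles (degrees) between all board poses.
    Angles near zero everywhere mean the poses differ only by translation."""
    angles = []
    for i in range(len(rvecs)):
        for j in range(i + 1, len(rvecs)):
            Ri, _ = cv2.Rodrigues(rvecs[i])
            Rj, _ = cv2.Rodrigues(rvecs[j])
            r_rel, _ = cv2.Rodrigues(Ri.T @ Rj)  # axis-angle of the relative rotation
            angles.append(float(np.degrees(np.linalg.norm(r_rel))))
    return angles
```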

If you're able to collect a new dataset and try it out, keep me posted -- I'm curious to see if doing so will resolve your problem.

Best A

amy-tabb commented 1 year ago

Hello @Oguked ,

I've written a response concerning the OP's dataset (above). Are you able to share a failing dataset? Do you happen to have the same problem as above, that the image poses are not varied enough?

Best A

Oguked commented 1 year ago

Hi @amy-tabb,

thank you for your reply. Yeah, my dataset also doesn't have much variety. I will try again with more varied poses.

I have another suspicion/question: do all sides of the calibration object (e.g. the 5 sides of Nova's cube), which have been described in the configuration files, need to be present in the image data?

BR Oguz

amy-tabb commented 1 year ago

Hi @Oguked

> I have another suspicion/question: do all sides of the calibration object (e.g. the 5 sides of Nova's cube), which have been described in the configuration files, need to be present in the image data?

No, not all patterns in the configuration files have to be present in the image data. Whichever patterns are visible will be used for the calibration. I think some of my datasets have 4 patterns in the configuration file, and only two appear in the images.

Best A

TabassumNova commented 1 year ago

Hi @amy-tabb ,

Thank you so much for your reply. I am working on getting good poses. I will keep you posted.

TabassumNova commented 1 year ago

Hi @amy-tabb ,

I am facing another problem here. I cannot load more than 12 images for each camera, which is why I cannot get much variety in poses. I have six 20 MP cameras; the image size is 5472x3648 (8.7 MB). When I try to run the program with more than 12 images, it stops partway through. I am working on a 16-core 3.50 GHz processor. Can you please suggest what I should do here?

amy-tabb commented 1 year ago

Hello @TabassumNova ,

Hmmm, how much RAM do you have?

Then, I personally would resave the images, and/or save them as 1-channel rather than 3-channel images, since they are effectively greyscale. When I worked with your previous dataset, I resaved them using KolourPaint -- leaving the size and format alone -- and the file size dropped to about 1 MB per image.

That being said, I've not previously had problems with dataset size and running the code, but I also usually run the code directly, not in the Docker container. There may be a setting for how much memory Docker is allowed to use.
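(A general Docker note, not specific to this image: on Linux the per-container memory cap can be raised at run time, e.g. `docker run --memory=16g --memory-swap=16g ...`, and Docker Desktop has a global memory limit under Settings -> Resources.)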

Let me know how it goes! A

TabassumNova commented 1 year ago

Thank you so much @amy-tabb. Changing the Docker setting helps. Now I can load 30 images per camera.

TabassumNova commented 1 year ago

Hi @amy-tabb,

Attached is my new dataset with output (https://1drv.ms/f/s!At7UW7zuEoCCisBBuX3PH_KdFVGjqA?e=A8Lzts). I am still getting a large error. Can you please check this? I am also sharing the full dataset; the previous one is a subset of it (https://1drv.ms/f/s!At7UW7zuEoCCirV1tT8LquZOtoaWeA?e=0ghYw2).

amy-tabb commented 1 year ago

Hello @TabassumNova --

Apologies for the delay. I ran your dataset here and it seems to be working pretty well (with a caveat). I increased k to 30 in the command-line arguments.

Here's the reconstruction accuracy error: SQRT -- RAE w/ BA, average, stddev, median : 0.167753, 0.220555, 0.111621

0.17 looks good to me; your application may require something else.

Here's the reprojection error:

minimization1: Reprojection error, rrmse: 10.4355

Then, in total_results.txt I also took a look at this information -- the error per equation. One camera and one pattern combine to create a foundational relationship. Scanning, the largest errors after minimization are from the last camera:

5749.12 5451.34 81.7537 418.168 1493.52 6053.3 509.756 3671.18 336.758 6249.36 

(full section)

Algebraic error, per foundational relationship. 
initial: 3.83386 43.6885 68.6827 2.10658 8.98767 37.6939 11.1296 43.5604 86.6963 8.7933 8.54431 84.0225 1.54147 13.9715 31.7345 1.48674 0.395838 30.3513 3.74598 33.5361 7.21277 65.3313 0 35.2774 6.15023 63.9085 55.0408 24.1811 89.4644 4.48248 41.6506 3.73695 24.0905 68.8155 0.901738 19.9123 72.3204 2.50817 23.7353 7.0044 64.0096 58.4244 19.6793 109.083 6.07125 12.1859 24.1583 6.88061 41.9817 2.8167 68.8067 32.9217 103.535 2.44511 38.3312 9.23032 23.9384 8.36506 59.8081 2.22641 44.7144 5.35396 3.18726 66.7036 2.17124 2.52153 0 72.7499 85.1613 100.841 139.743 3.87516 4.68825 6.23241 2.20887 6.06345 4.34235 76.5181 88.3004 121.783 149.714 27.5631 47.7 5.71034 0.912045 123.884 163.654 11.6998 7.89533 44.4615 72.2925 12.1239 13.8786 0.238651 0.515494 0 82.8049 140.986 5.63281 2.83256 10.2216 103.818 201.288 34.7378 25.3192 202.232 7.72578 77.4239 23.2875 4.65115 10.4445 20.9887 20.4581 16.3762 12.5384 12.5893 19.1537 44.0135 0.324022 12.6622 27.3906 2.95818 34.0757 0.539272 9.54507 0 691.259 2134.1 61.6814 131.652 135.856 2639.42 5420.51 704.551 161.472 5527.12 3.82267 1707.92 210.102 
minimization1: 1.04325 0.710406 1.10377 0.826573 2.32451 1.81123 4.08413 0.610493 1.0074 0.237525 1.77378 1.58396 0.611579 0.563375 0.396929 0.0887843 0.475474 0.0281935 0.426127 0.196222 0.883137 4.19683 1.36276 0.151427 2.24319 0.588134 0.202881 0.109591 1.60117 0.653488 0.763667 3.52582 8.75873 5.74943 1.61482 5.49829 9.19553 1.32852 5.58753 1.34003 1.17539 2.81265 0.491766 0.668295 0.186335 0.058392 2.66312 1.29121 0.701742 4.72367 0.411323 0.456134 1.28533 6.73382 6.9719 1.81766 1.69801 3.96476 2.36304 2.6917 2.72421 2.14731 0.276986 6.13471 4.24255 22.4517 1.99223 40.7485 0.0330901 45.7618 4.42871 20.7696 0.101226 7.19941 6.65715 6.36049 9.10833 50.4641 0.166905 55.2148 0.891676 36.8806 2.72265 9.22918 0.972323 58.6043 6.93741 34.8135 2.76476 30.1264 0.074315 45.8553 0.373291 19.8339 6.62463 2.48558 0.6072 1.11807 0.0572173 0.2771 3.42105 0.402469 4.98583 0.183533 10.4968 7.64147 3.49906 0.227652 1.52473 0.0920907 2.46338 2.48763 2.69136 1.92426 2.78619 2.88325 2.66203 2.1822 1.76189 0.320147 3.32812 5.9932 4.75205 3.55109 3.97854 4269.95 988.455 295.708 4613.9 5749.12 5451.34 81.7537 418.168 1493.52 6053.3 509.756 3671.18 336.758 6249.36 

From there, I deleted the last camera's directory from input -- 08320222. And the reconstruction accuracy error is smaller:

SQRT -- RAE w/ BA, average, stddev, median : 0.109011, 0.111192, 0.0900913

and the reprojection error looks right:

minimization1: Reprojection error, rrmse: 2.03146

And the reconstruction of patterns and cameras sort of matches what I would expect from the images:

Screenshot from 2023-05-10 09-03-43

Ok, so it seems that your last camera only gets views of one pattern, and at that, the views 1) show only a small part of the pattern and 2) the pattern's pose changes only by translation, not rotation. So just like before with your smaller dataset, all of the cameras need varied views to perform the estimation.

I hope that helps! Best A

TabassumNova commented 1 year ago

Thank you so much @amy-tabb for your reply. I will look into it and keep you updated.

TabassumNova commented 1 year ago

Hi Amy,

I have tried as you suggested, and I am also getting "Reprojection error, rrmse: 2.03133" after minimization. But the translation vectors that I got from "camera_cali_minimization1.txt" seem confusing: the orientations and distances between cameras do not match our physical setup. Here is an image of our setup.

This is my understanding: the extrinsic parameters of the cameras are calculated with respect to a world coordinate system, which is defined by the coordinate system of a reference pattern p observed at a reference time t. So the world coordinate system was on my calibration object at time t. Please correct me if I am wrong.

IMG_8673

amy-tabb commented 1 year ago

Hello,

Here's a link to the output from when I ran the dataset:

https://ars-usda.box.com/s/rwpjqchndat59ekum5kssmy7g0rddc76

Ok, first things first -- the camera positions do not exactly match the physical setup. Using the dataset above, I computed the camera centers for cameras 2, 3, and 5 (camera 2: 08320220, camera 3: 08320221, camera 5: 36220113).

The distance from camera 3 to camera 2 is 90 cm in the above figure, and between cameras 3 and 5 it is 100 cm.

Recall the camera center C = -R.transpose()*t. So C2 = (-189.772, 944.483, -69.393), C3 = (-664.74, 757.84, 772.07), C5 = (72.543, 248.681, 891.033).

The estimated distance between cameras 2 and 3 is 984.12 mm, i.e. +84 mm off from your measurement.

The estimated distance between cameras 3 and 5 is 903.88 mm, i.e. about -96 mm off from your measurement.
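For reference, the same arithmetic as a minimal NumPy sketch (the centers are the values above; `camera_center` just illustrates the formula):

```python
import numpy as np

# Camera centers (mm) from the calibration output above.
C2 = np.array([-189.772, 944.483, -69.393])
C3 = np.array([-664.74, 757.84, 772.07])
C5 = np.array([72.543, 248.681, 891.033])

print(np.linalg.norm(C2 - C3))  # ~984.12 mm (tape measure: 900 mm)
print(np.linalg.norm(C3 - C5))  # ~903.88 mm (tape measure: 1000 mm)

# Given R (rotation) and t (translation) for a camera, the center is C = -R.T @ t.
def camera_center(R, t):
    return -R.T @ t
```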

Ok, this is not too bad given the images.

Why? In camera calibration you're trying to minimize a nonlinear cost function, and with the types of poses you have, several different sets of parameters will return similar error. For instance, if you only have fronto-parallel views, one could adjust the focal length (for all poses) and the translation vectors (one per object pose in this case) and get similar-valued error.
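To make that concrete, here is a small sketch of my own (not CALICO code): for a fronto-parallel plane, doubling both the focal length and the plane's depth leaves the projections pixel-identical, so the data cannot separate the two. The principal point below is simply the center of a 5472x3648 image.

```python
import numpy as np

# Four points on a fronto-parallel plane (camera coordinates, mm).
X = np.array([[0.0, 0.0], [40.0, 0.0], [0.0, 40.0], [40.0, 40.0]])

def project(points_xy, f, Z, c=(2736.0, 1824.0)):
    """Pinhole projection of points on the plane z = Z: u = f*x/Z + cx."""
    return f * points_xy / Z + np.array(c)

p1 = project(X, f=3000.0, Z=500.0)    # one (focal length, depth) pair
p2 = project(X, f=6000.0, Z=1000.0)   # both doubled
print(np.allclose(p1, p2))            # True: (f, Z) and (2f, 2Z) look identical
```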

How to fix this? Get even more poses, with the calibration object at different depths from the camera, and with the calibration object showing up in more regions of every camera's image -- center and corners. Also, tilt the calibration object so that the object's plane(s) are not fronto-parallel to the image plane, but instead at a 45-degree angle to it.

This link talks about how to get a good dataset, in the section headed 'Data Collection': https://www.skiprobotics.com/articles/2021-08-13-camera-calibration/

The relevant paragraphs are here:

With camera calibration there is ambiguity between the calibration target's distance from the camera and the focal length which cannot be resolved by frontoparallel images of the target. We need images in the dataset which capture control points with a wide range of camera frame z-values in order to observe the focal length. A good way to get this type of data is by holding the target at a 45 degrees tilt from the camera's optical axis. Higher angles tend to reduce accuracy because the foreshortening impacts our ability to precisely locate control point corners in the image.

The distortion function introduces two additional observability considerations. First, we need each target observation to span a large enough area to make the lens' distorting effects visible. We recommend having the calibration pattern span 25% to 50% of each image. Second, we need observation of the calibration pattern around the edges of the image. Lens distortion is typically minimal at the image center -- this is where most lenses behave like a pinhole camera.

Finally, you have greyscale images that are saved as color images (3 channels) and are consequently huge. I highly recommend writing a script to convert these images to greyscale; it may save you a lot of headaches because you can run camera calibration with more images without overloading your RAM. You will likely need a lot of images, because you need a variety of views in each camera.
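For example, a minimal conversion sketch in Python with OpenCV (the folder names and the .png extension are placeholders for your own layout):

```python
import cv2
from pathlib import Path

src = Path("dataset")        # original 3-channel images, one folder per camera
dst = Path("dataset_gray")   # greyscale copies, same directory layout

for img_path in src.rglob("*.png"):
    gray = cv2.imread(str(img_path), cv2.IMREAD_GRAYSCALE)  # load as 1 channel
    out = dst / img_path.relative_to(src)
    out.parent.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(out), gray)  # write a 1-channel file, roughly a third the size
```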

That's a lot! Let me know if you have questions about the above. Best A

TabassumNova commented 1 year ago

Hi Amy,

Thank you so much for your detailed explanation. I realize I need to work more on getting good poses, and your insights will be valuable in doing so. I don't see any output file at the above link, though. Could you please check again?

amy-tabb commented 1 year ago

Apologies. Here is the link; it is also fixed in the reply above.

https://ars-usda.box.com/s/rwpjqchndat59ekum5kssmy7g0rddc76