decadenza / DirectStereoRectification

"Rectifying Homographies for Stereo Vision: Analytical Solution for Minimal Distortion": algorithm to compute the optimal rectifying homographies that minimise perspective distortion.
GNU General Public License v3.0

Rectification with comparatively low overlap between views #9

Closed: KevinCain closed this issue 11 months ago

KevinCain commented 1 year ago
  1. I have both an X and a Y offset between cameras on a fixed rig, where one camera is fronto-parallel and the other is at a ~20° angle relative to the first. Can you confirm this setup is suitable for DSR?

I see that, unlike OpenCV methods, DSR supports arbitrary camera translations, as noted in https://github.com/decadenza/DirectStereoRectification/issues/5, and it seems the camera normals can differ as well, as noted in https://github.com/decadenza/DirectStereoRectification/issues/6.

  2. Given the camera rotation above, we have less overlap than we'd like between views. This is true for chessboard calibration but also for capture, since I've found that rectification towards the image edges using SimpleStereo+DSR introduces unavoidable distortion, as noted in https://github.com/decadenza/DirectStereoRectification/issues/2.

Given this tricky rig, it would be helpful to be able to use some form of self-calibration as an input to DSR. Is there anything recent you can recommend, or a lead to follow?

decadenza commented 1 year ago

Hello and sorry for the late reply.

  1. I can confirm that any configuration is suitable for DSR. Since it relies on pure analytical results rather than mathematical optimisation, it always provides a solution, even for extremely skewed configurations like the example provided in the library (https://github.com/decadenza/DirectStereoRectification/tree/master/img).

  2. The practical issue in extremely skewed configurations is indeed obtaining the calibration parameters, as you commonly need to capture chessboard images in which the chessboard is fully visible from both cameras at the same time.

I set up a simulation with two cameras shifted in X and Y and with a relative angle of 20 degrees. You did not provide the XY shift, so I guessed it.

[image: Screenshot_2023-07-02_11-00-57] https://user-images.githubusercontent.com/30215028/250348987-e4c12a3a-1913-44d6-83b6-5ea7b2504cc2.png

I put two cubes as generic objects. In the example, the left and right images are:

https://user-images.githubusercontent.com/30215028/250349155-58413099-2921-42de-8d7f-6f423b4f52a6.png

https://user-images.githubusercontent.com/30215028/250349161-d72826bb-2e96-47e6-971b-d47e4a98e3ab.png

As you can see, there is still quite a good area of overlap, so you can calibrate with the chessboard placed in that area. Maybe try a smaller chessboard? That depends on your XY shift.

If, as you say, you don't have the necessary overlap, then you can:

  1. Extract only the intrinsic camera parameters from each camera. It is better to obtain these in the classic way, so as to remove some degrees of freedom from the problem.
  2. Try to apply one of the methods for self-calibration restricted to the extrinsic parameters (see https://scholar.google.co.uk/scholar?hl=en&as_sdt=0,5&q=extrinsic+parameters+self+calibration); a sketch of steps 1 and 2 follows this list.
  3. Compute the rectification transforms via DSR.
  4. Capture and process the images with any stereo matching algorithm of your choice.
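
To make steps 1 and 2 concrete, here is a minimal sketch assuming OpenCV, with K1 and K2 already known from per-camera chessboard calibration. The function name estimate_extrinsics and all variable names are illustrative; this is not SimpleStereo or DSR API.

import cv2
import numpy as np

# Sketch only: estimate extrinsics [R | t] from feature matches between
# the two views, given each camera's intrinsics K1, K2 (3x3 arrays).
def estimate_extrinsics(img_left, img_right, K1, K2):
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(img_left, None)
    kp2, des2 = orb.detectAndCompute(img_right, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Normalise each side with its own intrinsics, so that a single
    # essential matrix (identity camera) relates the two views.
    pts1 = cv2.undistortPoints(pts1, K1, None)
    pts2 = cv2.undistortPoints(pts2, K2, None)
    E, mask = cv2.findEssentialMat(pts1, pts2, np.eye(3),
                                   method=cv2.RANSAC, threshold=1e-3)
    # recoverPose returns a unit-norm t: the metric scale is unobservable
    # from images alone and must come from a known distance in the scene.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, np.eye(3), mask=mask)
    return R, t

The resulting R and t (once the scale is fixed) are exactly the extrinsics needed before computing the rectifying homographies in step 3.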

I have never applied any self-calibration method, as they normally don't deliver the same accuracy as a proper calibration. Since I worked in metrology, accuracy came first. I would be happy to help, though.

Keep me posted. Regards.

N.B. It is interesting that this is the same principle as temporal stereo, i.e. you use only one camera (with known intrinsics), take two pictures from two different positions, estimate the extrinsic parameters [R | t], and then rectify and perform the stereo matching. Of course, with two rigidly fixed cameras [R | t] is constant, so its estimation can be repeated and refined over time.
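
To illustrate that last point (a sketch, not something currently in the library): repeated extrinsic estimates of a fixed rig can be fused, averaging the rotations on the rotation manifold and the translations by their unit-norm mean. fuse_extrinsics is a made-up name, and SciPy is assumed available.

import numpy as np
from scipy.spatial.transform import Rotation

# Illustrative only: fuse repeated [R | t] estimates of a fixed rig.
def fuse_extrinsics(R_list, t_list):
    # Average rotations properly via scipy's Rotation.mean().
    R_mean = Rotation.from_matrix(np.stack(R_list)).mean().as_matrix()
    t_mean = np.mean(np.stack(t_list).reshape(len(t_list), 3), axis=0)
    t_mean /= np.linalg.norm(t_mean)  # scale stays unknown without a reference
    return R_mean, t_mean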

KevinCain commented 1 year ago

Thanks not only for sharing your experience but also for running a synthetic test case for chessboard calibration akin to our setup! You make a good point about the potential for extrinsics-only calibration if direct camera calibration fails.

We have the option of using two identical grayscale cameras, in which case our setup is similar to a single camera with temporal stereo. However, if possible I would like to calibrate two cameras of different kinds, so that we can exploit a higher-resolution color image for texture.

My fear is that it will be difficult to cover enough of the shared 3D camera space needed for high-quality camera calibration -- particularly sampling chessboard images towards the edges of the frame.

I'll keep you posted and close this issue once I have the chance to post some initial results.

Thanks again.


KevinCain commented 11 months ago

I neglected to mention in this thread that I'm using two different cameras in our rig. Is this supported by SimpleStereo? It does appear that some supported forms of calibration (SimpleStereo/OpenCV calibration) allow separate intrinsics and distortion coefficients per camera.

The rig consists of three monochrome 1K fisheye cameras (left, center, right) and one RGB camera (center). The left and right fisheye cameras are tilted away from the axis of the center RGB camera: that is, the left fisheye camera looks off to the left, and the right fisheye camera looks off to the right. That seriously cuts down on the possible shared viewing space between either side fisheye camera and the color camera at center. Here's an example: [stereo_pair image]

As you can see, framing the chessboard pattern at the far left of the color image corresponds to far-right framing in the left camera. This means the full geometry of the lenses is poorly sampled, since we cannot shoot for near-complete coverage of the frame over the whole calibration sequence. Consequently, the reported reprojection error is ~75 pixels(!) Here is a link to the input data and Python script. The script is simply your example rig, also included below. I can see a couple of options:

import sys
import os

import cv2

# C:\Users\14424\AppData\Local\Programs\Python\Python311\Lib\site-packages\simplestereo
import simplestereo as ss

"""
Build a stereo rig object calculating parameters from calibration images.
"""

# Paths
curPath = os.path.dirname(os.path.realpath(__file__))
# Image folder
loadPath = os.path.join(curPath, "revok", "chessboard_cd")
# Destination
saveFile = os.path.join(curPath, "revok", "rig.json")

# Total number of images
N_IMAGES = 35

# Image paths
# NOTE: LEFT are PGM, RIGHT are JPEG
images = [(os.path.join(loadPath, f'left ({i+1}).pgm'), os.path.join(loadPath, f'right ({i+1}).jpg')) for i in range(N_IMAGES)]

print(f"Calibrating using {len(images)} images from:\n{loadPath}...")

# Calibrate and build StereoRig object
# Chessboard: (7,6) 52mm squares
rig = ss.calibration.chessboardStereo(images, chessboardSize=(7,6), squareSize=52.0)

# Save rig object to file
rig.save(saveFile)

# Print some info
print("Saved in:", saveFile)
print("Reprojection error:", rig.reprojectionError)
print("Centers:", rig.getCenters())
print("Baseline:", rig.getBaseline())

print("Done!")
decadenza commented 11 months ago

Hi KevinCain,

Different cameras are supported, as each one has its own intrinsic parameters. However, fisheye cameras are not correctly managed by SimpleStereo yet, as they require a different set of distortion parameters.

The relevant functions in calibration.py and _rigs.py could be extended rather easily to support fisheye cameras by using the corresponding OpenCV methods described here.

If you do this, please share it, as it could become part of the library.

So nothing seems wrong with your script (provided the calibration images are good); it's just using the wrong distortion model for the fisheye cameras.
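
For illustration, the per-camera step with OpenCV's fisheye model would look roughly like this. It is a sketch, not existing SimpleStereo code; calibrate_fisheye is a made-up wrapper, and objpoints/imgpoints are the usual chessboard correspondences in the shapes the fisheye module expects.

import cv2
import numpy as np

# Rough sketch: calibrate one fisheye camera with cv2.fisheye.
# objpoints/imgpoints must have shapes (N, 1, 3) and (N, 1, 2) per view.
def calibrate_fisheye(objpoints, imgpoints, image_size):
    K = np.zeros((3, 3))
    D = np.zeros((4, 1))  # fisheye model: four coefficients k1..k4
    flags = (cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC
             | cv2.fisheye.CALIB_FIX_SKEW)
    rms, K, D, rvecs, tvecs = cv2.fisheye.calibrate(
        objpoints, imgpoints, image_size, K, D, flags=flags,
        criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
                  100, 1e-6))
    return rms, K, D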

Good luck!

KevinCain commented 11 months ago

Hello @decadenza,

Well, I've run SS stereo rig calibration a number of times, but the disparity calculations from the rectification results are poor. I think the initial calibration coverage is to blame.

During chessboard calibration we get <1 pixel of reprojection error. However, by necessity the chessboard appears small in frame, since we need to see the same chessboard in both cameras, as above. Therefore the total area of the frame that the chessboard can occupy across the calibration photos is limited. That is, while we have subpixel reprojection error, I think we have a poor calibration.

For example, here are the matrices from a rectified rig, which show a meager ~3.89 pixel baseline on X:

cam0 = [579.244618    0             502.39877969;
        0             579.51166712  510.58215499;
        0             0             1           ]
cam1 = [579.70322722  0             506.29166502;
        0             579.02887502  506.23262379;
        0             0             1           ]

Since the chessboard occupies a small region of the images, the calibration may be very accurate for those regions but not necessarily for the entire field of view, especially given the distortion of the ~120° FOV.

Worse, since the chessboard cannot be captured in a way that covers many different depths and angles, the calibration might only be valid for the particular depth and orientation mostly seen during calibration. A low reprojection error does not guarantee that the extrinsic relations between the stereo cameras in our rig are accurate; that is, the calibration setup did not adequately constrain these parameters.
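
As a way to quantify this coverage problem, here is a hypothetical helper I may try (not part of SimpleStereo): accumulate the chessboard's footprint over every calibration image and report the fraction of the frame the corners ever touched.

import cv2
import numpy as np

# Sketch: measure what fraction of the frame the chessboard covered
# across the whole calibration sequence.
def corner_coverage(image_paths, board_size=(7, 6)):
    mask = None
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if mask is None:
            mask = np.zeros(img.shape, np.uint8)
        found, corners = cv2.findChessboardCorners(img, board_size)
        if found:
            hull = cv2.convexHull(corners.astype(np.int32))
            cv2.fillConvexPoly(mask, hull, 255)
    return np.count_nonzero(mask) / mask.size  # covered fraction of frame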

I'm using your 'ss.passive.StereoASW' to check the quality of the rectification results -- which look pretty bad, likely owing to the short baseline above making depth inference hard and inaccurate: [disparity image]

As a sanity check, here I use a well-rectified source with the same 'ss.passive.StereoASW' and OpenCV SGBM methods: [disparity image]
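
For reference, the SGBM pass is just OpenCV's stock matcher, roughly as below; the file names are placeholders and the parameter values are untuned starting points.

import cv2

# Illustrative SGBM settings on a rectified pair (placeholder paths).
left_rect = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right_rect = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

block = 5
sgbm = cv2.StereoSGBM_create(
    minDisparity=0, numDisparities=128, blockSize=block,
    P1=8 * block ** 2, P2=32 * block ** 2,
    uniquenessRatio=10, speckleWindowSize=100, speckleRange=2)

# OpenCV returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left_rect, right_rect).astype(float) / 16.0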

Any fresh suggestions?

KevinCain commented 11 months ago

Below I use brute-force matching to find features in two rectified images, then calculate the RMSE and mean absolute error (MAE) of the distances between matched features to judge rectification quality along epipolar lines. See my code here.

Note that I scale RMSE and MAE by image size so we can directly compare values -- the error in the rectified images from SS is an order of magnitude greater:

[matches image]
Scaled RMSE along epipolar lines (x-axis): 0.6026319368617741
Scaled MAE along epipolar lines (x-axis): 0.6002280167707308

[matches image]
Scaled RMSE along epipolar lines (x-axis): 0.03597576353285048
Scaled MAE along epipolar lines (x-axis): 0.03113951947953966
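
For context, here is a minimal sketch of this kind of check. It is a variant that scores the residual vertical offset of matches (which should be near zero in a well-rectified pair) rather than the x-axis distances above, and the names are illustrative -- see the linked script for the actual code.

import cv2
import numpy as np

# Sketch: brute-force match features in a rectified pair and measure the
# residual row misalignment, scaled by image height for comparability.
def rectification_error(left, right):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(left, None)
    kp2, des2 = orb.detectAndCompute(right, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    dy = np.array([kp1[m.queryIdx].pt[1] - kp2[m.trainIdx].pt[1]
                   for m in matches]) / left.shape[0]
    rmse = np.sqrt(np.mean(dy ** 2))
    mae = np.mean(np.abs(dy))
    return rmse, mae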