UZ-SLAMLab / ORB_SLAM3

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
GNU General Public License v3.0

Does ROS Interface(Stereo+IMU) version native support fisheye camera ? #101

Open ShilangChen1011 opened 4 years ago

ShilangChen1011 commented 4 years ago

I am using the Intel RealSense T265 camera. I have calibrated it carefully with Kalibr, but I found that ORB-SLAM3 does not work well with this camera: it detects very few features in the image, so tracking always fails, as shown below. Does the fisheye image need stereo rectification? Is the default code in ros_stereo_inertial OK with the fisheye camera? Do I need to do anything else to make this camera work?

Screenshot from 2020-09-09 16-02-37

jj-gomez commented 3 years ago

I would need more information to correctly diagnose your issue. When you say that ORB-SLAM3 detects a small number of features, do you mean that almost no FAST corners are extracted, or that FAST corners are correctly extracted but not matched in consecutive images?

Our stereo fisheye pipeline does not require rectification, but it does require a few extra parameters in the calibration file. I am posting here an answer I wrote about those parameters in another issue (#88):

@ankuPRK @XiongLianga This will be solved in a new release this week. However, I will provide you with a short explanation:

Stereo cameras are usually rectified to ease the search for stereo matches. This rectification step is well known for pinhole cameras and has a negligible effect on the FoV of the original images.

However, rectifying stereo fisheye cameras leads to a trade-off that depends on the calibration chosen for the rectified images:

  • Keep the characteristic FoV of the fisheye cameras, but greatly magnify objects at the boundaries of the image and lose resolution at its center.
  • Lose a large part of the FoV, but make things in the center of the image look fine (like a pinhole camera).

Original image and rectified versions with different calibrations

These reasons discourage the use of rectification on fisheye images. Instead, we use the two raw stereo images in a new stereo approach. In this approach, we consider the stereo rig as two monocular cameras constrained by:

  1. A constant relative SE(3) transformation.
  2. Optionally, an image region that observes the same portion of the scene (the lapping area).

Within this lapping area, we can get stereo, true-scaled points which, combined with the relative pose of the cameras, allow us to retrieve the real scale of the map. The lapping area is likely to differ from one stereo rig to another, so we decided to include it as a parameter that identifies in which regions of the image features should be considered "stereo" or "monocular" (see the example image and the short classification sketch below).

lapping area example
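
The following is a minimal sketch, not the actual ORB-SLAM3 code, of how the lapping bounds read from the settings file could be used to classify a keypoint; the struct and function names are hypothetical:

```cpp
// Hypothetical sketch: classify an extracted keypoint as a "stereo" or
// "monocular" observation from the lapping bounds in the settings file.
#include <opencv2/core.hpp>

struct LappingArea {
    int begin;  // Camera.lappingBegin
    int end;    // Camera.lappingEnd
};

// A feature whose horizontal coordinate falls inside the lapping area can be
// matched against the other camera and triangulated with true scale; anything
// outside it is kept as a purely monocular observation.
bool isStereoCandidate(const cv::KeyPoint& kp, const LappingArea& lap) {
    return kp.pt.x >= lap.begin && kp.pt.x <= lap.end;
}
```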

This approach allows us to use all the features extracted in both images, which yields a more robust stereo system. For example, if one camera gets completely occluded, the other can still be used for monocular tracking while the occlusion lasts.

Since the code is in a beta version and we ran a lot of experiments, there is still a lot of dead code and some hardcoded parameters (in this case, the lapping area), so I apologize for the confusion. However, as said before, this will be solved shortly with a parameter parser and a correct implementation of the lapping area.

Regarding ROS stereo+IMU, it should work provided the correct calibration.

It would also be helpful to see some example images that you are having trouble processing.

longzhenren commented 3 years ago

Hello, maybe you can try the following:

1. Download and install the RealSense SDK for the Windows platform.
2. Run the program "rs-sensor-control.exe", which may be in a directory like "C:\Program Files (x86)\Intel RealSense SDK 2.0\tools".
3. Get the intrinsic and extrinsic parameters (the extrinsics contain a matrix for converting between the cameras and a pair of matrices for converting from each camera to the body (IMU); the intrinsics contain two pairs of values for the IMU, one distortion correction matrix, and four values).
4. Copy the yaml file "Examples/Stereo-Inertial/TUM_512.yaml" and replace its values with the ones you got in step 3. Note that you should take the square root of the four values you got for the IMU.

Intel has saved the parameters inside the T265 camera, so maybe you should not use Kalibr; the parameters I got with Kalibr didn't work well. I successfully ran ORB_SLAM3 with a T265 using this method. Hope this can help you. (Maybe my English is too weak, so you cannot get the ideas :( ) In short: don't use Kalibr, fill in the data already stored inside the camera, and remember to take the square root of the four IMU values. If anything is still unclear, you can contact me by email: hljzhangzhibo@buaa.edu.cn
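
A minimal sketch of the square-root step in point 4, under the assumption that the four values reported by rs-sensor-control are noise/random-walk variances, while the TUM_512.yaml IMU fields (IMU.NoiseGyro, IMU.NoiseAcc, IMU.GyroWalk, IMU.AccWalk) expect standard deviations; the numbers below are placeholders, not real T265 values:

```cpp
// Convert assumed variances into the standard deviations the yaml expects.
#include <cmath>
#include <cstdio>

int main() {
    // Placeholder variances standing in for the four values from the device.
    double gyroNoiseVar = 1e-6;
    double accNoiseVar  = 1e-4;
    double gyroWalkVar  = 1e-10;
    double accWalkVar   = 1e-8;

    std::printf("IMU.NoiseGyro: %g\n", std::sqrt(gyroNoiseVar));
    std::printf("IMU.NoiseAcc:  %g\n", std::sqrt(accNoiseVar));
    std::printf("IMU.GyroWalk:  %g\n", std::sqrt(gyroWalkVar));
    std::printf("IMU.AccWalk:   %g\n", std::sqrt(accWalkVar));
    return 0;
}
```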

malleshamdasari commented 3 years ago

I used the default parameters given by RealSense and the tracking was really bad. I did it with Kalibr and it seems good.

ryleymcc commented 3 years ago

Can you elaborate more on this topic? For simplicity I am leaving out the IMU for now and am trying to run Stereo with the T265. I am confused about the lapping parameters too.

After running rs-sensor-control I get the extrinsic transformation matrix from the right camera to the left camera (fisheye2 to fisheye1):

Translation Vector : [-0.0640212,-0.000119843,-8.24535e-05]
Rotation Matrix    : [0.999973,0.00361533,-0.00632303]
                   : [-0.00360718,0.999993,0.00130074]
                   : [0.00632768,-0.0012779,0.999979]

And the intrinsics for fisheye1:

Principal Point         : 429.849, 393.509
Focal Length            : 286.641, 286.546
Distortion Model        : Kannala Brandt4
Distortion Coefficients : [-0.0112001,0.0501633,-0.04698,0.009193,0]

And the intrinsics for fisheye2:

Principal Point         : 423.534, 401.768
Focal Length            : 286.157, 286.008
Distortion Model        : Kannala Brandt4
Distortion Coefficients : [-0.00927463,0.0461914,-0.043395,0.00802249,0]

I based the Camera.bf parameter on the 0.065 m distance between fisheye1 and fisheye2 multiplied by Camera2.fx (286.1571960449219); a worked check of that arithmetic follows.
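
For reference, a quick check of that computation; note that the exact x-translation from the extrinsics above (0.0640212 m) gives a slightly smaller value than the rounded 0.065 m baseline:

```cpp
// Worked check of Camera.bf = baseline (m) * focal length (px).
#include <cstdio>

int main() {
    double fx              = 286.1571960449219;  // Camera2.fx
    double baselineRounded = 0.065;              // rounded baseline used above
    double baselineExact   = 0.0640212;          // |tx| from the translation vector

    std::printf("bf (rounded baseline): %.9f\n", fx * baselineRounded);  // ~18.600217743
    std::printf("bf (exact tx):         %.9f\n", fx * baselineExact);    // ~18.320
    return 0;
}
```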

This is my yaml file. Notice that I haven't changed the lapping parameters because I don't know where to get them.

%YAML:1.0

#--------------------------------------------------------------------------------------------
# Camera Parameters. Adjust them!
#--------------------------------------------------------------------------------------------
Camera.type: "KannalaBrandt8"

# Left Camera calibration and distortion parameters (OpenCV)
Camera.fx: 286.6408996582031
Camera.fy: 286.5458068847656
Camera.cx: 429.8493957519531
Camera.cy: 393.5089111328125

# Kannala-Brandt distortion parameters
Camera.k1: -0.011200119741261005
Camera.k2: 0.05016331002116203
Camera.k3: -0.04697997868061066
Camera.k4: 0.0091929966583848

# Right Camera calibration and distortion parameters (OpenCV)
Camera2.fx: 286.1571960449219
Camera2.fy: 286.00799560546875
Camera2.cx: 423.5339050292969
Camera2.cy: 401.7677917480469

# Kannala-Brandt distortion parameters
Camera2.k1: -0.009274626150727272
Camera2.k2: 0.04619137942790985
Camera2.k3: -0.043395038694143295
Camera2.k4: 0.00802248902618885

# Transformation matrix from right camera to left camera

Tlr: !!opencv-matrix
  rows: 3
  cols: 4
  dt: f
  data: [0.999973, 0.00361533, -0.00632303, -0.0640212,
         -0.00360718, 0.999993, 0.00130074, -0.000119843,
         0.00632768, -0.0012779, 0.999979, -0.0000824535]
# Camera resolution
Camera.width: 848
Camera.height: 800

# Lapping area between images
Camera.lappingBegin: 0
Camera.lappingEnd: 511

Camera2.lappingBegin: 0
Camera2.lappingEnd: 511

# Camera frames per second
Camera.fps: 30.0

# Color order of the images (0: BGR, 1: RGB. It is ignored if images are grayscale)
Camera.RGB: 1

# Close/Far threshold. Baseline times.
ThDepth: 40.0
# I based this number off of the .065m distance between fisheye1 and fisheye2 multiplied by Camera2.fx: 286.1571960449219
Camera.bf: 18.600217743 

#--------------------------------------------------------------------------------------------
# ORB Parameters
#--------------------------------------------------------------------------------------------

# ORB Extractor: Number of features per image
ORBextractor.nFeatures: 1000

# ORB Extractor: Scale factor between levels in the scale pyramid
ORBextractor.scaleFactor: 1.2

# ORB Extractor: Number of levels in the scale pyramid
ORBextractor.nLevels: 8

# ORB Extractor: Fast threshold
# Image is divided in a grid. At each cell FAST are extracted imposing a minimum response.
# Firstly we impose iniThFAST. If no corners are detected we impose a lower value minThFAST
# You can lower these values if your images have low contrast
ORBextractor.iniThFAST: 20
ORBextractor.minThFAST: 7

#--------------------------------------------------------------------------------------------
# Viewer Parameters
#--------------------------------------------------------------------------------------------
Viewer.KeyFrameSize: 0.05
Viewer.KeyFrameLineWidth: 1
Viewer.GraphLineWidth: 1
Viewer.PointSize: 2
Viewer.CameraSize: 0.05
Viewer.CameraLineWidth: 2
Viewer.ViewpointX: 0
Viewer.ViewpointY: -10
Viewer.ViewpointZ: -0.1
Viewer.ViewpointF: 2000

I run the command rosrun ORB_SLAM3 Stereo /path/to/vocab/ORBvoc.txt /path/to/settings/t265_new.yaml false

My terminal output looks like this:

New Map created with 0 points
TRACK_REF_KF: Less than 15 matches!!
Fail to track local map!
LM: Active map reset recieved
LM: Active map reset, waiting...
LM: Reseting current map in Local Mapping...
LM: End reseting Local Mapping...
LM: Reset free the mutex
LM: Active map reset, Done!!!
mnFirstFrameId = 262
mnInitialFrameId = 261
132 Frames set to lost
^CFirst KF:132; Map init KF:131
New Map created with 0 points

SaveKeyFrameTrajectoryTUM to KeyFrameTrajectory_TUM_Format.txt ...

SaveTrajectoryTUM to FrameTrajectory_TUM_Format.txt ...

SaveTrajectoryKITTI to FrameTrajectory_KITTI_Format.txt ...

I get no tracked points using this method. With the other approaches I've tried I get some points, but the tracking is not usable.

hanb257 commented 2 months ago

Have you solved it yet? I encountered the same problem.
