JakobEngel / dso

Direct Sparse Odometry

Correcting the scale drift of DSO calculated path. #105

tomcoders opened this issue 7 years ago

tomcoders commented 7 years ago

I want to fix the scale drift of the DSO-calculated path. If I obtain an estimate of the scale (from IMU/GPS), how can I apply a correction to the path? Which modules do I need to concentrate on and understand in depth? I found these parameters in the source code in HessianBlocks.h:

#define SCALE_IDEPTH 1.0f
#define SCALE_XI_ROT 1.0f
#define SCALE_XI_TRANS 0.5f

Will adjusting these parameters dynamically help in correcting the drift?

s3r637 commented 7 years ago

Issue #36 also addresses this topic. Good question, though! There are multiple solutions for this, but first keep it simple and stupid. You could try to integrate the additional information into the direct image alignment (tracking) to provide better (pre-initialized) poses for new frames. In this case you have to make some assumptions (beware, assumption is the mother of all ..), e.g. no motion on the first frame, especially in the IMU-only case. You would have to look at DSO's initialization and tracking. Alternatively, a fusion based on some kind of Kalman filter could also be a possible solution.
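To make the Kalman-filter idea concrete, here is a very rough sketch (all names and noise values here are made up, and it assumes you can pair each DSO translation increment with a metric increment from GPS or IMU preintegration): model the drifting scale as a scalar random walk and filter the ratio measurements.

```cpp
#include <cmath>

// Hypothetical scalar Kalman filter on the scale factor s = metric / DSO units.
// Process model: random walk (the scale drifts slowly); measurement: the ratio
// of a metric translation increment to the corresponding DSO increment.
struct ScaleFilter {
    double s = 1.0;   // current scale estimate
    double P = 1.0;   // estimate variance
    double Q = 1e-6;  // process noise: how fast the scale is allowed to drift
    double R = 1e-2;  // variance of a single ratio measurement (tune!)

    void update(double dsoDist, double metricDist) {
        if (dsoDist < 1e-6) return;       // degenerate: (almost) no translation
        double z = metricDist / dsoDist;  // scale measurement
        P += Q;                           // predict (random-walk process)
        double K = P / (P + R);           // Kalman gain
        s += K * (z - s);                 // correct
        P *= (1.0 - K);
    }
};
```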

More advanced techniques are also available; let yourself be inspired by:

Cheers

tomcoders commented 7 years ago

Thank you @s3r637 for your recommendations.

tomcoders commented 7 years ago

@s3r637 I have understood that the scale factor is unknown due to the lack of depth information, but I have not understood why it drifts over time. From my experiments I find that the scale factor follows a decreasing or increasing curve over time and that the variation is not random noise. Could you please share your ideas on why the scale drifts at all?

NikolausDemmel commented 7 years ago

Drift in scale happens for the same reason as drift in position and orientation. In general, SLAM and VIO systems drift in the dimensions that are unobservable. E.g. for 2D laser SLAM, the global position and orientation are unobservable, so a laser SLAM system drifts in these 3 dimensions. For monocular SLAM, the global pose and scale of the scene are unobservable, so any such system will drift over time in these 7 dimensions.
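To make the dimension count explicit (standard notation, nothing DSO-specific): the unobservable gauge freedom of monocular SLAM is a 7-dimensional similarity transform of the world,

$$
g \in \mathrm{Sim}(3): \quad x \mapsto s\,R\,x + t, \qquad \dim = \underbrace{3}_{t \in \mathbb{R}^3} + \underbrace{3}_{R \in \mathrm{SO}(3)} + \underbrace{1}_{s > 0} = 7.
$$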

tomcoders commented 7 years ago

Thank you @NikolausDemmel. So the only way to correct them is to feed the missing information into the system. And so if we give DSO global information, like that from GPS, we can get an estimate of the scale of the DSO-calculated path, and from that we can derive the scale of the scene.

NikolausDemmel commented 7 years ago

You can do two things. Either you adapt DSO to include additional sensor information that makes scale observable, e.g. with stereo images (see the Stereo DSO paper), IMU, odometry, a height sensor, ultrasound, a laser scanner, GPS, ... In most cases doing this properly can get quite involved.

Alternatively, you can leave DSO as is, and fuse the output with additional sensor data in a subsequent step that estimates the drifting scale, e.g. with something like msf.
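As one illustration of that subsequent step (a much simpler stand-in for a full filter like msf; the function name is made up): given paired translation increments from DSO and from the metric sensor over a sliding window, the least-squares scale has a closed form.

```cpp
#include <vector>

// Closed-form solution of  min_s  sum_i (metric_i - s * dso_i)^2,
// i.e.  s = sum(dso_i * metric_i) / sum(dso_i^2).
// dso/metric hold matched translation-increment norms over a sliding window.
double estimateWindowScale(const std::vector<double>& dso,
                           const std::vector<double>& metric) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < dso.size() && i < metric.size(); ++i) {
        num += dso[i] * metric[i];
        den += dso[i] * dso[i];
    }
    return den > 0.0 ? num / den : 1.0;  // fall back to 1 if no translation
}
```

Re-estimating this over a sliding window (rather than once globally) is what lets you track a scale that drifts over time.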

s3r637 commented 6 years ago

@tibitomabraham In general, scale drift occurs because the relative scale of a frame is only known in relation to its predecessor(s). Any inaccuracy in the measurement (pose estimation) is propagated, and this can cause the scale to change gradually over time.
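To add one worked detail (notation mine): write the per-frame relative scale error as $1+\epsilon_i$; then

$$
s_N = s_0 \prod_{i=1}^{N}(1+\epsilon_i) \quad\Longrightarrow\quad \log s_N \approx \log s_0 + \sum_{i=1}^{N} \epsilon_i,
$$

so the log of the scale performs a random walk, and any small systematic bias shows up as a smooth exponential curve rather than noise, e.g. a bias of $\epsilon_i = 10^{-3}$ per keyframe gives $1.001^{1000} \approx 2.7$ after 1000 keyframes. This would match the monotonic increasing/decreasing curves observed above.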

Cheers

wlsh24 commented 6 years ago

Hello everyone, I also have a few questions about correcting the scale from the DSO reconstruction, but more related to the code.

1) I am interested in the second part of @tibitomabraham's original question regarding:

> ... source code in HessianBlocks.h.
>
> #define SCALE_IDEPTH 1.0f
> #define SCALE_XI_ROT 1.0f
> #define SCALE_XI_TRANS 0.5f
>
> Will adjusting these parameters dynamically help in correcting the drift?

Can someone explain why the state vector needs to be scaled by these factors?

2) Assuming I know at the creation of keyframes 2-3 (before any marginalization) that the scale should be twice as big, which part of the code should be changed to correct this scale?

3) A solution would be to correct the inverse depth and pose of the first keyframes, but I am not really sure how to handle the energy term in this case... Does anyone have a suggestion?

ghost commented 6 years ago

@NikolausDemmel I am trying to leave DSO as is and integrate IMU measurements.

Thanks in advance

NikolausDemmel commented 6 years ago

> What is the reasoning for suggesting EKF (as used in msf)?

Since it is simple and should work more or less out of the box with no code changes, just like it does, e.g. for PTAM.

> I could not understand why we are trying to predict the next position other than trying to find the Kalman gain, since it is observable from both IMU and SLAM, as our main goal is to find the scale and use it in DSO.

I don't understand what you are saying here.

> Why can't we employ a structure like in Visual-Inertial ORB-SLAM?

Yes, you can probably do this as well. I just said that it would be more involved.

NikolausDemmel commented 6 years ago

> Can someone explain why the state vector needs to be scaled by these factors?

My understanding (although I am not sure) is that this is done purely for numerical reasons, to make the optimization better conditioned and thus converge faster / give more accurate results.
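To spell out that reading (notation mine, and only my interpretation of the code): the stored state $\hat{x}$ appears to be related to the physical state by $x = D\hat{x}$ with $D = \mathrm{diag}(\texttt{SCALE\_XI\_TRANS}, \ldots)$ (cf. get_state_scaled() in HessianBlocks.h), which acts like a diagonal preconditioner on the normal equations:

$$
x = D\hat{x} \quad\Rightarrow\quad \hat{H} = D^\top H\,D, \qquad \hat{b} = D^\top b,
$$

bringing components with very different natural magnitudes (rotations in radians, translations in meters, inverse depths) onto a comparable numerical scale. This mainly affects the conditioning of the solve, not the problem being solved.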

> Assuming I know at the creation of keyframes 2-3 (before any marginalization) that the scale should be twice as big, which part of the code should be changed to correct this scale?

> A solution would be to correct the inverse depth and pose of the first keyframes, but I am not really sure how to handle the energy term in this case... Does anyone have a suggestion?

You will probably have to carefully make sure to multiply all the state related to translation by the same scale factor (i.e. camera position and point depth), but it might be tricky not to forget something.

An alternative that is maybe simpler to implement: save this scale factor and apply it whenever you use the output, i.e. whenever you use the resulting poses or point depths to display / save / process further.
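A minimal sketch of that alternative (assuming Sophus, which DSO already depends on; the function names are mine): the rotation is unaffected, translations multiply by the scale, and inverse depths divide by it.

```cpp
#include <sophus/se3.hpp>

// Apply an externally estimated scale s to a camera-to-world pose:
// the rotation is unchanged, only the translation is scaled.
Sophus::SE3d applyScale(const Sophus::SE3d& camToWorld, double s) {
    return Sophus::SE3d(camToWorld.so3(), s * camToWorld.translation());
}

// A point's depth scales by s, so its inverse depth divides by s.
float applyScaleToIdepth(float idepth, double s) {
    return static_cast<float>(idepth / s);
}
```

Applied consistently at output time (e.g. in an Output3DWrapper implementation), this leaves the optimization itself untouched.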