JakobEngel / dso

Direct Sparse Odometry
GNU General Public License v3.0
2.27k stars 906 forks

Real world depth information #241

Closed · chapa17 closed this issue 2 years ago

chapa17 commented 2 years ago

Hello,

I am trying to obtain the real-world depth information (in the world coordinate system), but I am unable to get it. I implemented the code in the output wrapper, but when I visualize the result in ROS or another viewer the depth does not seem correct. I get depth values mostly in the range of 0.1 to 2 on the TUM dataset, and I suspect those are not the true depths. Is there something I am missing in the implementation?

If anyone has any idea please let me know.

Thank you!

NikolausDemmel commented 2 years ago

From a monocular camera you cannot recover true metric scale from geometry alone, so the output will have an arbitrary scale factor. The initial keyframe is initialized to a mean depth around 1, so the values you are seeing make sense.
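If you do later obtain a scale factor from some external source, rescaling the per-point depths you read out in your output wrapper is straightforward. A minimal sketch in Python, assuming you export inverse depths per point (the function name and the `scale` input are hypothetical; `scale` would come, e.g., from trajectory alignment against ground truth):

```python
import numpy as np

def idepth_to_metric_depth(idepth, scale):
    """Convert inverse depths (arbitrary internal scale) to metric depths,
    given an externally estimated scale factor.

    idepth : array of inverse depths read out in the output wrapper
    scale  : hypothetical externally estimated scale factor
    """
    idepth = np.asarray(idepth, dtype=float)
    valid = idepth > 0  # non-positive inverse depths carry no depth information
    depth = np.full_like(idepth, np.inf)
    depth[valid] = 1.0 / idepth[valid]
    return scale * depth
```

For example, `idepth_to_metric_depth([0.5, 2.0], 2.0)` yields `[4.0, 1.0]`. This only fixes the global scale of one snapshot; it does nothing about scale drift over time.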

chapa17 commented 2 years ago

Hello @NikolausDemmel

I just came across this issue. I did not understand it completely, but I think it describes the same problem I am currently facing when trying to obtain the true depth values. Is that correct?

Also, is it possible to get the true values by changing the initial keyframe depth to a fixed value of xxx (the height of the camera above the ground)?

NikolausDemmel commented 2 years ago

> I just came across this issue. I did not understand it completely, but I think it describes the same problem I am currently facing when trying to obtain the true depth values. Is that correct?

Yes.

To be precise, there are two issues you are facing: 1) you don't know the correct scale factor for the first keyframe, and 2) over time this scale factor will also drift, so even if you somehow manage to scale the first frame correctly, the scale of the system will still drift away from true metric scale during operation.

> Also, is it possible to get the true values by changing the initial keyframe depth to a fixed value of xxx (the height of the camera above the ground)?

Height over ground alone is of no use (unless you can also identify the pixels that correspond to the ground in the depth map and scale against those); you would need to rescale the whole depth map. Another way, using poses only, is to take a set of keyframes with sufficient translational motion for which you know the true translational motion in meters. By aligning the trajectories you can estimate the scale factor as well. But problem 2) from above remains.
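The trajectory-alignment idea can be sketched as a least-squares scale estimate, assuming you have matched keyframe positions in both frames of reference (the function name and inputs are hypothetical; a full alignment, e.g. the Umeyama method, would also solve for the rotation and translation):

```python
import numpy as np

def estimate_scale(est_positions, gt_positions):
    """Estimate the scale factor between an estimated trajectory
    (arbitrary scale) and a ground-truth trajectory (meters).

    Both inputs are (N, 3) arrays of matched keyframe positions.
    After removing each trajectory's centroid, any rotation between the
    two frames preserves norms, so the ratio of RMS spreads gives the
    similarity-transform scale without solving for the rotation itself.
    """
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    est_c = est - est.mean(axis=0)   # center the estimated trajectory
    gt_c = gt - gt.mean(axis=0)      # center the ground-truth trajectory
    return np.sqrt((gt_c ** 2).sum() / (est_c ** 2).sum())
```

Multiplying the estimated translations (and depths) by this factor rescales the snapshot to metric units; because of issue 2) above, the factor would need to be re-estimated as the system runs.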

In general, solving these issues well is a complex problem and usually depends on exploiting application-specific knowledge. There is no simple fix that works in all situations.

chapa17 commented 2 years ago

Thank you @NikolausDemmel for the explanation.