FilippoAleotti / mobilePydnet

Pydnet on mobile devices
Apache License 2.0

Disparity to distance #37

Open rodrigoGA opened 3 years ago

rodrigoGA commented 3 years ago

First I want to congratulate you on the project.

I would like to know how to convert the result of the model to distance in meters.

Searching on the internet, I found the following formula: depth = baseline * focal / disparity

- disparity: the result of the model, a number between 0 and 1: (modelResult - min_result) / (max_result - min_result)
- baseline: seems to depend on the training data; for the KITTI dataset I have found values of 0.54 or 0.22
- focal: I think this is the focal length of the camera, but I am not sure; I think the value is close to 2262 (I sketch the conversion below)
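In code, what I am trying looks roughly like this (a rough sketch; the baseline and focal values are just the numbers I found online and may well be wrong for this model):

```python
import numpy as np

# Values I found online for KITTI; they may be wrong for this model.
baseline = 0.54   # meters
focal = 2262.0    # pixels (just a guess, I am not sure)

def to_depth(model_result):
    """model_result: raw output of the network, shape (H, W)."""
    # normalize to [0, 1] as described above
    disparity = (model_result - model_result.min()) / (model_result.max() - model_result.min())
    # classic stereo formula: depth = baseline * focal / disparity
    return baseline * focal / np.maximum(disparity, 1e-6)
```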

I'm not sure if this is the correct way to do it, nor what parameters to use for the model you trained.

If I export the model with a different height and width, which parameters do I have to change?

python export.py --ckpt ckpt/pydnet \
        --arch pydnet \
        --dest "./" \
        --height 192 --width 192
FilippoAleotti commented 3 years ago

Hi,

Unfortunately, that formula doesn't hold here since the model is not stereo (nor is it mimicking a stereo one). In particular, the model predicts an inverse depth map for the input image, thus you have to align each prediction using some known 3D points of the scene to obtain a metric depth map.
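For instance, a minimal sketch of such an alignment, fitting a single median scale on a handful of pixels with known metric depth (just an illustration; the pixel coordinates and known depths are placeholders, not part of this repository):

```python
import numpy as np

def align_to_metric(inv_depth, known_uv, known_depth_m):
    """Align an inverse-depth prediction to metric scale.

    inv_depth:     (H, W) network output (inverse depth, unknown scale)
    known_uv:      (N, 2) pixel coordinates (u, v) of points with known depth
    known_depth_m: (N,)   metric depth of those points, in meters
    """
    pred_depth = 1.0 / np.maximum(inv_depth, 1e-6)              # relative depth, unknown scale
    pred_at_points = pred_depth[known_uv[:, 1], known_uv[:, 0]]  # prediction at the known pixels
    scale = np.median(known_depth_m / pred_at_points)            # single global scale factor
    return scale * pred_depth                                     # metric depth map, in meters
```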

rodrigoGA commented 3 years ago

Thank you very much for your prompt response.

What do you mean by

> you have to align each prediction using some known 3D points of the scene

Can your model be extended for stereo view?

LulaSan commented 3 years ago

I am also interested in how to obtain distance values from this network

rodrigoGA commented 3 years ago

I will share what I have found about inverse depth, but I am not an expert on the subject; everything comes from searching Google.

1 / D = V * a + b

- D = physical distance
- V = inverse depth map value (the result of the model)
- a and b are values fitted from known points in the image; you can use least squares to fit a and b based on those known points (rough sketch below)
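A rough sketch of that fit (assuming you have N pixels with known distance; this is just how I understood it, so take it with a grain of salt):

```python
import numpy as np

def fit_scale_shift(V, known_uv, known_depth_m):
    """Fit a and b in 1/D = a*V + b from a few points with known distance D."""
    v = V[known_uv[:, 1], known_uv[:, 0]]                  # model values at the known pixels
    A = np.stack([v, np.ones_like(v)], axis=1)             # [V, 1] design matrix
    target = 1.0 / known_depth_m                           # inverse of the known distances
    (a, b), *_ = np.linalg.lstsq(A, target, rcond=None)    # least-squares fit of a and b
    depth = 1.0 / np.maximum(a * V + b, 1e-6)              # full metric depth map
    return depth, a, b
```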

Now I am unsure whether it is necessary to calibrate in every frame. Is this necessary even if the same environment is used, with the same light and camera?

I have found this repository, https://github.com/nianticlabs/manydepth, which seems to make the prediction consistent over time. It would be interesting to include something similar.

mpottinger commented 3 years ago

> Thank you very much for your prompt response.
>
> What do you mean by
>
> you have to align each prediction using some known 3D points of the scene
>
> Can your model be extended for stereo view?

For example, AR/SLAM frameworks provide a sparse point cloud as part of their tracking, so you have some depth points in the image as a reference and can rescale to those. Or use another depth sensor. I know that sounds pointless, because why do monodepth if you already have depth? Well, depth sensors usually have many holes in the depth map, especially on reflective surfaces, dark surfaces, etc. This might help with that.
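As a rough sketch of the depth-sensor idea (assuming the sensor marks missing depth with zeros and that a single global scale is enough; nothing here is specific to this repository):

```python
import numpy as np

def fill_depth_holes(sensor_depth, inv_depth_pred):
    """Fill holes (zeros) in a sensor depth map with the scaled monocular prediction."""
    pred_depth = 1.0 / np.maximum(inv_depth_pred, 1e-6)
    valid = sensor_depth > 0
    # align the prediction to the sensor scale using the valid pixels only
    scale = np.median(sensor_depth[valid] / pred_depth[valid])
    fused = sensor_depth.copy()
    fused[~valid] = scale * pred_depth[~valid]   # keep sensor depth where it is available
    return fused
```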

wbzheng11 commented 1 month ago

> Hi,
>
> Unfortunately, that formula doesn't hold here since the model is not stereo (nor is it mimicking a stereo one). In particular, the model predicts an inverse depth map for the input image, thus you have to align each prediction using some known 3D points of the scene to obtain a metric depth map.

Hello!

I understand that you mean matching the pixel values with the actual depth values, so there will be a conversion relationship.

But the image in the results file produced by running inference.py is an RGB three-channel image.

How should I match it with a single-channel depth value?