DepthAnything / Depth-Anything-V2

Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

What's the meaning of the model's output? How can I get the true distance from the model output? #93

Open duyuwen-duen opened 1 month ago

duyuwen-duen commented 1 month ago

As mentioned for the KITTI datasets (https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-1/), depth values are distances to the camera plane obtained from the z-buffer, and a pixel intensity of 1 in the single-channel 16-bit PNG depth images corresponds to a distance of 1 cm from the camera plane. So the depth value should be small for a close object. However, with this method the value is large for a close object, which is the opposite of the ground truth. Could you tell me the reason, and how can I get the true distance from the model output?

Edric-star commented 1 month ago

Hi, did you try the metric depth methods? It outputs the distances with units in meters. Please check it here.

duyuwen-duen commented 1 month ago

I find the same problem with run.py. A close object is white, which means its value is large, but shouldn't the model's output look like the ground truth, where a close object is black because the depth value represents distance? Here is the result I get. I think the result is reversed, but I don't know why.

duyuwen-duen commented 1 month ago

The same problem occurs if I use metric_depth/run.py. Besides that, how should I decide the value of max_depth? Looking forward to your reply!

Edric-star commented 1 month ago

It looks like you used indoor pictures; please make sure you use the indoor (Hypersim) model for prediction.

duyuwen-duen commented 1 month ago

It's the same if I use outdoor pictures. The value for a close object is larger than for a far object, so the close object appears white. But if the depth value represents distance, the value for a close object should be smaller than for a far one. (See attached image: demo02.)

heyoeyo commented 1 month ago

The (relative) models output depth that is like ~1 / true depth. So things that are far away output values close to zero (e.g. 1 / large number) and things that are very close output large values (e.g. 1 / small number), which is why the black/white values seem reversed.

The paper describes this output as "affine-invariant inverse depth" (see section 5.2 on page 6). The v1 paper describes it in a bit more detail in section 3.1 (page 3). I also have a description of it with some diagrams here.
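The "affine-invariant inverse depth" idea above can be sketched in code: the relative output is roughly a * (1 / true_depth) + b for unknown scale a and shift b, so two pixels with known true distances are enough to recover metric depth. This is an illustrative sketch under that assumption; the helper name and the two-reference-pixel approach are not part of the repo.

```python
import numpy as np

def inverse_depth_to_metric(inv_depth, px1, z1, px2, z2):
    """Recover metric depth from a relative (affine-invariant inverse depth)
    map, given two reference pixels px1, px2 with known distances z1, z2 (m).

    Assumes inv_depth ≈ a * (1 / true_depth) + b for unknown a, b.
    """
    d1, d2 = inv_depth[px1], inv_depth[px2]
    # Solve d = a * (1/z) + b from the two reference points:
    a = (d1 - d2) / (1.0 / z1 - 1.0 / z2)
    b = d1 - a / z1
    # Invert the affine mapping back to metric depth:
    with np.errstate(divide="ignore"):
        return a / (inv_depth - b)
```

This also makes the black/white reversal concrete: a point at 100 m maps near b (dark), while a point at 0.5 m maps to a large value (bright).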

duyuwen-duen commented 1 month ago

> Hi, did you try the metric depth methods? It outputs the distances with units in meters. Please check it here.

You are right! I was using the old checkpoints to run the model; as soon as I switched to the checkpoints listed in metric_depth/README.md, the result was correct!

duyuwen-duen commented 1 month ago

> The (relative) models output depth that is like ~1 / true depth. So things that are far away output values close to zero (e.g. 1 / large number) and things that are very close output large values (e.g. 1 / small number), which is why the black/white values seem reversed.
>
> The paper describes this output as "affine-invariant inverse depth" (see section 5.2 on page 6). The v1 paper describes it in a bit more detail in section 3.1 (page 3). I also have a description of it with some diagrams here.

Thank you for your reply! I hadn't noticed that, and I had been confused for a long time! Now I get it. Thank you very much!!

bhack commented 1 month ago

@duyuwen-duen the annoying thing is that, even with metric depth, we need to specify the max distance per frame. Especially in a sequence where the camera moves a lot along Z, it can be a real pain.

rhelck commented 1 month ago

> Hi, did you try the metric depth methods? It outputs the distances with units in meters. Please check it here.
>
> You are right! I was using the old checkpoints to run the model; as soon as I switched to the checkpoints listed in metric_depth/README.md, the result was correct!

In the run.py file, are the raw metric distances what are saved in the .npy file?

duyuwen-duen commented 1 month ago

> Hi, did you try the metric depth methods? It outputs the distances with units in meters. Please check it here.
>
> You are right! I was using the old checkpoints to run the model; as soon as I switched to the checkpoints listed in metric_depth/README.md, the result was correct!
>
> In the run.py file, are the raw metric distances what are saved in the .npy file?

If you use metric_depth/run.py with the appropriate model from metric_depth/README.md, you get a depth map whose values are in meters.
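As a minimal illustration of reading that output, a synthetic array stands in below for the .npy file that metric_depth/run.py writes (the filename is hypothetical):

```python
import numpy as np

# Stand-in for the model's saved metric output (meters per pixel):
np.save("depth_output.npy", np.array([[1.5, 3.2], [7.8, 19.9]], dtype=np.float32))

depth = np.load("depth_output.npy")        # float array, distances in meters
h, w = depth.shape
center_distance_m = depth[h // 2, w // 2]  # metric distance at one pixel
```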

rhelck commented 1 month ago

I think I got it: it's just the raw output, unadjusted for the 0-255 color range? Setting the max depth helped quite a bit.

duyuwen-duen commented 1 month ago

> I think I got it: it's just the raw output, unadjusted for the 0-255 color range? Setting the max depth helped quite a bit.

You are right. If you use the model in metric_depth, the model's output represents distance; if you want to show it as a picture, you can rescale it to the 0-255 color range. If you only need the depth values themselves, you don't need to rescale.
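The rescaling step described above can be sketched as follows; the helper name and the max_depth value are illustrative assumptions, not part of the repo's run.py:

```python
import numpy as np

def depth_to_gray(depth_m, max_depth=20.0):
    """Map metric depth (meters) to uint8 for display only.

    The returned image is for visualization; keep the original float
    array for any actual distance measurements.
    """
    d = np.clip(depth_m, 0.0, max_depth)
    return (d / max_depth * 255.0).astype(np.uint8)

depth_m = np.array([[0.0, 5.0], [10.0, 20.0]])  # synthetic metric output
gray = depth_to_gray(depth_m)
```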

DebbyX3 commented 1 month ago

Sorry for the reiteration, but I'm not sure if I'm getting it: the .npy files generated when running metric_depth/run.py contain the estimated depths from the camera in meters for each pixel, correct? (ofc with the correct models/checkpoints suggested in the metric_depth/README.md)

So the image is just a normalized version of them to fit 0-255... because the gray value of a pixel != the value of the array at the same point (also because the arrays contain floats).

duyuwen-duen commented 1 month ago

> Sorry for the reiteration, but I'm not sure if I'm getting it: the .npy files generated when running metric_depth/run.py contain the estimated depths from the camera in meters for each pixel, correct? (ofc with the correct models/checkpoints suggested in the metric_depth/README.md)
>
> So the image is just a normalized version of them to fit 0-255... because the gray value of a pixel != the value of the array at the same point (also because the arrays contain floats)

Yes, you are correct. However, I have saved the depth data as a .png file where the pixel values themselves represent distances. This means I altered the run.py script, changing the meaning of the .png file: for instance, if the depth value is 1.23, I multiply it by 1000, resulting in a value of 1230, which I then store in a 16-bit format. But if you don't need that, you don't have to change anything.

rhelck commented 1 month ago

Ok this makes sense @duyuwen-duen , many thanks for the responses.

szhang963 commented 1 month ago

Hi, I got a bad result when I used metric_depth/run.py with depth_anything_v2_metric_vkitti_vitl.pth to test an outdoor image. However, the result is correct when using run.py. The original image is attached.

@duyuwen-duen, I used your outdoor image and still got a bad result.

Could someone provide some suggestions?