Mismatch in inference on image used in paper

akhilStyl commented 1 year ago

Hi, this is awesome work! Congratulations!

I tried to run inference on one of the photos used in the paper figures (attached). The depth map I generated (attached) using the pre-trained model looks very different from the depth map published in the paper (mine has almost no blue). Also, I used the point cloud generation function to estimate the metric depth of the marked point in the image. Whereas the paper reports a metric depth of 5.82 meters, I am getting 3.5 meters.

Could you please help me with what I may be doing wrong? Any pointers would be highly appreciated.

Thanks!

choyingw commented 1 year ago

What's the color code you used here to visualize depth and which checkpoint did you use to estimate the depth?

akhilStyl commented 1 year ago

Hi! I ran the demo.py script using the DPT Large checkpoint (the first one in the Evaluation table in the repo). The color scale is from 0 to 5 (basically, I didn't change the default color scale).
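For reference, here is a minimal sketch of how a fixed color range can affect a depth visualization, assuming depths are clipped to the range before being mapped to a colormap. The function name `normalize_depth` and the sample depths are hypothetical; this is not taken from demo.py.

```python
def normalize_depth(depth_m, vmin=0.0, vmax=5.0):
    """Map a metric depth to [0, 1] for a colormap, clipping to [vmin, vmax].

    Hypothetical helper for illustration only; the repo's demo script may
    normalize differently.
    """
    clipped = min(max(depth_m, vmin), vmax)
    return (clipped - vmin) / (vmax - vmin)


# With a 0 to 5 range, every depth at or beyond 5 m maps to the same end of
# the colormap, so far regions become indistinguishable from each other.
for d in (1.0, 3.5, 5.82, 8.0):
    print(d, normalize_depth(d))
```

If this assumption holds, two visualizations of the same prediction can look quite different purely because of the chosen range, independent of the checkpoint.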

I also ran the evaluation script on the same image with the same checkpoint. This time I got a depth map that looks like the one you published in the paper. But on inference, the depth still seems off. I am now wondering if there's a problem with how I am calculating metric depth from the prediction. So I took an image and its depth map from your repo (the example you provide for running the visualize_pc.py script). The image is at ./data/sample_pc/0000.jpg. I simply assume that X, Y, Z correspond to the 3D coordinates of the pixel at (u, v). But on that basis, the widths of the three blue vertical panels on the left half of the photo all come out very different from each other, whereas they should all be the same.
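To make the calculation concrete, this is roughly the pinhole back-projection I have in mind, as a minimal sketch. The intrinsics (`fx`, `fy`, `cx`, `cy`), the pixel coordinates, and the depths are made-up values, and I am assuming the depth map stores distance along the camera Z axis; I have not confirmed that this matches what visualize_pc.py does.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with Z-depth `depth` to camera-frame (X, Y, Z).

    Standard pinhole model; the intrinsics are assumed, not read from the repo.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640x480 image.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# Left and right edges of a panel, 100 pixels apart, at 2 m depth.
left = backproject(100, 240, 2.0, fx, fy, cx, cy)
right = backproject(200, 240, 2.0, fx, fy, cx, cy)
width_near = math.dist(left, right)

# The same 100-pixel span at 4 m depth corresponds to twice the metric width.
left_far = backproject(100, 240, 4.0, fx, fy, cx, cy)
right_far = backproject(200, 240, 4.0, fx, fy, cx, cy)
width_far = math.dist(left_far, right_far)

print(width_near, width_far)
```

Under this model, the recovered width depends on both the predicted depth and the intrinsics, so wrong intrinsics or a depth error on any one panel would be enough to make equal panels measure differently.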

I am sure I must be making some mistake here. Could you please tell me exactly which checkpoint to use and how to reproduce the 5.82-meter prediction on the image in your paper? Thank you so much!

gautamjajoo commented 1 year ago

Hey @akhilStyl, have you resolved the issue? I am also trying to run the sample image with the pre-trained model but I'm not getting the same result as shown in the paper.

facebookresearch / DistDepth