duyuwen-duen opened 1 month ago
Hi, did you try the metric depth models? They output distances in meters. Please check it here.
I find the same problem with run.py: the close object is white, which means its value is big, but I think the model output should look like the ground truth, where close objects are black, because the depth value represents distance. Here is the result I get. I think the result is reversed, but I don't know why.
If I use metric/run.py, there is the same problem. Besides that, how can I decide the value of max_depth? Looking forward to your reply!
It seems like you used indoor pictures; please make sure you use the indoor (Hypersim) model for prediction.
If I use outdoor pictures, it's the same: the value of the close object is bigger than that of the far object, so it shows as white. But if the depth value represents distance, the value of the close object should be smaller than the far object's.
The (relative) models output depth that is like ~1 / true depth. So things that are far away output values close to zero (e.g. 1 / large number) and things that are very close output large values (e.g. 1 / small number), which is why the black/white values seem reversed.
The paper describes this output as "affine-invariant inverse depth" (see section 5.2 on page 6). The v1 paper describes it in a bit more detail in section 3.1 (page 3). I also have a description of it with some diagrams here.
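The inverse-depth relationship above can be sketched numerically. This is a minimal illustration with made-up distances (`true_depth_m` is hypothetical, not actual model output):

```python
import numpy as np

# Hypothetical true distances in meters: a close object, a mid one, a far one.
true_depth_m = np.array([0.5, 2.0, 50.0])

# The relative models output (up to an unknown affine transform) inverse depth:
inverse_depth = 1.0 / true_depth_m  # close -> 2.0, far -> 0.02

# Close objects get LARGE values (rendered bright) and far objects get values
# near zero (rendered dark), which is why the grayscale looks "reversed"
# compared to a ground-truth depth map where close = small = dark.
```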
You are right! I was using the old checkpoints to run the model; as soon as I switched to the checkpoints in metrics/README.md, the result was right!
Thank you for your reply! I hadn't noticed that and had been confused for a long time. Now I get it, thank you very much!!
@duyuwen-duen the annoying thing is that even with the metric depth we need to specify the max distance per frame. Especially in a sequence where the camera moves a lot along Z, it can be a real pain.
In the run.py file, are the raw metric distances what get saved in the .npy file?
If you use metric/run.py with the appropriate model from metric/README.md, you get a depth map whose values are in meters.
I think I got it: it's just the raw output, unadjusted for the 0-255 color spectrum? Setting the max depth helped quite a bit.
You are right. If you use the model in metric_depth, the output of the model represents distance, and if you want to show it as a picture you can rescale it to the 0-255 color spectrum. If you only want the depth values themselves, you don't need to rescale.
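That rescaling step can be sketched like this. A minimal example, assuming `depth` is the float array of metric depths (e.g. loaded from the saved .npy) and `max_depth` is whatever cap suits your scene (both values here are made up):

```python
import numpy as np

depth = np.array([[0.8, 1.5], [3.0, 20.0]])  # metric depth in meters (example values)
max_depth = 20.0  # assumed cap; choose it for your scene

# Map [0, max_depth] linearly onto [0, 255] just for visualization.
vis_u8 = (np.clip(depth / max_depth, 0.0, 1.0) * 255.0).astype(np.uint8)

# The raw floats in `depth` remain the metric distances; only `vis_u8`
# is rescaled, so the gray value of a pixel != the depth at that pixel.
```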
Sorry for the reiteration, but I'm not sure if I'm getting it: the .npy files generated when running metric_depth/run.py contain the estimated depths from the camera in meters for each pixel, correct? (Of course with the correct models/checkpoints suggested in metric_depth/README.md.) So the image is just a normalized version of them to fit 0-255... Because the gray value of a pixel != the value of the array at the same point (also because the arrays contain floats).
Yes, you are correct. However, I have saved the depth data as a .png file where the pixel values can represent distances. This means I altered the run.py script, changing the meaning of the .png file. For instance, if the depth value is 1.23, I multiply it by 1000, resulting in a value of 1230, which I then store in 16-bit format. But if you don't need that, you don't have to change anything.
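That trick can be sketched as follows. This is a hedged illustration, not the exact edit to run.py: depth is scaled by 1000 (so 1.23 m becomes 1230) and stored as uint16, which keeps millimeter precision up to ~65.5 m:

```python
import numpy as np

depth_m = np.array([[1.23, 0.57], [4.999, 65.0]])  # metric depth in meters

# Scale to millimeter-like integer units: 1.23 m -> 1230. uint16 tops out
# at 65535, i.e. ~65.535 m at this scale, so clip before casting.
depth_u16 = np.clip(depth_m * 1000.0, 0, 65535).round().astype(np.uint16)

# depth_u16 can then be written as a 16-bit single-channel PNG
# (e.g. cv2.imwrite("depth.png", depth_u16)).

# Decoding back to meters is the inverse:
decoded_m = depth_u16.astype(np.float32) / 1000.0
```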
Ok this makes sense @duyuwen-duen , many thanks for the responses.
Hi, I got a bad result when I used metric/run.py with depth_anything_v2_metric_vkitti_vitl.pth to test an outdoor image. However, the result is right when using run.py. The original image is here.
@duyuwen-duen, I have used your outdoor image, and I still got a bad result.
Could someone provide some suggestions?
As mentioned in the Virtual KITTI dataset documentation (https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-1/), depth values are distances to the camera plane obtained from the z-buffer: a pixel intensity of 1 in their single-channel PNG16 depth images corresponds to a distance of 1 cm to the camera plane. So the depth value should be small for a close object. However, in this method the value is big for a close object, which is the opposite of the ground truth. Could you tell me the reason, and how can I get the true distance from the model output?
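Decoding the Virtual KITTI ground truth described above can be sketched like this (the `gt` values here are made up; in practice you would read the PNG16 file unchanged, e.g. with cv2.imread and IMREAD_UNCHANGED):

```python
import numpy as np

# Hypothetical values from a VKITTI PNG16 depth map: intensity 1 == 1 cm.
gt = np.array([[100, 655], [35000, 65535]], dtype=np.uint16)

# Convert centimeters to meters: 100 -> 1.0 m, 35000 -> 350.0 m.
# 65535 (~655.35 m) is the maximum encodable value, typically sky/far plane.
gt_m = gt.astype(np.float32) / 100.0

# With this convention small = close, matching the metric models' output
# in meters; only the relative models output inverse depth (large = close),
# which is where the apparent reversal comes from.
```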