Question about the value in depth matrix

DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

https://depth-anything-v2.github.io

Apache License 2.0


Question about the value in depth matrix #22

Open Liuhsinlun opened 4 months ago

Liuhsinlun commented 4 months ago

Hello, I have been researching monocular depth estimation recently. I found this repository, and the documentation is very detailed, but I still have a small question that I would like the author to help me with. Does each value in the depth matrix (I printed it out) represent the estimated real-world distance (in meters) for that pixel?

code:

    depth = model.infer_image(raw_img)  # HxW raw depth map in numpy
    print("depth: \n", depth)

output : 555

osiloke commented 4 months ago

I would say this library gives you a perception of depth, akin to how the elevation of "cards" in a UI is calculated. To get real-world measurements, you still need intrinsic and extrinsic data, such as the camera focal length, which are used to compute a transformation matrix that maps 2D pixels to real-world 3D measurements.

An article like this one about intrinsic and extrinsic matrices may explain it better.
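To make the role of the intrinsics concrete, here is a minimal sketch of the standard pinhole back-projection for a single pixel. It assumes the depth value is already a metric z-depth in meters, which the thread does not confirm for the output of `infer_image`. The function name `backproject_pixel` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical values chosen for illustration, not anything provided by the repository.

```python
def backproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with metric z-depth to a camera-frame point (X, Y, Z).

    Standard pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 image (assumed, not from the thread).
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A pixel 100 columns right of the principal point, at an assumed depth of 2 m.
point = backproject_pixel(420, 240, 2.0, fx, fy, cx, cy)
print(point)
```

Without calibrated intrinsics, or with a depth map that is only relative, the resulting coordinates are not in real-world units.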

tangjunjun966 commented 2 months ago

@Liuhsinlun Did you solve this problem? I also want to know if the code depth = model.infer_image(raw_img) refers to the depth Wz in the camera coordinate system.

Liuhsinlun commented 2 months ago

666 @tangjunjun966 If you want to obtain the predicted depth values in real-world units (i.e., meters), you can use the model in this folder. Its output represents the straight-line distance from the center of the image to the object being captured.
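The two conventions mentioned in this thread, z-depth along the optical axis and straight-line distance to the object, differ for off-center pixels. The sketch below converts one into the other under a pinhole model. Which convention the metric model actually uses is not confirmed here, and the helper name `z_depth_to_ray_distance` and the intrinsics are assumptions made for illustration.

```python
import math


def z_depth_to_ray_distance(u, v, z, fx, fy, cx, cy):
    """Convert z-depth at pixel (u, v) to Euclidean distance from the camera center."""
    xn = (u - cx) / fx  # normalized image coordinate along x
    yn = (v - cy) / fy  # normalized image coordinate along y
    return z * math.sqrt(1.0 + xn * xn + yn * yn)


# Hypothetical intrinsics (assumed for illustration only).
fx = fy = 400.0
cx, cy = 320.0, 240.0

# At the principal point, the two quantities coincide.
d_center = z_depth_to_ray_distance(320, 240, 2.0, fx, fy, cx, cy)

# Away from the principal point, the ray distance is larger than the z-depth.
d_edge = z_depth_to_ray_distance(620, 240, 2.0, fx, fy, cx, cy)
print(d_center, d_edge)
```

If exact distances matter, it is worth checking which convention the training data for the chosen checkpoint uses before applying a conversion like this.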