YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
Creative Commons Zero v1.0 Universal

Get good depth maps without running triton #122

Open famaster opened 1 week ago

famaster commented 1 week ago

Hi everyone, thank you for your work on this open-source model. I am new to computer vision and I want to use your model to measure real-world objects with any phone camera. I use the normal maps and depth maps to project points onto planes in the 3D space given by the depth maps, but I can't seem to get the right measurements. I have two questions:
1) When I visualize the depth maps exactly as they come out of the model, without post-processing or normalization, I get a lot of tearing and artefacts. When I visualize them after normalization, the depth map looks okay but lacks detail. Is this normal behaviour, or is it because I can't run Triton on my remote Windows machine (I can't use virtualization or install WSL)?
2) How do I correctly calculate focal_length_px (the focal length in pixels) of my phone camera? Should I use the actual focal length in mm from the EXIF data (f_mm * sensor_ppi), or the 35mm-equivalent (full-frame) focal length?
Thank you in advance for your answers.

JUGGHM commented 1 week ago


1) What do you mean by "normalization"? Could you please post your visualization here?
2) Both are OK, but the calculation procedures differ. (a) For the EXIF focal length, we need to know the physical width of the sensor (the CCD width) in mm. (b) For the 35mm-equivalent focal length, we use 36 mm as the sensor width. In both cases: focal length (in pixels) = image width (in pixels) * focal length (in mm) / CCD width (in mm).
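As a minimal sketch of the formula above, the two cases could be computed as follows. The function names and the example numbers (a 4000-pixel-wide image, a 4.5 mm lens on a 6.0 mm wide sensor, a 27 mm equivalent focal length) are hypothetical and not taken from any real phone. The image width should be that of the image actually passed to the model, so a resized image needs its resized width. Note also that 35mm-equivalent focal lengths are often defined from the sensor diagonal, so the 36 mm width rule may be only approximate for sensors that are not 3:2.

```python
FULL_FRAME_WIDTH_MM = 36.0  # width of a 35mm full-frame sensor


def focal_length_px(image_width_px: int, focal_mm: float, sensor_width_mm: float) -> float:
    """Focal (px) = image width (px) * focal (mm) / sensor width (mm)."""
    if sensor_width_mm <= 0:
        raise ValueError("sensor_width_mm must be positive")
    return image_width_px * focal_mm / sensor_width_mm


def focal_length_px_from_35mm(image_width_px: int, focal_35mm_equiv_mm: float) -> float:
    """Same formula, using 36 mm as the sensor width for 35mm-equivalent data."""
    return focal_length_px(image_width_px, focal_35mm_equiv_mm, FULL_FRAME_WIDTH_MM)


# Hypothetical example values, for illustration only.
f_exif = focal_length_px(image_width_px=4000, focal_mm=4.5, sensor_width_mm=6.0)
f_equiv = focal_length_px_from_35mm(image_width_px=4000, focal_35mm_equiv_mm=27.0)
print(f_exif, f_equiv)
```

With these made-up numbers both routes give the same value, which is a useful sanity check when the EXIF data provides both focal lengths.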

Personally, I recommend a tutorial provided by Washington University for reference.