Aradhye2002 / EcoDepth

[CVPR'2024] Official implementation of the paper "ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation"
https://ecodepth-iitd.github.io/
MIT License

Absolute depth (metric depth) #10

Closed AbbosAbdullayev closed 7 months ago

AbbosAbdullayev commented 7 months ago

Thanks for the awesome contribution to monocular depth estimation. Does the model predict a relative depth map? Is it possible to get actual distances (in meters) from the predicted depth map? Thank you for your response.

Aradhye2002 commented 7 months ago

Hi,

Thanks for appreciating our work. The model indeed produces metric depth (in meters), not relative depth. All three checkpoints (vkitti.ckpt, kitti.ckpt, and nyu.ckpt) were trained on metric datasets, and we train the model to predict the ground-truth depth directly, without median scaling. Note, however, that even though the model produces metric depth, median scaling often reduces the error considerably on OOD images (i.e., images that differ substantially from the training dataset of the checkpoint used).
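For reference, a minimal sketch of what median scaling could look like when you have a few ground-truth depth values available; this is an illustrative snippet, not the evaluation code from this repo, and the function and argument names are assumptions:

```python
import numpy as np

def median_scale(pred_depth, gt_depth, valid_mask=None):
    """Rescale a predicted depth map so its median matches the ground-truth median.

    pred_depth, gt_depth: numpy arrays of the same shape, in meters.
    valid_mask: optional boolean mask of pixels with valid ground truth.
    """
    if valid_mask is None:
        valid_mask = gt_depth > 0  # treat zero/negative depth as invalid
    scale = np.median(gt_depth[valid_mask]) / np.median(pred_depth[valid_mask])
    return pred_depth * scale
```

This is only useful when some ground truth (e.g. sparse LiDAR) is available; without it, the raw metric prediction is what you get.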

I hope this answers your question. Please feel free to open another issue in case of any other questions.

AbbosAbdullayev commented 7 months ago

@Aradhye2002, thank you very much for your quick response. I followed the README and downloaded the required checkpoint files. However, I'm encountering an out-of-memory issue when performing inference on a single arbitrary image. My environment has an RTX GPU with 12 GB, which I believe should be sufficient for inference. Could I have overlooked something? I'm testing under KITTI outdoor conditions using the vkitti model (4 GB) and the stable diffusion checkpoint (v1-5**) (4 GB). Can you provide some suggestions?

Aradhye2002 commented 6 months ago

Hi @AbbosAbdullayev, can you please take a look at https://github.com/Aradhye2002/EcoDepth/issues/6#issuecomment-2041351542, which mentions a fix for reducing GPU memory consumption? If that still doesn't work, can you report the value of the max_area variable you are using? As you reduce max_area towards zero, the memory requirement should approach 4 GB + 4 GB = 8 GB (for the vkitti model and the stable diffusion model, respectively), while the performance drops to zero. Since you have 12 GB, you can run inference with a non-zero max_area. I would advise trying different values of max_area and using the largest one that does not cause an OOM error, in order to maximize performance.
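In case it helps, here is a rough sketch of the idea behind capping the input area before inference: downscale the image (preserving aspect ratio) so that its pixel count stays at or below max_area. This is an assumption about how such a cap could be implemented, not the exact code in the repo's inference script:

```python
import math
from PIL import Image

def resize_to_max_area(img: Image.Image, max_area: int) -> Image.Image:
    """Downscale img so that width * height <= max_area, keeping aspect ratio."""
    w, h = img.size
    if w * h <= max_area:
        return img  # already small enough, no resize needed
    scale = math.sqrt(max_area / (w * h))
    new_size = (max(1, int(w * scale)), max(1, int(h * scale)))
    return img.resize(new_size, Image.BILINEAR)
```

Starting from a small max_area and increasing it until just before you hit OOM should give the best trade-off between memory and prediction quality on your 12 GB card.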