dcharatan / flowmap

[3DV 2025] Code for "FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent" by Cameron Smith*, David Charatan*, Ayush Tewari, and Vincent Sitzmann
https://cameronosmith.github.io/flowmap/
MIT License

Predicted depth range and metric depth setting #28

skrya closed this issue 6 months ago

skrya commented 6 months ago

Dear Authors,

Thanks for the amazing work and the amazing code base!!

Could you please clarify whether there are any constraints on the minimum and maximum values output by the depth network for the MiDaS model with the 'exp' setting? Specifically, do the depth values range from 0.01 to infinity, or are they normalized in some way? Also, are any scale or shift adjustments applied to the output? I would like to adapt the output for use in a metric depth setting.

Also, what are the advantages of the MiDaS 'exp' depth setting over the 'original' one? Did you find any difference in your experiments?

Thanks!

dcharatan commented 6 months ago

There are no additional constraints regarding the minimum and maximum values for the "exp" setting. The outputs will naturally be between 0.01 and infinity. We don't apply additional scale or shift adjustments beyond what you see here:

https://github.com/dcharatan/flowmap/blob/19ac72b78d010220bd9487553db4fb463ab317d4/flowmap/model/backbone/backbone_midas.py#L80-L84

The "exp" setting works better for random initialization, since there's no clipping of the gradients due to ReLU like there is with the "original" setting. We use the "original" setting because that's what the pre-trained MiDaS network uses. If you want to train a new initialization checkpoint totally from scratch, it might be worth exploring whether "exp" works better. Just make sure the values at initialization are reasonable (i.e., not extremely large because of the exp).
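To make the gradient argument above concrete, here is a minimal sketch in plain Python. It is not the repository's code: `exp_mapping` assumes the "exp" setting behaves like `exp(x) + 0.01` (inferred only from the stated 0.01 lower bound), and `relu_mapping` is a simplified stand-in for a ReLU-based mapping rather than the actual "original" implementation. All function names and the `MIN_DEPTH` constant are hypothetical.

```python
import math

# Assumed offset, inferred from the "0.01 to infinity" range stated in the thread.
MIN_DEPTH = 0.01


def exp_mapping(x: float) -> float:
    """Hypothetical "exp"-style mapping: output is bounded below by MIN_DEPTH."""
    return math.exp(x) + MIN_DEPTH


def exp_mapping_grad(x: float) -> float:
    """Derivative of exp_mapping; strictly positive for every finite input."""
    return math.exp(x)


def relu_mapping(x: float) -> float:
    """Simplified ReLU stand-in (not the repository's "original" mapping)."""
    return max(x, 0.0)


def relu_mapping_grad(x: float) -> float:
    """Derivative of relu_mapping; zero for non-positive inputs."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    # Compare outputs and gradients for a few raw network outputs.
    for x in (-5.0, 0.0, 5.0, 20.0):
        print(
            f"x={x:6.1f}  "
            f"exp: value={exp_mapping(x):.4g} grad={exp_mapping_grad(x):.4g}  "
            f"relu: value={relu_mapping(x):.4g} grad={relu_mapping_grad(x):.4g}"
        )
```

Under these assumptions, the exp mapping keeps a nonzero gradient for negative raw outputs, where the ReLU stand-in passes none. The `x=20` row also shows why initialization matters: the exp output grows very quickly with the raw value.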