Closed: TwiceMao closed this issue 3 months ago
From S3.3, "Depth NN (MiDaS) details": "For our depth network, we use the lightweight CNN version of MiDaS [50], pretrained with the publicly available weights trained for relative-depth estimation."
In this paper, a pre-trained depth estimation model is used, and it is fine-tuned during the reconstruction process.
We use a pre-trained depth estimator because it speeds up convergence. If you want to train from scratch, append `+experiment=ablation_random_initialization` or `+experiment=ablation_random_initialization_long` to the FlowMap command; this uses randomly initialized MiDaS weights. The results should be about the same quality, but convergence will be slower.
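To make the mechanism concrete, here is a minimal, hypothetical sketch (not FlowMap's actual code; every name, formula, and number below is invented for illustration). Depth is the *output* of a "network", and backpropagating a flow loss updates the network's weight, not the per-pixel depths directly:

```python
# Toy stand-ins (hypothetical): per-pixel features, observed flow targets,
# and a camera translation magnitude. Not FlowMap data or parameters.
features = [1.0, 2.0, 4.0]
observed_flow = [2.0, 1.0, 0.5]
baseline = 2.0

def predict_depth(w, f):
    # The "depth network": here just a single scalar weight w.
    return w * f

def induced_flow(d):
    # Toy camera-induced flow model: parallax inversely proportional to depth.
    return baseline / d

def loss_and_grad(w):
    # Sum of squared flow residuals, plus the analytic gradient w.r.t. the
    # network weight w (chain rule through depth, i.e. "backpropagation").
    loss = 0.0
    grad = 0.0
    for f, target in zip(features, observed_flow):
        d = predict_depth(w, f)
        r = induced_flow(d) - target
        loss += r * r
        # dL/dw = 2r * d(flow)/d(depth) * d(depth)/dw
        grad += 2.0 * r * (-baseline / d ** 2) * f
    return loss, grad

w = 0.5  # a "randomly initialized" network weight
for _ in range(100):  # gradient descent on the weight, not on depth
    _, grad = loss_and_grad(w)
    w -= 0.05 * grad
```

Whether the weight starts pre-trained (already near a good value) or randomly initialized only changes how many steps this loop needs, which mirrors the convergence trade-off described above.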
@dcharatan Thank you for your excellent work. However, I am a bit confused about the depth estimation mentioned in your paper. In section 3, "Supervision via Camera-Induced Scene Flow," is the depth obtained through backpropagation of the loss gradients? It seems to me that the depth is directly obtained from an off-the-shelf monocular depth estimation method. Could you please clarify this for me?