Closed: TwiceMao closed this issue 3 months ago
From S3.3, "Depth NN (MiDaS) details": "For our depth network, we use the lightweight CNN version of MiDaS [50], pretrained with the publicly available weights trained for relative-depth estimation."
In this paper, a pre-trained depth estimation model is used, and it is fine-tuned during the reconstruction process.
We use a pre-trained depth estimator because it speeds up convergence. If you want to train from scratch, append `+experiment=ablation_random_initialization` or `+experiment=ablation_random_initialization_long` to the FlowMap command; this uses randomly initialized MiDaS weights. The results should be about the same quality, but convergence will be slower.
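To make the mechanism concrete, here is a minimal, hypothetical sketch (not FlowMap's actual code; every name, formula, and number below is invented for illustration). Depth is the *output* of a "network", and backpropagating a flow loss updates the network's weight, not the per-pixel depths directly:

```python
# Toy stand-ins (hypothetical): per-pixel features, observed flow targets,
# and a camera translation magnitude. Not FlowMap data or parameters.
features = [1.0, 2.0, 4.0]
observed_flow = [2.0, 1.0, 0.5]
baseline = 2.0

def predict_depth(w, f):
    # The "depth network": here just a single scalar weight w.
    return w * f

def induced_flow(d):
    # Toy camera-induced flow model: parallax inversely proportional to depth.
    return baseline / d

def loss_and_grad(w):
    # Sum of squared flow residuals, plus the analytic gradient w.r.t. the
    # network weight w (chain rule through depth, i.e. "backpropagation").
    loss = 0.0
    grad = 0.0
    for f, target in zip(features, observed_flow):
        d = predict_depth(w, f)
        r = induced_flow(d) - target
        loss += r * r
        # dL/dw = 2r * d(flow)/d(depth) * d(depth)/dw
        grad += 2.0 * r * (-baseline / d ** 2) * f
    return loss, grad

w = 0.5  # a "randomly initialized" network weight
for _ in range(100):  # gradient descent on the weight, not on depth
    _, grad = loss_and_grad(w)
    w -= 0.05 * grad
```

Whether the weight starts pre-trained (already near a good value) or randomly initialized only changes how many steps this loop needs, which mirrors the convergence trade-off described above.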
@dcharatan Thank you for your excellent work. However, I am a bit confused about the depth estimation mentioned in your paper. In section 3, "Supervision via Camera-Induced Scene Flow," is the depth obtained through backpropagation of the loss gradients? It seems to me that the depth is directly obtained from an off-the-shelf monocular depth estimation method. Could you please clarify this for me?