Regarding the depth used for generating target image

ClementPinard / SfmLearner-Pytorch

Pytorch version of SfmLearner from Tinghui Zhou et al.

MIT License

Regarding the depth used for generating target image #152

Closed everythoughthelps closed 11 months ago

everythoughthelps commented 11 months ago

According to equation 2 in the paper, we need the depth of the reference image to synthesize the target image. However, your code uses tgt_depth instead of ref_depth, which confuses me. Could you kindly clarify this? I would greatly appreciate your response. Thank you!

ClementPinard commented 11 months ago

Do you mean this paper? https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhou_Unsupervised_Learning_of_CVPR_2017_paper.pdf

Equation 2 says:

$$p_s \sim K \hat{T}_{t \rightarrow s} \hat{D}_t(p_t) K^{-1} p_t$$

where $\hat{D}_t$ is the estimate of the target depth, so there is no reference depth here.

Or maybe you are referring to another equation?

everythoughthelps commented 11 months ago

Yes, exactly this paper. I didn't express myself clearly. If we use $r$ and $t$ to index the images in equation 2, we get $$p_t = K T_{r \rightarrow t} D_r(p_r) K^{-1} p_r$$ which is closer to your code. What confuses me is this: according to this equation, you are supposed to use the ref_depth $D_r$ to generate the fake tgt_img, right? But you use tgt_depth, which is the output of dispnet(tgt_img), to generate the fake tgt_img.

everythoughthelps commented 11 months ago

One reason I can think of is that (sequence length - 1) ref_depths would be needed to generate a fake tgt_img, which takes a lot of time, so you use tgt_depth to approximate the ref_depths and save time. Is that right?

ClementPinard commented 11 months ago

I think your confusion comes from the fact that this equation describes how we can reconstruct the target image by getting colors from the reference image, and not the other way around.

p_s does indeed refer to coordinates in the reference image, but it tells us where to pick the color for the pixel that will be at coordinate p_t, which is in the target image.

This is also the reason why this operation is called inverse warp and not simply warp. Target depth is used to reconstruct the target image, even though we already know the target image, since we used it to get the depth.
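
To make the direction of the lookup concrete, here is a minimal pure-Python sketch of inverse warping under a pinhole camera model. It is not the repository's implementation: the function names `project_to_source` and `inverse_warp` are hypothetical, the pose is assumed to be given as a 3x3 rotation plus a translation from the target frame to the reference frame, and nearest-neighbour sampling stands in for the differentiable bilinear sampling described in the paper.

```python
# Hypothetical helper names; a sketch of p_s ~ K T_{t->s} D_t(p_t) K^{-1} p_t.

def project_to_source(u, v, depth, fx, fy, cx, cy, rotation, translation):
    """Map target pixel (u, v), with its target depth, to reference-image coordinates."""
    # Back-project into the target camera frame: X_t = D_t(p_t) * K^{-1} * p_t
    x = depth * (u - cx) / fx
    y = depth * (v - cy) / fy
    z = depth
    # Move the 3D point into the reference camera frame: X_s = R * X_t + t
    xs = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0]
    ys = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1]
    zs = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2]
    if zs <= 0:
        return None  # point ends up behind the reference camera
    # Project with K and normalise by depth
    return fx * xs / zs + cx, fy * ys / zs + cy


def inverse_warp(src_img, tgt_depth, intrinsics, rotation, translation):
    """Rebuild the target view by picking colors from src_img.

    The loop runs over TARGET pixels, so the depth needed at each step is the
    target depth. Pixels that project outside src_img are left as None.
    """
    fx, fy, cx, cy = intrinsics
    height, width = len(tgt_depth), len(tgt_depth[0])
    out = [[None] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            coords = project_to_source(u, v, tgt_depth[v][u],
                                       fx, fy, cx, cy, rotation, translation)
            if coords is None:
                continue
            ui, vi = round(coords[0]), round(coords[1])  # nearest neighbour
            if 0 <= vi < len(src_img) and 0 <= ui < len(src_img[0]):
                out[v][u] = src_img[vi][ui]
    return out


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
src = [[10, 11, 12, 13], [20, 21, 22, 23]]
depth = [[2.0] * 4 for _ in range(2)]
# fx = fy = 2, cx = cy = 0, translation of 1 along x: each target pixel samples one column to the right.
warped = inverse_warp(src, depth, (2.0, 2.0, 0.0, 0.0), identity, (1.0, 0.0, 0.0))
print(warped)
```

Every output pixel is filled by exactly one lookup, which is why the depth has to be known on the target pixel grid. Starting from reference pixels with a reference depth would instead scatter colors into the target image and could leave holes or collisions.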

everythoughthelps commented 11 months ago

I get it! Thanks for your instant reply!