Closed Zhu980805 closed 3 years ago
Hi, thanks for the careful comments on the code and the cross-view masking operation.
1, The implementation of the original cross-view masking: As noted in the paper, the depth map predicted by the Depth Estimation Branch is used as the pseudo ground truth in the Data-Augmentation Branch. As you mentioned, the ground-truth depth is not allowed in this setting, hence we use this pseudo ground truth in the cross-view masking step.
For example, please check line 127 in jdacs/train.py, which reads as follows:
loss, scalar_outputs, image_outputs, segment_loss, depth_est = train_sample(sample, do_summary)
The variable depth_est is the aforementioned predicted depth map.
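To make the role of the pseudo ground truth concrete, here is a minimal sketch in plain Python, not the repository's code. The helper name masked_l1_loss, the nested-list depth maps, and the choice to score only the pixels that were not blocked out are all assumptions made for illustration; the actual training code works on tensors and may weight the loss differently.

```python
def masked_l1_loss(depth_aug, depth_pseudo_gt, valid):
    """Mean absolute depth difference over pixels where valid == 1.

    depth_aug:       depth predicted from the augmented (masked) input
    depth_pseudo_gt: depth predicted by the depth estimation branch,
                     treated as a fixed target (hypothetical stand-in
                     for depth_est)
    valid:           1 for pixels kept, 0 for pixels that were blocked out
    All three are H x W nested lists (an assumption of this sketch).
    """
    total, count = 0.0, 0
    for row_a, row_p, row_v in zip(depth_aug, depth_pseudo_gt, valid):
        for a, p, v in zip(row_a, row_p, row_v):
            if v:
                total += abs(a - p)
                count += 1
    # Avoid dividing by zero when every pixel is blocked out.
    return total / count if count else 0.0
```

No ground-truth depth appears anywhere in this computation: the target comes entirely from the network's own prediction on the unaugmented input.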
2, The modifications in the code: Empirically, we find that blocking out regions in the reference view alone performs as well as also blocking out the corresponding regions in the source views. Furthermore, computing the corresponding regions in the source views costs extra time during training. Hence, we removed these operations for simplicity.
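As a rough illustration of the simplified scheme, the sketch below blocks out one random rectangle in the reference view only and leaves the source views untouched. The function name mask_reference_view, the single-rectangle policy, and the nested-list image are hypothetical choices for this example and are not taken from the repository.

```python
import random


def mask_reference_view(ref_img, block_h, block_w, rng=None):
    """Zero out one random block in the reference image only.

    ref_img is an H x W nested list of pixel values (an assumption of
    this sketch). Returns (masked_img, valid), where valid holds 1 for
    kept pixels and 0 for blocked ones. The input is not modified.
    """
    if rng is None:
        rng = random.Random()
    h, w = len(ref_img), len(ref_img[0])
    # Pick the top-left corner so the block fits inside the image.
    top = rng.randint(0, h - block_h)
    left = rng.randint(0, w - block_w)
    masked = [row[:] for row in ref_img]
    valid = [[1] * w for _ in range(h)]
    for y in range(top, top + block_h):
        for x in range(left, left + block_w):
            masked[y][x] = 0
            valid[y][x] = 0
    return masked, valid
```

Because the source views are never touched, no reprojection of the blocked region into the other views is needed, which is where the training-time saving mentioned above comes from.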
Very glad to receive your reply! I fully understand what you said, so thank you again for your amazing work!
Thanks for your amazing work. But I have a question about cross-view masking. According to the provided code, it seems that you only block out some regions in the reference view but don't mask out the corresponding areas in the source views, which is inconsistent with the statement in the paper. What's more, I think that masking out the corresponding areas in the source views would require the ground-truth depth, which is not allowed in the self-supervised MVS setting. Is my understanding correct? Looking forward to your reply.