shanjiwei closed this 11 months ago
Hi,
Good question.
"get the depth value corresponding to each pixel in the patch directly from the rendered depth map" means that we would have to render more pixels.
Let $A$ be the number of rendered pixels. If we assume that every pixel in the patch has the same depth as the center pixel, we only need to render $A$ pixels. If we instead take each pixel's depth from the rendered depth map, we have to render every pixel of each $s\times s$ patch, i.e. $s\times s\times A$ pixels. That makes the batch size much larger than before.
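To make the cost concrete, here is a minimal sketch of the pixel count under both strategies. The function name `rays_to_render` and the values `A = 512` and `s = 11` are hypothetical, chosen only for illustration; they are not taken from the paper or the code base.

```python
def rays_to_render(num_samples: int, patch_size: int, per_pixel_depth: bool) -> int:
    """Number of pixels that must be rendered per batch.

    num_samples:     A, the number of sampled (center) pixels.
    patch_size:      s, the side length of the square patch.
    per_pixel_depth: if True, every patch pixel gets its own rendered depth;
                     if False, the whole patch shares the center pixel's depth.
    """
    if per_pixel_depth:
        # Every pixel of every s x s patch has to be rendered.
        return patch_size * patch_size * num_samples
    # Only the center pixel of each patch is rendered.
    return num_samples


# Hypothetical values, for illustration only.
A, s = 512, 11
shared = rays_to_render(A, s, per_pixel_depth=False)
per_pixel = rays_to_render(A, s, per_pixel_depth=True)
print(shared, per_pixel, per_pixel // shared)  # 512 61952 121
```

With these example numbers, reading per-pixel depth from the rendered depth map would mean rendering 121 times as many pixels per batch.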
Another option is to use perspective warping. However, it requires two backward passes, which is also slow, as we mentioned in our rebuttal on OpenReview.
By the way, I have tried this implementation. I found it to be EXTREMELY time-consuming, with no performance boost.
Hi, I'd like to ask about a detail mentioned in the paper: why is it necessary to assume that every pixel in the patch has the same depth as the sampling point when constructing the patch-based warping loss? Why not get the depth value corresponding to each pixel in the patch directly from the rendered depth map?