ClementPinard / SfmLearner-Pytorch

Pytorch version of SfmLearner from Tinghui Zhou et al.
MIT License

Reason for one_scale function in the photometric_reconstruction_loss #94

Closed · aadilmehdis · closed 4 years ago

aadilmehdis commented 4 years ago

Hi, is there any particular reason for scaling the target image, the intrinsics, and the reference images according to the height of the depth map in the one_scale function defined here?

ClementPinard commented 4 years ago

Intrinsics change with image size. A basic example is the optical center: it's usually at the center of the image, so at (W/2, H/2). When you resize the image to W2 × H2, the optical center becomes (W2/2, H2/2). In other words, we multiply the values by the resize ratio.
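To make that concrete, here is a minimal sketch of the rescaling; `scale_intrinsics` is a hypothetical helper for illustration, not the repo's actual code:

```python
import torch

def scale_intrinsics(intrinsics, ratio_w, ratio_h):
    # Hypothetical helper: rescale a (B, 3, 3) intrinsics matrix when
    # the image is resized by (ratio_w, ratio_h).
    scaled = intrinsics.clone()
    scaled[:, 0] *= ratio_w  # fx and cx live in the first row
    scaled[:, 1] *= ratio_h  # fy and cy live in the second row
    return scaled

# A 640x480 image with optical center (320, 240), resized to 320x240:
# the optical center becomes (160, 120), i.e. the values times the ratio.
K = torch.tensor([[[500., 0., 320.],
                   [0., 500., 240.],
                   [0., 0., 1.]]])
print(scale_intrinsics(K, 0.5, 0.5))
```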

This is very important because, for the inverse warp to work, we need the right intrinsics with respect to a particular image size.

Now, as to why we resize the image: the photometric function operates on a depth map and an image of the same size, so that each depth pixel is associated with a pixel in the rescaled target image.
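A rough sketch of that resizing step, reusing the hypothetical `scale_intrinsics` above (the repo's actual one_scale differs in details):

```python
import torch.nn.functional as F

def match_to_depth(tgt_img, ref_imgs, intrinsics, depth):
    # Hypothetical helper: bring the images and intrinsics down to the
    # depth map's resolution so each depth pixel maps to exactly one
    # image pixel.
    b, _, h, w = depth.size()
    ratio_w = w / tgt_img.size(3)
    ratio_h = h / tgt_img.size(2)
    tgt_scaled = F.interpolate(tgt_img, (h, w), mode='area')
    refs_scaled = [F.interpolate(r, (h, w), mode='area') for r in ref_imgs]
    return tgt_scaled, refs_scaled, scale_intrinsics(intrinsics, ratio_w, ratio_h)
```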

It's not mandatory (see monodepth2, where they keep all the pictures at full resolution even with the downscaled depth maps), but the rationale is that the photometric error in the inverse warp is only informative up to about one pixel of displacement, because of how colors are interpolated. As such, if displacements are too great in pixels, we might fall into a local minimum. The solution proposed here is to compute the photometric error at multiple scales, so that the lowest scale is the easiest to optimize, and as we go up in scale, the depth can be refined.

It should be noted that, for simplicity, since we had a network that outputs depth maps at multiple sizes, it fit nicely to use them for the multi-scale photometric loss, as if each depth map were the exact downscale of the one above, but that's not actually the case. So you could perfectly well imagine an arbitrary number of different scales of photometric error for each depth map output by the network.
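Putting the pieces together, here is a sketch of that multi-scale scheme, pairing each network depth output with a photometric term at its own resolution. It reuses the hypothetical `match_to_depth` above; as far as I can tell the repo's inverse_warp returns a warped image plus a validity mask, but treat the exact signature as an assumption, and note that the real loss also handles things like explainability masks:

```python
from inverse_warp import inverse_warp  # this repo's warper

def multiscale_photometric_loss(tgt_img, ref_imgs, intrinsics, depths, poses):
    # `depths` is the list of depth maps output by the network, each of
    # shape (B, 1, h_i, w_i). Each depth map gets a photometric term at
    # its own resolution, but nothing forces this pairing: any depth map
    # could be scored at several additional scales instead.
    loss = 0
    for depth in depths:
        tgt_s, refs_s, K_s = match_to_depth(tgt_img, ref_imgs, intrinsics, depth)
        for i, ref_s in enumerate(refs_s):
            warped, valid = inverse_warp(ref_s, depth[:, 0], poses[:, i], K_s)
            diff = (tgt_s - warped) * valid.unsqueeze(1).float()
            loss += diff.abs().mean()
    return loss
```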