fabiotosi92 / monoResMatch-Tensorflow

TensorFlow implementation of the monocular Residual Matching (monoResMatch) network.

Output Format and Scale #3

Closed InzamamAnwar closed 5 years ago

InzamamAnwar commented 5 years ago

Hi! First of all, thank you for sharing your findings. I ran the single-image test and got the output in "npy" format. Can you please provide details about the scale of the output and whether it gives real-world meters or not? If not, how can the output "npy" be scaled to get real-world meters?

Thanks!

fabiotosi92 commented 5 years ago

Hi, the network estimates disparity values at training/test time. You can easily convert such values into real-world depth by exploiting the intrinsic parameters provided by the KITTI dataset. In "utils/evaluation_utils.py" you can find the "convert_disps_to_depths_kitti" function for this purpose.
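
For reference, a minimal sketch of that conversion (the focal lengths and the 0.54 m baseline below are the values commonly used in KITTI evaluation code; treat this as an illustration and check utils/evaluation_utils.py for the exact repo code):

# Focal length in pixels for the common KITTI image widths.
width_to_focal = {1242: 721.5377, 1241: 718.856, 1224: 707.0493, 1238: 718.3351}

def disp_to_depth_kitti(disp, width):
    # depth [m] = focal [px] * baseline [m] / disparity [px];
    # 0.54 m is the KITTI stereo baseline.
    return width_to_focal[width] * 0.54 / disp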

InzamamAnwar commented 5 years ago

Thank you for your reply. If the test image does not belong to KITTI, can we still use this function (with the appropriate intrinsic parameters) for real-world depth estimation?

fabiotosi92 commented 5 years ago

However, keep in mind that such a conversion is meaningful only if both the camera and the scene are the same at training and testing time. In a totally different scenario (e.g. an indoor scene) you can't recover the exact scale factor for the real-world depth conversion because of the ill-posed nature of the monocular depth estimation task. In that case, the network gives you only relative depth information about the scene.
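
In that situation the most you can get is a normalized map for qualitative use, e.g. (a minimal sketch; the file name is a placeholder):

import numpy as np

disp = np.load("disparities.npy").squeeze()  # network output, inverse-depth-like
# Min-max normalization keeps the relative ordering but drops the absolute scale.
rel = (disp - disp.min()) / (disp.max() - disp.min() + 1e-8)  # 1 = nearest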

InzamamAnwar commented 5 years ago

Hi!

I have modified the function, given below, to get real-world depth.

def get_depth(pred_disparity):
    # width_to_focal (from utils/evaluation_utils.py) maps a KITTI image width
    # to its focal length in pixels; 0.54 m is the KITTI stereo baseline.
    height, width = pred_disparity.shape
    # pred_disparity is assumed to be at the original image resolution;
    # otherwise rescale it first: pred_disparity *= original_width / width
    pred_depth = width_to_focal[width] * 0.54 / pred_disparity
    return pred_depth
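
Used, for example, like this (the file name is a placeholder for whatever the test script saves):

import numpy as np

disp = np.load("disparities.npy").squeeze()  # predicted disparity map
depth = get_depth(disp)                      # per-pixel depth in meters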

I have two questions regarding this

  1. When the test image is not from the KITTI dataset, what would be the value for width_to_focal?
  2. How well would algorithms trained on datasets like KITTI or NYU Depth work on images in the wild?

fabiotosi92 commented 5 years ago

Hi! I'll try to answer your questions:

  1. If the target scene is completely different from the scenario used during the training phase, I can't say that you can recover the exact scale factor needed to obtain meaningful depth values (monocular depth estimation is an ill-posed problem by definition). On the other hand, if the testing dataset contains the same objects as the training dataset but imaged by a different camera, you can apply focal-length normalization to avoid depth inconsistencies (see the sketch after this list).
  2. It depends on the target scenario. If you train the network on outdoor (indoor) images and then test it on indoor (outdoor) scenes, you will notice a huge drop in accuracy. Nevertheless, the network is able to estimate good (relative) depth maps if the two domains are quite similar.
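
A minimal sketch of that focal-length normalization, assuming the test camera's focal length f_test (in pixels) is known and the scene content is KITTI-like; this is one reading of the trick, not code from the repo:

KITTI_FOCAL = 721.5377  # training focal length in pixels (1242-wide KITTI images)
KITTI_BASELINE = 0.54   # KITTI stereo baseline in meters

def depth_with_focal_normalization(pred_disparity, f_test):
    # The network predicts disparities as if the image came from the training
    # camera, so first convert with the training intrinsics...
    depth_train = KITTI_FOCAL * KITTI_BASELINE / pred_disparity
    # ...then rescale by the focal-length ratio: a longer test focal makes
    # objects appear larger (closer) than they are, so depth grows with f_test.
    return depth_train * (f_test / KITTI_FOCAL)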