alexklwong / calibrated-backprojection-network

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

What is the 'input_channels_depth' parameter for? #10

Closed rakshith95 closed 2 years ago

rakshith95 commented 2 years ago

Hello, In run_kbnet.py, you have an argument 'input_channels_depth' whose default is 2; I'm not sure what this means. In networks.py, it's written that it is the "number of input channels for depth branch", but why would the depth be 2 channel?

rakshith95 commented 2 years ago

Related, but I also cannot find the part of the code in run_kbnet where the pretrained model weights are loaded for inference.

EDIT: I see that the line depth_model.restore_model(depth_model_restore_path) in kbnet.py does this.

alexklwong commented 2 years ago

Hi, so input_channels_depth is 2 because, following existing works, we feed in not just the sparse depth but also a validity map (in our case, binary). Several works have found that this improves performance.
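A minimal sketch of what that 2-channel input looks like (illustrative tensor shapes, not the repo's exact data-loading code):

```python
import torch

# A toy sparse depth map with a few valid returns (hypothetical values)
sparse_depth = torch.zeros(1, 1, 4, 4)
sparse_depth[0, 0, 1, 2] = 3.5
sparse_depth[0, 0, 3, 0] = 7.2

# Binary validity map: 1 where a depth measurement exists, 0 elsewhere
validity_map = torch.where(
    sparse_depth > 0,
    torch.ones_like(sparse_depth),
    torch.zeros_like(sparse_depth))

# Concatenate along the channel dimension -> 2 channels,
# which is why input_channels_depth defaults to 2
depth_input = torch.cat([sparse_depth, validity_map], dim=1)
print(depth_input.shape)  # torch.Size([1, 2, 4, 4])
```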

run_kbnet.py takes in the argument --depth_model_restore_path https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/run_kbnet.py#L84

which is passed to the run function here https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/run_kbnet.py#L144

and it is restored here https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/kbnet.py#L804

rakshith95 commented 2 years ago

> Hi, so input_channels_depth is 2 because, following existing works, we feed in not just the sparse depth but also a validity map (in our case, binary). Several works have found that this improves performance.

Ah okay, thank you. Now, I see that the validity map for the sparse data during inference is obtained for positive depths here:

```python
validity_map_depth = torch.where(
    sparse_depth > 0,
    torch.ones_like(sparse_depth),
    sparse_depth)
```
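(As a side note, a small sketch of why the else-branch can return `sparse_depth` itself: when depths are nonnegative, the invalid entries are already zero, so the result is still a binary map.)

```python
import torch

# Toy sparse depth: zeros mark pixels with no measurement
sparse_depth = torch.tensor([[0.0, 3.2],
                             [0.0, 7.5]])

# Same construction as above: ones at valid pixels,
# the original (zero) entries elsewhere
validity_map = torch.where(
    sparse_depth > 0,
    torch.ones_like(sparse_depth),
    sparse_depth)
print(validity_map)  # tensor([[0., 1.], [0., 1.]])
```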

Could you tell me what conditions you used to generate the validity maps in the original dataset? I'm not sure about XIVO, but I assume all VIO algorithms filter out points behind the camera.

alexklwong commented 2 years ago

In the case of VOID, the validity maps are taken as-is from the points tracked by XIVO, and yes, any point that is not visible in the frame is not valid. For KITTI, however, the validity map and the sparse point cloud may differ: we do a naive filtering to remove points that straddle occlusion boundaries.

https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/kbnet.py#L411 The parameters we have chosen for this have no effect on VOID, but will remove points in KITTI, in which case the validity map will be a subset of the lidar returns.
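One common way to implement that kind of occlusion-boundary filtering is to compare each return against the nearest return in a local window; this is a sketch of the idea (the function name, kernel size, and threshold are illustrative, not the repo's exact implementation):

```python
import torch
import torch.nn.functional as F

def remove_occlusion_stragglers(sparse_depth, kernel_size=7, threshold=1.5):
    '''
    Sketch: invalidate sparse points whose depth is much larger than the
    nearest return in a local window, since such far points projected next
    to near points tend to straddle occlusion boundaries.
    '''
    # Replace invalid (zero) pixels with a large sentinel so they never
    # win the local minimum
    sentinel = sparse_depth.max() + 1.0
    padded = torch.where(sparse_depth > 0, sparse_depth, sentinel)

    # Min-pool via negated max-pool; max_pool2d pads with -inf implicitly,
    # so border padding does not affect the local minimum
    local_min = -F.max_pool2d(
        -padded, kernel_size=kernel_size, stride=1, padding=kernel_size // 2)

    # Keep a point only if it is within `threshold` of its local minimum
    keep = (sparse_depth > 0) & (sparse_depth < local_min + threshold)
    validity_map = keep.float()
    return sparse_depth * validity_map, validity_map
```

With these (made-up) parameters, a 10 m return adjacent to a 2 m return would be removed, while an isolated return survives since it is its own local minimum.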

rakshith95 commented 2 years ago

Alright, that makes sense, thank you