alexklwong / calibrated-backprojection-network

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Are the absolute poses in the VOID dataset used in training? #12

Closed rakshith95 closed 2 years ago

rakshith95 commented 2 years ago

Hello, though the absolute poses are available for each frame in the VOID dataset, it looks like PoseNet is used to estimate the relative poses between camera frames. Is there a particular reason for this?

alexklwong commented 2 years ago

It is mainly for ease of use, since KITTI does not provide pose. Using a pose network lets you move between the two datasets without any extra setup. It turns out that the numbers are pretty comparable (pose from VIO is a bit better), but one would need to tune the VIO for each sequence, so there is that trade-off.
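(For illustration, a minimal sketch of how absolute VOID poses could be converted into the frame-to-frame relative poses that the pose network otherwise predicts. It assumes 4x4 camera-to-world homogeneous transforms; the convention and function name are assumptions, not code from this repository.)

```python
import torch

def relative_pose(T_w_t0, T_w_t1):
    """Relative pose mapping points from frame t1 into frame t0.

    Assumes T_w_t0 and T_w_t1 are 4x4 camera-to-world homogeneous
    transforms (an assumed convention, not necessarily the VOID format).
    """
    # T_t0_t1 = inv(T_w_t0) @ T_w_t1
    return torch.linalg.inv(T_w_t0) @ T_w_t1
```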

rakshith95 commented 2 years ago

I see, thank you. Since you're training on the photometric loss, wouldn't the trade-off in pose accuracy make a significant difference? I'm not sure what you mean by 'having to tune the VIO for each sequence'. Could you expand on that a bit?
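(For context, a simplified sketch of where the relative pose enters a photometric reconstruction loss: depth and pose together determine the reprojection, so pose error shifts every warped pixel. The function and argument names are illustrative, not the repository's API.)

```python
import torch
import torch.nn.functional as F

def photometric_loss(img_tgt, img_src, depth, T_src_tgt, K):
    """Warp img_src into the target view using depth and relative pose,
    then penalize the reconstruction error. Illustrative sketch only.

    img_tgt, img_src : [B, 3, H, W] images
    depth            : [B, 1, H, W] predicted target-view depth
    T_src_tgt        : [B, 4, 4] relative pose (target -> source)
    K                : [B, 3, 3] camera intrinsics
    """
    b, _, h, w = depth.shape

    # Backproject target pixels to 3D using depth and intrinsics
    y, x = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing='ij')
    ones = torch.ones_like(x)
    pix = torch.stack([x, y, ones], dim=0).view(1, 3, -1)       # [1, 3, HW]
    cam = torch.linalg.inv(K) @ pix * depth.view(b, 1, -1)      # [B, 3, HW]

    # Transform points into the source frame: errors in T_src_tgt
    # displace every reprojected pixel, which is why pose quality matters
    cam_h = torch.cat(
        [cam, torch.ones(b, 1, h * w, dtype=cam.dtype, device=cam.device)], dim=1)
    cam_src = (T_src_tgt @ cam_h)[:, :3, :]

    # Project into the source image and sample with bilinear interpolation
    proj = K @ cam_src
    uv = proj[:, :2, :] / proj[:, 2:3, :].clamp(min=1e-6)
    uv = uv.view(b, 2, h, w).permute(0, 2, 3, 1)
    grid = torch.stack([2.0 * uv[..., 0] / (w - 1) - 1.0,
                        2.0 * uv[..., 1] / (h - 1) - 1.0], dim=-1)
    img_warp = F.grid_sample(img_src, grid, align_corners=True)

    return torch.mean(torch.abs(img_tgt - img_warp))
```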

alexklwong commented 2 years ago

There are a number of parameters that one can play with:

https://github.com/ucla-vision/xivo/blob/devel/cfg/tumvi_cam0.json

How well the VIO converges on a given sequence depends on them.

We did a study in our ICRA/RA-L 2020 paper (Unsupervised Depth Completion from Visual Inertial Odometry):

https://github.com/alexklwong/unsupervised-depth-completion-visual-inertial-odometry

where we trained with pose from VIO and pose from the pose network. The improvement from using pose from VIO was approximately 10 mm, which can amount to up to a 10% improvement in some cases; in the grand scheme of things, 10 mm is not too big of a difference.

rakshith95 commented 2 years ago

Oh okay, thanks a lot for following up. It makes sense.