Closed rakshith95 closed 2 years ago
It is mainly for ease of use, since KITTI does not provide pose; using a pose network lets you move between the two datasets without any extra setup. It turns out that the numbers are pretty comparable (pose from VIO is a bit better), but one would need to tune the VIO for each sequence, so there is that trade-off.
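To make the role of pose concrete, here is a minimal sketch (not the repo's code) of where the relative pose enters the photometric loss: a pixel in the target frame is back-projected with its depth, rigidly transformed by the pose, and projected into the source frame, whose intensity is then compared against the target's. The intrinsics `K` and the pose below are illustrative values, not taken from either dataset.

```python
import numpy as np

# Illustrative pinhole intrinsics (assumed values, not from VOID/KITTI).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def reproject(u, v, depth, R, t):
    """Warp pixel (u, v) with its depth into the source view using pose (R, t)."""
    x = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-project to 3D
    x_src = R @ x + t                                     # rigid transform
    p = K @ x_src                                         # project into source
    return p[0] / p[2], p[1] / p[2]                       # perspective divide

# Sanity check: with the identity pose a pixel maps back to itself.
u2, v2 = reproject(100.0, 80.0, 2.0, np.eye(3), np.zeros(3))
```

Whether the pose comes from a pose network or from VIO, it is consumed in exactly this warp, which is why pose error shows up directly in the photometric loss.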
I see, thank you. Since you're training on the photometric loss, wouldn't the trade-off in pose accuracy make a significant difference? Also, I'm not sure what you mean by 'having to tune the VIO for each sequence' -- could you expand on that a bit?
There are a number of parameters that one can play with:
https://github.com/ucla-vision/xivo/blob/devel/cfg/tumvi_cam0.json
How well you converge on a sequence depends on them.
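A hedged sketch of what per-sequence tuning can look like in practice: load a base configuration, override a few filter parameters, and emit one config per setting so each sequence can be run with its own values. The parameter names and values here are placeholders for illustration, not the actual keys in cfg/tumvi_cam0.json.

```python
import itertools
import json

# Placeholder base config; keys are hypothetical, not xivo's real ones.
base = {"visual_meas_std": 1.0, "accel_noise_std": 0.01}

def make_configs(base, grid):
    """Yield (name, config) for every combination in the parameter grid."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(base)
        cfg.update(zip(keys, values))
        name = "_".join(f"{k}={v}" for k, v in zip(keys, values))
        yield name, cfg

grid = {"visual_meas_std": [0.5, 1.0], "accel_noise_std": [0.01, 0.02]}
configs = dict(make_configs(base, grid))
print(json.dumps(configs, indent=2))  # one config block per tuning run
```

Each generated config would then be handed to the VIO for one sequence, which is the manual overhead the pose network avoids.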
We did a study in our ICRA/RAL 2020 paper (Unsupervised Depth Completion with Visual Inertial Odometry):
https://github.com/alexklwong/unsupervised-depth-completion-visual-inertial-odometry
where we trained with pose from VIO and with pose from a pose network; the improvement from using pose from VIO was approximately 10 mm. That can amount to up to a 10% improvement in some cases, but in the grand scheme of things 10 mm is not too big a difference.
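A quick sanity check on how those two figures relate: a 10 mm gain corresponds to a 10% relative improvement only when the baseline error is around 100 mm. The baseline value below is an assumption for illustration, not a number reported in the paper.

```python
# Assumed baseline depth error, chosen so the quoted numbers line up.
baseline_mae_mm = 100.0
improvement_mm = 10.0

# Relative improvement from switching pose network -> VIO pose.
relative = improvement_mm / baseline_mae_mm  # 0.10, i.e. 10%
```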
Oh okay, thanks a lot for following up, it makes sense.
Hello, though absolute poses are available for each frame in the VOID dataset, it looks like PoseNet is used to obtain the relative poses between cameras. Is there a particular reason for this?