SenZHANG-GitHub / ekf-imu-depth

[ECCV 2022] Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

Trans_scale_factor in options.py #7

Closed Ekxplen closed 1 year ago

Ekxplen commented 1 year ago

I have been testing your model, and I was wondering about the trans_scale_factor option, which is set to 5.4 by default.

From my understanding, this has to do with the stereo configuration in KITTI. If so, shouldn't we drop it when running with mono only, that is, set trans_scale_factor = 1.0 when not training with stereo? Otherwise the relative motions between images will be wrong. This wouldn't be an issue if we only used the pose network, since it could learn to deal with this scale. However, the scale is also used in the compute_imu_pose_with_inv function, meaning that the acceleration-based translation will be incorrect (though this may be partly corrected by the velocity network).

Have I understood this correctly? I saw that my predicted depths were surprisingly low before I changed this scale factor to 1.

SenZHANG-GitHub commented 1 year ago

Hi @Ekxplen, the purpose of trans_scale_factor is to scale the stereo baseline from 0.54 m down to 0.1 m (0.54 / 5.4 = 0.1), as suggested in monodepth2 to stabilise training. In DynaDepth, we do not use stereo for training. Instead, we assume a virtual stereo baseline of 0.54 m to stay consistent with the stereo setting of monodepth2.
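As a rough illustration of the scaling described above, the sketch below divides a metric translation by the factor. Only trans_scale_factor and the 0.54 m and 0.1 m values come from this thread; the function name and the sample IMU translation are hypothetical, and this is not the code in compute_imu_pose_with_inv.

```python
# Minimal sketch of the baseline rescaling, under the assumptions stated above.
# KITTI_BASELINE_M and TRANS_SCALE_FACTOR mirror the values quoted in the thread;
# to_training_units is a hypothetical helper, not a function from the repository.

KITTI_BASELINE_M = 0.54      # stereo baseline in metres
TRANS_SCALE_FACTOR = 5.4     # default value of trans_scale_factor


def to_training_units(translation_m, scale=TRANS_SCALE_FACTOR):
    """Divide each component of a metric translation by the scale factor."""
    return [component / scale for component in translation_m]


# The 0.54 m baseline becomes roughly 0.1 in training units.
scaled_baseline = to_training_units([KITTI_BASELINE_M])[0]

# A made-up IMU-integrated translation (metres) scaled the same way, so that
# it would stay consistent with the scaled baseline.
scaled_imu_translation = to_training_units([1.08, 0.0, -0.54])
```

If every translation fed to the photometric loss is scaled this way, the depths the network learns should shrink by the same factor, which would explain the low raw predictions.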

The scale of the predicted depths is consistent with the translation, so the depths have effectively been divided by trans_scale_factor as well. That is why the raw predictions are low. We need to multiply them by trans_scale_factor to recover the original scale, as indicated in evaluate_depth.py.
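The recovery step can be sketched as follows. This is a hedged example rather than the contents of evaluate_depth.py: the function name and the sample depth values are hypothetical, and only the multiplication by trans_scale_factor is taken from the reply.

```python
# Minimal sketch: map raw network depths back to metric scale by multiplying
# with the same factor that the translations were divided by.
# recover_metric_depth is a hypothetical helper, not a function from the repository.

TRANS_SCALE_FACTOR = 5.4  # default value of trans_scale_factor


def recover_metric_depth(raw_depths, scale=TRANS_SCALE_FACTOR):
    """Multiply each raw depth prediction by the scale factor."""
    return [depth * scale for depth in raw_depths]


# Made-up raw predictions in training units.
raw_depths = [1.0, 2.0, 10.0]
metric_depths = recover_metric_depth(raw_depths)
```

With trans_scale_factor set to 1.0 the function returns the input unchanged, which matches the behaviour the question describes after changing the option.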

Ekxplen commented 1 year ago

OK, I see. I am training on a custom dataset and implemented a training warmup, so I didn't consider stabilizing the training that way.

Thanks for the quick response.