Lavreniuk / SPIdepth

Strengthened Pose Information for self-supervised monocular depth estimation. SPIdepth refines the pose network to improve depth prediction accuracy, achieving state-of-the-art results on benchmarks like KITTI, Cityscapes, and Make3D.
MIT License

PoseNet improvement? #5

Open: zshn25 opened this issue 1 month ago

zshn25 commented 1 month ago

Dear @Lavreniuk,

Thank you for the great work. You mention that SPIDepth "improves monocular depth estimation by focusing on the refinement of the pose network", but I cannot find in the paper any proposed change to the pose network, nor to any loss that specifically targets pose. Also, the code still refers to the pose network from monodepth2.

Just curious, what is the proposed change to the pose network?

P.S. I have worked on exactly the same proposal before. It would be good to combine or compare the two: https://github.com/zshn25/pc4consistentdepth

Lavreniuk commented 1 month ago

@zshn25, thank you for your interest in this work. Monodepth2 (https://github.com/nianticlabs/monodepth2/blob/master/networks/pose_cnn.py) proposed a weak CNN as the pose network, and other works, even state-of-the-art ones, have reused it. SPIDepth uses a different pose network instead (see https://github.com/Lavreniuk/SPIdepth/blob/main/networks/pose_cnn.py#L49): a stronger backbone pretrained from timm produces much better pose estimates, so the depth model learns better and faster.

zshn25 commented 1 month ago

Thank you for clarifying. Which pretrained timm backbone do you use for the pose encoder?

I now see that the depth encoder is also a ConvNeXt backbone. In that case, how do you verify that the improvement comes from the stronger pose encoder backbone and not from the stronger depth encoder backbone?

Lavreniuk commented 1 month ago

At first, I took this state-of-the-art work, which already used ConvNeXt, and worked only on the PoseNet: https://arxiv.org/pdf/2309.00526. At the end, I also ran an ablation study, and the paper shows the impact of both the depth backbone and the pose network.

Lavreniuk commented 1 month ago

I tried different pose networks, but a pretrained ResNet gives a good enough trade-off between the additional parameters and the accuracy gain.