Open zshn25 opened 1 month ago
@zshn25, thank you for your interest in this work. In monodepth2 (here is the link: https://github.com/nianticlabs/monodepth2/blob/master/networks/pose_cnn.py) they proposed a weak CNN as the pose net, and other works, even SOTA ones, also used it. In SPIDepth a different posenet is used; check this code: https://github.com/Lavreniuk/SPIdepth/blob/main/networks/pose_cnn.py#L49. A stronger backbone pretrained from timm produces much better results in pose estimation, and thus the depth model learns better and faster.
Thank you for clarifying. Which backbone pretrained model do you use from timm for the pose encoder?
I now see that even the depth encoder is a ConvNeXt backbone. In that case, how do you check whether the improvement comes from the stronger backbone of the pose encoder and not from the stronger backbone of the depth encoder?
At first I took this SOTA work that already used ConvNeXt, and started working on the PoseNet only: https://arxiv.org/pdf/2309.00526. Also, in the end I did an ablation study and show in the paper the impact of both the depth backbone and the pose backbone.
I tried different posenets, but a pretrained ResNet is a good trade-off between the additional parameters and the additional accuracy.
Dear @Lavreniuk,
thank you for the great work. You mention that SPIDepth strengthens the pose information, but I cannot find in the paper the proposed changes to either the pose network or to any losses that specifically tackle pose. Also, the code still refers to the pose network from monodepth2.
Just curious, what is the proposed change to the pose network?
P.S. I have worked on exactly the same proposal before. It would be good to combine/compare: https://github.com/zshn25/pc4consistentdepth