crockwell / far

[CVPR 2024 - Highlight] FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
https://crockwell.github.io/far/

LoFTR backbone #8

Open chenjiajie9811 opened 1 month ago

chenjiajie9811 commented 1 month ago

Hi there,

thank you for your amazing work! I have some questions regarding the LoFTR backbone. Do you use the pretrained LoFTR model to get the features and correspondences, or do you train it from scratch? If the model is supervised purely using the GT pose, is that strong enough for LoFTR to produce good correspondences?

crockwell commented 1 month ago

Hello,

LoFTR is trained from scratch. It is trained in the standard fashion: with losses on correspondences, where correspondences are derived from ground truth pose and depth. The pose prediction network (FAR) is supervised using only pose. Note this loss backpropagates to LoFTR, but is not sufficient to train correspondences alone.
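For concreteness, here's a minimal sketch of that two-loss setup in PyTorch. The module names, batch keys, and loss functions are illustrative assumptions, not the repo's actual identifiers; the point is just that the pose loss backpropagates into LoFTR through the shared graph:

```python
import torch.nn.functional as F

def training_step(loftr, far, batch):
    # LoFTR predicts matches from the image pair; its supervision targets
    # (batch["gt_matches"]) are derived offline from gt pose and depth.
    feats, matches = loftr(batch["image0"], batch["image1"])
    loss_corr = F.l1_loss(matches, batch["gt_matches"])

    # FAR regresses relative pose and is supervised with pose only; this
    # loss still backpropagates into LoFTR via `feats` and `matches`.
    pred_R, pred_t = far(feats, matches)
    loss_pose = F.l1_loss(pred_R, batch["gt_R"]) + F.l1_loss(pred_t, batch["gt_t"])

    return loss_corr + loss_pose
```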

Hope this helps! Chris

chenjiajie9811 commented 1 month ago

Thank you, crockwell, for your explanation.

I have a follow-up question: since the pose loss of the pose prediction network backpropagates to LoFTR, do you observe any improvement in the performance of the LoFTR module alone? (Or does LoFTR benefit from the end-to-end training with the pose branch, producing more accurate correspondences?)

crockwell commented 1 month ago

I didn't see a meaningful benefit in correspondence prediction. Early experiments showed that backprop through the pose prediction network to LoFTR might help very marginally.

chenjiajie9811 commented 4 weeks ago

Another question on the translation scale.

I looked through the code, and it seems that predict_translation_scale is set to False in the default setting and in the training scripts you provided. Also, when the solver type is prior_ransac, translation_scale is set to None here. So I am wondering: is the translation scale handled differently when you perform the linear combination of the transformer-regressed RT with the solver-computed RT?

crockwell commented 3 weeks ago

Good question, I can see how that could be a bit confusing. That variable is for a separate head that could predict translation scale. For the experiments in the publicly released code, we use the network as-is: it predicts translation scale implicitly, since the regressed translation vector includes scale. So you can ignore the predict_translation_scale variable.

Check this function for how we predict the T vector.
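For illustration only, a hypothetical sketch of such a head (not the function linked above): regressing a full, unnormalized translation vector means scale is carried implicitly in the vector's norm, so no separate scale head is needed.

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Hypothetical regression head: rotation plus an unnormalized
    translation vector. Scale is implicit in the translation's norm."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc_rot = nn.Linear(dim, 6)    # e.g., a 6D rotation parameterization
        self.fc_trans = nn.Linear(dim, 3)  # full translation, scale included

    def forward(self, x: torch.Tensor):
        rot6d = self.fc_rot(x)
        t = self.fc_trans(x)               # translation scale = t.norm(dim=-1)
        return rot6d, t
```

Under that reading, the linear combination with the solver pose can presumably use the regressed translation directly, since its magnitude already encodes scale, whereas an essential-matrix solver recovers translation only up to scale.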