antabangun / DualRefine

This repository contains the code for the paper "DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium", a self-supervised depth and pose estimation model. It refines depth and pose estimates by computing matching costs based on epipolar geometry and a deep equilibrium model.
https://antabangun.github.io/projects/DualRefine/index.html
GNU General Public License v3.0

About the paper #8

Closed wangjiyuan9 closed 1 month ago

wangjiyuan9 commented 1 month ago

Thank you for your amazing work! I've read your paper twice and I believe there are some questions that most readers will have:

  1. The 'F' is really confusing, as the paper never introduces it: [image] [image]
  2. Why always use ResNet-18 as the pose encoder? Would it be better with ResNet-50 or ViT? Is there any experiment that shows this?
  3. The cost map C_k has: [image] But you say there is this number of candidates: [image] I believe there must be some details or logic I overlooked, because you use 'c' here: [image]

Could you please give me an explanation? Thank you very much!

antabangun commented 1 month ago

Thank you for pointing out the unclear points; they could have been written better.

  1. F are feature maps from multiple stages/scales, as mentioned in the line from the second image.
  2. We tried ResNet-50 as the pose network, but I don't remember the conclusion we reached, and we didn't explore this in detail in order to stay consistent with previous works. Experimenting with various pose networks is an interesting idea though, and perhaps it's something you could explore.
  3. c = D[u]/C is defined in that paragraph. The candidates are the following set: { D[u] + (i x c x n) | i = [-r, -r+1, ..., r-1, r], c = D[u]/C, n = [1, 2, 3] }. So in total there are (number of n values) x (2 x r + 1) candidates, i.e. 3 x (2 x r + 1) for n = [1, 2, 3].
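
As a rough illustration of the set in point 3, here is a minimal Python sketch that enumerates the candidates for one pixel. It is not taken from the repository: the function name `depth_candidates`, the argument names, the example values, and the choice to group candidates by n are all assumptions made for this example, and D[u] is treated as a single scalar estimate at pixel u.

```python
def depth_candidates(d_u, r, C, n_values=(1, 2, 3)):
    """Enumerate D[u] + i * c * n for i in [-r, r] and each n, with c = D[u] / C.

    Hypothetical helper for illustration only; names and grouping are assumptions.
    """
    c = d_u / C  # step size, proportional to the current estimate D[u]
    return {
        n: [d_u + i * c * n for i in range(-r, r + 1)]  # 2r + 1 offsets per n
        for n in n_values
    }


# Example values chosen only so the arithmetic is easy to follow.
cands = depth_candidates(d_u=10.0, r=2, C=10.0)
total = sum(len(v) for v in cands.values())

for n, values in cands.items():
    print(n, values)
print(total)  # 3 * (2 * 2 + 1) = 15 enumerated candidates
```

Note that the count 3 x (2r + 1) refers to enumerated (i, n) pairs; some pairs give the same value (for example, i = 0 yields D[u] for every n).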
wangjiyuan9 commented 1 month ago

Thank you for your valuable explanation!