Open tcourat opened 10 months ago
Hi! We've tried different transformer backbones, and they can reach the performance of dinov2 when unfrozen and finetuned. However, this requires significantly more memory and compute, and it also makes training unstable.
The performance gains from the robust regression loss are mainly at the extremely precise thresholds (less than one pixel), and are orthogonal to the improvements from dinov2. We found the classification loss to be especially important when using transformers.
Our main contribution is the insight of decoupling coarse matching and refinement, not specifically developing robust losses.
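To illustrate what is meant by a robust regression loss, here is a minimal sketch of one common choice, the Charbonnier loss (a smooth L1/L2 hybrid). This is only an illustrative example, not necessarily the loss used in the paper: it behaves like L2 near zero (staying sensitive to sub-pixel residuals) and like L1 for large residuals (limiting the influence of outlier matches).

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier robust regression loss.

    ~0.5 * d^2 / eps for small residuals d (L2-like),
    ~|d| for large residuals (L1-like, outlier-robust).
    """
    d = pred - target
    return np.mean(np.sqrt(d * d + eps * eps))
```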
The work is still in review, and we may add results with additional backbones in a later version.
Nice!
Have you also tried comparing results between color (RGB) images and greyscale-only images? Some good keypoint detector+matchers (like LoFTR) only use greyscale images.
I tried grayscale at test time only and found a small reduction in performance. I can't say what happens if you also train on grayscale. Personally, I think grayscale is kind of a dumb choice anyway.
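For anyone wanting to reproduce the test-time grayscale experiment: a simple way is to convert each image to luminance and replicate it across three channels, so an RGB-trained model can consume it unchanged. A minimal sketch (the BT.601 luma weights are a standard choice; the function name is illustrative, not from the repo):

```python
import numpy as np

def to_grayscale_3ch(rgb):
    """Convert an (H, W, 3) RGB array to grayscale, replicated back to
    3 channels so an RGB-trained matcher accepts it without changes.

    Uses ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B).
    """
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # (H, W)
    return np.repeat(gray[..., None], 3, axis=-1)  # (H, W, 3)
```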
Hello, your results are very promising!
I wonder if you've tried other backbones instead of DinoV2. Indeed, I wonder how much of the good results you're getting come from the extremely well pre-trained DinoV2 backbone or from the loss function you've developed.