Parskatt / RoMa

[CVPR 2024] RoMa: Robust Dense Feature Matching. RoMa is a robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.
https://parskatt.github.io/RoMa/
MIT License

Trying different backbones? #6

Open tcourat opened 10 months ago

tcourat commented 10 months ago

Hello, your results are very promising!

I wonder whether you've tried other backbones instead of DINOv2. In particular, how much of the good results comes from the extremely well pre-trained DINOv2 backbone, and how much from the loss function you've developed?

Parskatt commented 10 months ago

Hi! We've tried different transformer backbones, and they can reach the performance of DINOv2 when unfrozen and finetuned. However, this requires significantly more memory and compute, and it also makes training unstable.
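To make the frozen-backbone setup concrete, here is a minimal PyTorch-style sketch (module names are illustrative stand-ins, not RoMa's actual architecture): the pretrained coarse-feature backbone is frozen, and only the refinement head receives gradients.

```python
# Hypothetical sketch of a frozen backbone + trainable refinement head.
# The tiny conv stack stands in for a large pretrained encoder like DINOv2.
import torch
import torch.nn as nn

class Matcher(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for a pretrained coarse-feature encoder.
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        # Trainable refinement head predicting a dense warp (2 channels: dx, dy).
        self.refiner = nn.Conv2d(16, 2, 3, padding=1)

    def forward(self, x):
        with torch.no_grad():  # no gradients flow through the frozen backbone
            feats = self.backbone(x)
        return self.refiner(feats)

model = Matcher()
for p in model.backbone.parameters():  # freezing saves memory/compute and stabilizes training
    p.requires_grad_(False)

# Only the refiner's parameters are handed to the optimizer.
opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

Freezing avoids storing optimizer state and activations-for-backprop for the largest part of the network, which is where the memory and compute savings come from.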

The performance gains from the robust regression loss are mainly at the extremely precise thresholds (less than one pixel) and are orthogonal to the improvements from DINOv2. We found the classification loss to be especially important when using transformers.

Our main contribution is the insight of decoupling coarse matching and refinement, not specifically developing robust losses.
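For intuition on what a robust regression loss does at sub-pixel thresholds, here is a sketch of one common choice, the Charbonnier penalty (a smooth L1-like loss). This is an illustration of the general idea, not RoMa's exact loss:

```python
# Illustrative robust regression penalty (Charbonnier), not RoMa's exact loss.
import numpy as np

def charbonnier(residual, eps=1e-3):
    # Behaves like r^2 / (2 * eps) near zero (precise fitting of inliers)
    # and like |r| for large residuals, so outlier matches contribute
    # far less gradient than under a plain L2 loss.
    return np.sqrt(residual**2 + eps**2) - eps

residuals = np.array([0.0, 0.5, 5.0])
loss = charbonnier(residuals).mean()
```

Compared with L2, the gradient magnitude saturates for large residuals, so a few badly matched pixels do not dominate the update while accurate matches are still refined to sub-pixel precision.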

Parskatt commented 10 months ago

The work is still in review, and we may add results with additional backbones in a later version.

tcourat commented 10 months ago

> The work is still in review, and we may add results with additional backbones in a later version.

Nice !

Have you also compared results between color (RGB) images and grayscale-only images? Some good keypoint detector+matchers (like LoFTR) use only grayscale images.

Parskatt commented 10 months ago

I tried grayscale at test time only and found a small reduction in performance. I can't say what happens if you train on grayscale. I think grayscale is kind of dumb.
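For anyone wanting to reproduce the grayscale-at-test-time experiment, a minimal sketch: convert to luminance and replicate it across three channels so a model trained on RGB input still receives the expected tensor shape (weights below are the standard ITU-R BT.601 luminance coefficients; function name is illustrative).

```python
# Illustrative grayscale-at-test-time preprocessing for an RGB-trained model.
import numpy as np

def to_grayscale_rgb(img):
    # img: H x W x 3 float array in [0, 1].
    # ITU-R BT.601 luminance weights, then replicate to 3 channels so the
    # input shape matches what an RGB-trained model expects.
    gray = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(gray[..., None], 3, axis=-1)

img = np.random.rand(32, 32, 3)
gray_img = to_grayscale_rgb(img)  # same shape, all three channels identical
```

Feeding this through an RGB-trained model is exactly the "grayscale at test time" setting; training on such inputs from scratch would be the separate experiment mentioned above.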