Parskatt / RoMa

[CVPR 2024] RoMa: Robust Dense Feature Matching. RoMa is a robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.
https://parskatt.github.io/RoMa/
MIT License

Training Plot #18

Open sushi31415926 opened 8 months ago

sushi31415926 commented 8 months ago

Hello! I am trying to train RoMa myself; I wonder if you could upload your training plots. In addition, the training process contains several losses (delta_regression_loss_1, delta_certainty_loss_16, delta_certainty_loss_4, gm_cls_loss_16, ...) and I am not sure which output I should focus on. Also, for how long was the model trained? After 250,000 steps I was able to achieve AUC 0.58 @5° on MegaDepth-1500.

Parskatt commented 8 months ago

Hi,

I usually don't focus on the losses; the dense eval on mega1500 seems well correlated with final performance, and if I remember correctly it should be around 87% @ 1 pixel and 97% @ 5 pixels at the end of training.
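The pixel-accuracy numbers above are the fraction of correspondences whose end-point error falls below a pixel threshold. A minimal sketch of that metric (function name and array shapes are my own, not RoMa's eval code):

```python
import numpy as np

def end_point_pck(pred, gt, thresholds=(1.0, 5.0)):
    """Fraction of predicted correspondences within each pixel threshold of GT.

    pred, gt: (N, 2) arrays of pixel coordinates.
    Returns one accuracy per threshold, e.g. [acc @ 1px, acc @ 5px].
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # per-point end-point error
    return [float((err < t).mean()) for t in thresholds]
```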

We trained with 4 GPUs (batch size of 32 total).

Parskatt commented 8 months ago

By the way, what is your AUC10 and AUC20?

sushi31415926 commented 8 months ago

In the paper I found this result on MegaDepth-1500: RoMa 62.6 / 76.7 / 86.3 (AUC @ 5° / 10° / 20°). This is the output of my run on MegaDepth-1500: roma_outdoor auc: [0.5836, 0.735504, 0.8440]. I train on MegaDepth only, with roma_outdoor.
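For anyone comparing these numbers: pose AUC on MegaDepth-1500 is conventionally computed by integrating the recall-vs-pose-error curve up to each angular threshold (the SuperGlue-style evaluation). A sketch, assuming you already have per-pair pose errors in degrees:

```python
import numpy as np

def pose_auc(errors, thresholds=(5, 10, 20)):
    """Area under the pose-error recall curve, normalised by each threshold.

    errors: per-pair pose error in degrees (typically the max of the
    rotation and translation angular errors).
    """
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    # Prepend (0, 0) so the recall curve starts at the origin.
    errors = np.r_[0.0, errors]
    recall = np.r_[0.0, recall]
    aucs = []
    for t in thresholds:
        last = np.searchsorted(errors, t)
        e = np.r_[errors[:last], t]            # clip the curve at t
        r = np.r_[recall[:last], recall[last - 1]]
        # Trapezoidal integration, normalised so a perfect matcher scores 1.
        area = np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2)
        aucs.append(float(area / t))
    return aucs
```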

Parskatt commented 8 months ago

What happens if you run the evaluation with our pretrained weights?

Parskatt commented 8 months ago

As in here:

https://github.com/Parskatt/RoMa/blob/main/experiments/roma_outdoor.py#L275

Parskatt commented 8 months ago

Actually, I see that some parts of the code have been updated while the eval is old; I'll go through the code and update it.

sushi31415926 commented 8 months ago

These are the results with the pretrained weights: roma_outdoor auc: [0.6233139783697981, 0.7639168854542346, 0.8610960261153799]

Parskatt commented 8 months ago

Ok, that seems to closely resemble the paper results (there might be slight fluctuations due to e.g. exact resolution, RANSAC randomness, etc.).

The training was done in our internal codebase, which is supposed to be identical to the public one (but is messier).

When you say

> after 250000

do you mean the global step? We train for 8M steps (the refinement keeps improving for a long time). I think 4M steps would also work, but you need to make sure that you're including the learning-rate decrease.
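On the learning-rate point: the exact schedule lives in the training script linked above, but as an illustration of what "including the decrease" means over a long step budget, here is a generic cosine decay (all numbers are hypothetical, not RoMa's actual config):

```python
import math

def lr_at_step(step, total_steps=8_000_000, base_lr=1e-4, final_frac=0.01):
    """Illustrative cosine decay from base_lr down to final_frac * base_lr.

    Truncating training (e.g. to 4M steps) without rescaling total_steps
    would leave the LR high, which is why the decay must be included.
    """
    progress = min(step / total_steps, 1.0)
    cos_term = 0.5 * (1 + math.cos(math.pi * progress))
    return base_lr * (final_frac + (1 - final_frac) * cos_term)
```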

Parskatt commented 8 months ago

And yes, this training takes quite a long time. As we report in the paper, it takes about 4 days on 4 A100s. This is currently one of the downsides of both DKM and RoMa.

sushi31415926 commented 8 months ago

Thanks! In my experiments I reached my result (0.58, vs. 0.62 for the pretrained weights) after only one day of training, so I was trying to figure out from the loss metrics whether there is any point in training the model for more days.

Parskatt commented 8 months ago

Ok, good to hear that the code seems to work :D I didn't eval on mega1500 during training, so I'm not completely sure what the eval metrics look like during training.

Here are some old wandb plots on convergence of the dense matching:

[three wandb convergence plots]

sushi31415926 commented 7 months ago

Thanks for the plots. I tried to train RoMa from scratch with a DINO backbone that was fine-tuned on my task. However, I was not able to improve the results. Do you think that modifying the backbone is a reasonable way to improve RoMa's performance? If not, do you have any other suggestions for a way forward?

Thanks!

Parskatt commented 7 months ago

So the backbone was pretrained on your dataset and then frozen, as in RoMa?

We have an experiment regarding the performance of different frozen backbone features; perhaps you could try something similar, to see if your pretraining produces better features than DINOv2 for matching.
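One quick way to compare frozen features for matching quality (a rough sketch of the idea, not the experiment from the paper) is mutual nearest-neighbour matching on descriptors extracted from the same locations in two views, then checking how many matches are correct:

```python
import numpy as np

def mutual_nn_matches(feats_a, feats_b):
    """Mutual nearest-neighbour matching on L2-normalised features.

    feats_a: (N, D), feats_b: (M, D) descriptor arrays.
    Returns a (K, 2) array of index pairs that are each other's NN
    under cosine similarity.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)               # best match in B for each A
    nn_ba = sim.argmax(axis=0)               # best match in A for each B
    mutual = nn_ba[nn_ab] == np.arange(len(a))
    return np.stack([np.arange(len(a))[mutual], nn_ab[mutual]], axis=1)
```

Running this with descriptors from your fine-tuned DINO versus DINOv2, on pairs with known ground-truth correspondences, would give a cheap signal on which backbone's frozen features match better before retraining all of RoMa.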

Hard to say much more without knowing more details of your experiment.