cvg / glue-factory

Training library for local feature detection and matching
Apache License 2.0

while Reproducing... can't reach the same performance #58

Open AubreyCH opened 9 months ago

AubreyCH commented 9 months ago

Hi! Thank you for your excellent work! I've been trying to reproduce the results reported in the paper recently. Here's what I got:

  1. Pretraining on the homography dataset: using 2 RTX 4090s and following the official config, I got these results: [results screenshot]

  2. Finetuning on MegaDepth:

First I followed the settings described in the paper: lr = 1e-5, decayed by 0.8 after 10 epochs. The best checkpoint was `Epoch 48: New best val: loss/total=0.4931303240458171`, and the test results are as follows: [results screenshot]

I also tried the settings from the official config: lr = 1e-4, decayed by 0.95 after 30 epochs, which gave `Epoch 49: New best val: loss/total=0.3537058917681376` and these test results: [results screenshot]

I've also tried several other plausible settings and still can't reach the results reported in the paper. Could you please give more details on how you finetuned the model on the MegaDepth dataset, or any other suggestions for improving the performance?
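For reference, the two decay schedules above can be sketched as an exponential decay that kicks in after a start epoch. Note this rule (`lr * gamma ** (epoch - start)` once `epoch >= start`) is my reading of "decay by X after N epochs"; the exact rule glue-factory uses may differ:

```python
def exp_decay_lr(base_lr, gamma, start_epoch, epoch):
    """Hold `base_lr` until `start_epoch`, then decay by `gamma` per epoch.

    This mirrors the schedules discussed above; the exact formula used by
    glue-factory may differ (assumption on my part).
    """
    if epoch < start_epoch:
        return base_lr
    return base_lr * gamma ** (epoch - start_epoch)

# Paper-style schedule: lr 1e-5, decayed by 0.8 after epoch 10
paper = [exp_decay_lr(1e-5, 0.8, 10, e) for e in range(50)]
# Official-config schedule: lr 1e-4, decayed by 0.95 after epoch 30
official = [exp_decay_lr(1e-4, 0.95, 30, e) for e in range(50)]
```

By epoch 49 the paper-style schedule has decayed far below the official one, which may partly explain the gap in final loss between the two runs.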

Phil26AT commented 9 months ago

Hi @AubreyCH

We are sorry for some inconsistencies in the training setup: the SuperPoint model was released much earlier than glue-factory, which underwent significant changes before its final release. For reproducibility, I suggest moving to ALIKED, which was trained and released with the final version and generally outperforms SuperPoint on most datasets (see the excellent blog post here). We can also provide the logs for our training runs there.

The pretraining looks fine to me. Also, after finetuning, your MegaDepth results are <2% off the official results, so already quite good. However, your final loss is a bit high; we reached a final loss of 0.265. I would suggest increasing the initial learning rate to 2.0e-4 or even 4.0e-4 (you can keep the schedule of the official config).

Are you using cached features during finetuning or do you extract them again for each pair? I'd recommend caching features.
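The two suggestions above could be combined into a config override along these lines. Only `data.load_features.do` is confirmed in this thread; the other key names and the nesting are my guesses and may not match the actual glue-factory configs:

```yaml
# Hypothetical override fragment — verify key names against the shipped configs.
data:
  load_features:
    do: true     # reuse cached keypoints/descriptors instead of re-extracting per pair
train:
  lr: 2.0e-4     # raised initial learning rate, keeping the official decay schedule
```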

duplicate https://github.com/cvg/LightGlue/issues/108

AubreyCH commented 8 months ago

Hi @Phil26AT

Thank you so much for your quick reply and great advice! I do use cached features during finetuning with `data.load_features.do=True`. Following your suggestion, I increased the learning rate to 2.0e-4, which does improve performance, but it still cannot reach the official results. I also tried learning rates of 3.0e-4 and 4.0e-4; these seem too large, since the loss exploded to NaN after several epochs. Some of my findings might be interesting:

  1. With learning rate 2.0e-4, finetuning with mixed precision (`mix_precision=float16`) achieves reasonable results, but finetuning without mixed precision leads to a higher loss and a loss explosion after 7 epochs.
  2. Different learning rates perform inconsistently across the OpenCV and PoseLib evaluations: for example, 2.0e-4 performs better with PoseLib, while 1e-4 performs better with OpenCV.
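A common mitigation for the full-precision divergence described above (glue-factory may already expose options for this; the helper names below are hypothetical, not its API) is to clip the global gradient norm and skip any update whose loss is non-finite. A minimal sketch in plain Python:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale the gradient list down so its global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

def safe_step(params, grads, loss, lr, max_norm=1.0):
    """Apply one SGD step, skipping it entirely if the loss is NaN or inf."""
    if not math.isfinite(loss):
        return params  # skip the divergent step instead of poisoning the weights
    grads = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, grads)]
```

With clipping in place, an occasional spiky batch caps out at `max_norm` instead of scaling the update with the raw gradient magnitude, which is often enough to keep higher learning rates stable.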

I'm quite confused by these findings. Could you give me some advice? Do you recommend using mix_precision? I will keep working on this and try to find a good learning rate between 5.0e-5 (which seems a bit small in my experiments) and 2.0e-4 (without mix_precision, unfortunately). I will also try using ALIKED as the keypoint detector.

zyxzyx45 commented 2 months ago

Hello, could you share the official pretraining weights? Thank you very much.