Closed · KN-Zhang closed this 2 years ago

Hello~ I wonder whether the performance of the final models would be similar or different each time I train the network from scratch. Have you evaluated this? Thanks.
Hello, what exactly do you mean by "each time"? On a different dataset? Or are you just asking about reproducibility on the dataset we trained with?
I mean the reproducibility with the same dataset.
Ok, is your question not covered by the relevant section of the README? In short, yes, it should be easily reproducible up to 0.50432 stereo AUC and 0.72624 multiview AUC on the IMW2020 test set, and up to the paper results with the slightly more involved schedule explained in that paragraph. Are you seeing different results on your end?
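For anyone landing here and wondering what those AUC numbers measure: roughly, the area under the cumulative pose-accuracy curve. Below is a minimal sketch of that computation, assuming per-pair pose errors in degrees; the function name `pose_auc` and the threshold choice are illustrative assumptions, not the repo's actual evaluation code.

```python
import numpy as np

def pose_auc(errors_deg, max_threshold_deg=10.0, num_bins=100):
    """Approximate area under the cumulative accuracy curve.

    For each threshold t, compute the fraction of image pairs whose
    pose error is below t, then average over thresholds in (0, max].
    """
    errors = np.asarray(errors_deg, dtype=float)
    thresholds = np.linspace(0.0, max_threshold_deg, num_bins + 1)[1:]
    accuracy = [(errors <= t).mean() for t in thresholds]
    return float(np.mean(accuracy))
```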
In fact, I have just begun training the model and haven't run evaluation yet. Due to the nature of sampling, the variance of all metrics (for example `n_keypoints` and `n_pairs` during the training stage) is very large, so I am curious whether the performance of each trained model would vary.

Thanks for your response! I will follow up on this issue if I see different results on my end.
Yes, the variance of the training metrics is very large indeed, but we found the validation numbers (averaged across the entire set) to be quite reliable. Please reopen if you find any major discrepancies :)
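If you want run-to-run results to be as comparable as possible, fixing the random seeds before training should reduce (though not eliminate, given non-deterministic GPU kernels) the sampling variance between runs. A minimal sketch, assuming a PyTorch training setup; `seed_everything` is a hypothetical helper, not part of this repo:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    # Fix the common sources of randomness so repeated
    # from-scratch training runs are as comparable as possible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```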