I ran Patch-NetVLAD trained on Pitts30k on the Mapillary val split. This gives:
NetVLAD:
all_recall@1: 0.580
all_recall@5: 0.720
all_recall@10: 0.761
all_recall@20: 0.785
These match the numbers in Table 1 under Mapillary (val).
Patch-NetVLAD:
all_recall@1: 0.734
all_recall@5: 0.801
all_recall@10: 0.828
all_recall@20: 0.849
These are slightly lower than the reported ones. When I ran the same test with the Mapillary-trained models, I got:
NetVLAD:
all_recall@1: 0.711
all_recall@5: 0.815
all_recall@10: 0.843
all_recall@20: 0.880
Patch-NetVLAD:
all_recall@1: 0.808
all_recall@5: 0.865
all_recall@10: 0.884
all_recall@20: 0.904
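For clarity, this is how I understand `all_recall@N`: a query counts as correct if any of its top-N retrieved database images is a true positive. A minimal sketch of the metric (not the repository's exact evaluation code; `predictions` and `ground_truth` here are hypothetical inputs):

```python
import numpy as np

def recall_at_n(predictions, ground_truth, n_values=(1, 5, 10, 20)):
    """Fraction of queries with at least one true positive
    among the top-N retrieved database images.

    predictions:  (num_queries, max_n) array of retrieved database
                  indices, sorted by descending similarity.
    ground_truth: list of arrays; ground_truth[q] holds the database
                  indices considered correct matches for query q.
    """
    recalls = {}
    num_queries = len(ground_truth)
    for n in n_values:
        # A query is a hit if any of its top-n retrievals is a true match.
        hits = sum(
            np.isin(predictions[q, :n], ground_truth[q]).any()
            for q in range(num_queries)
        )
        recalls[n] = hits / num_queries
    return recalls

# Example: 3 queries, top-5 predictions each
preds = np.array([[4, 7, 1, 0, 3],
                  [2, 9, 8, 5, 6],
                  [0, 1, 2, 3, 4]])
gt = [np.array([7]), np.array([3]), np.array([2, 10])]
print(recall_at_n(preds, gt, n_values=(1, 5)))  # {1: 0.0, 5: ~0.667}
```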
My doubt is: is it fair to compare NetVLAD results (trained on Pitts30k) with Patch-NetVLAD results (trained on Mapillary) on the same test data?
In most scenarios, a model that sees more variety during training performs better than a model that sees fewer varieties of samples, right? Can we still judge models trained on different datasets on the same test data?