I ran Patch-NetVLAD trained on Pitts30k on the Mapillary val split. This gives:
NetVLAD:
all_recall@1: 0.580
all_recall@5: 0.720
all_recall@10: 0.761
all_recall@20: 0.785
These match the numbers in Table 1 under Mapillary (val).
Patch-NetVLAD:
all_recall@1: 0.734
all_recall@5: 0.801
all_recall@10: 0.828
all_recall@20: 0.849
These are slightly lower than the reported ones. When I ran the same test with the Mapillary-trained models, I got:
NetVLAD:
all_recall@1: 0.711
all_recall@5: 0.815
all_recall@10: 0.843
all_recall@20: 0.880
Patch-NetVLAD:
all_recall@1: 0.808
all_recall@5: 0.865
all_recall@10: 0.884
all_recall@20: 0.904
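For clarity, this is how I understand `all_recall@N`: a query counts as correct if any of its top-N retrieved database images is a true positive. A minimal sketch of the metric (not the repository's exact evaluation code; `predictions` and `ground_truth` here are hypothetical inputs):

```python
import numpy as np

def recall_at_n(predictions, ground_truth, n_values=(1, 5, 10, 20)):
    """Fraction of queries with at least one true positive
    among the top-N retrieved database images.

    predictions:  (num_queries, max_n) array of retrieved database
                  indices, sorted by descending similarity.
    ground_truth: list of arrays; ground_truth[q] holds the database
                  indices considered correct matches for query q.
    """
    recalls = {}
    num_queries = len(ground_truth)
    for n in n_values:
        # A query is a hit if any of its top-n retrievals is a true match.
        hits = sum(
            np.isin(predictions[q, :n], ground_truth[q]).any()
            for q in range(num_queries)
        )
        recalls[n] = hits / num_queries
    return recalls

# Example: 3 queries, top-5 predictions each
preds = np.array([[4, 7, 1, 0, 3],
                  [2, 9, 8, 5, 6],
                  [0, 1, 2, 3, 4]])
gt = [np.array([7]), np.array([3]), np.array([2, 10])]
print(recall_at_n(preds, gt, n_values=(1, 5)))  # {1: 0.0, 5: ~0.667}
```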
My doubt is: is it fair to compare NetVLAD results (trained on Pitts30k) with Patch-NetVLAD results (trained on Mapillary) on the same test data?
In most scenarios, a model that sees more variety during training performs better than a model that sees fewer varieties of samples, right? Can we still judge models trained on different datasets on the same test data?