Hi, thanks for this nice work first! I'm just confused by one thing: why do you use L2Norm before GeM? I also studied the architecture proposed in the original GeM paper, in which the author normalizes the final vector rather than the one before the pooling layer. So have you ever benchmarked the performance of L2Norm before versus after the pooling layer? Looking forward to your reply!
Hi, thanks for the good question!
In our preliminary experiments, we saw that using L2Norm before GeM gives slightly better results.
Note that you can easily change the L2Norm position with the parameter --l2. We did this so you can also use trained models from other sources: for example, you can use the model from the original GeM paper's repository (passing --l2=after_pool) or the model from the repo that introduced the AP loss (passing --l2=none).
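In case it helps, here is a minimal sketch of what the three options correspond to; the function names and the exact placement of the normalization are an illustration, not the repo's actual code:

```python
import torch.nn.functional as F

def gem(x, p=3.0, eps=1e-6):
    # Generalized Mean (GeM) pooling: (B, C, H, W) -> (B, C)
    return F.avg_pool2d(x.clamp(min=eps).pow(p), x.shape[-2:]).pow(1.0 / p).flatten(1)

def extract_descriptor(feature_map, l2="before_pool"):
    # Sketch of the three --l2 options; names mirror the CLI values
    if l2 == "before_pool":
        # L2Norm across channels at each spatial location, before GeM
        feature_map = F.normalize(feature_map, p=2, dim=1)
    descriptor = gem(feature_map)
    if l2 == "after_pool":
        # L2-normalize the final vector, as in the original GeM paper
        descriptor = F.normalize(descriptor, p=2, dim=1)
    # With "before_pool" or "none", the output norm is not 1
    return descriptor
```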
Thanks for your reply. One more question: if L2Norm is used before GeM, the final feature descriptor is NOT normalized (its norm does NOT equal 1). When comparing the similarity between database and query features, should we use Euclidean distance or cosine similarity?
We always used the Euclidean distance in all our experiments.
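If you want a quick sanity check of how much the metric choice matters on unnormalized descriptors, you can compare the two rankings directly. Here is a toy sketch, with random tensors standing in for real descriptors:

```python
import torch
import torch.nn.functional as F

# Random stand-ins for query and database descriptors (unnormalized)
queries = torch.randn(4, 256)
database = torch.randn(100, 256)

euclidean = torch.cdist(queries, database)  # (4, 100) pairwise distances
cosine = 1.0 - F.cosine_similarity(queries.unsqueeze(1), database.unsqueeze(0), dim=2)

print(euclidean.argsort(dim=1)[:, :5])  # top-5 neighbors by Euclidean distance
print(cosine.argsort(dim=1)[:, :5])     # top-5 by cosine distance (may differ)
```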
Have you ever benchmarked Euclidean distance against cosine similarity? And does it make sense to use the Euclidean distance on a set of vectors that are NOT scaled to the same norm? What do you think about this point?
I haven't personally tried any distances other than Euclidean. The similarity measure should reflect the one used in the loss: the standard triplet loss works with the Euclidean distance, but you can try torch.nn.TripletMarginWithDistanceLoss if you want to experiment with other similarity measures (e.g. cosine). To check whether these assumptions are correct, you could test our pretrained models (trained with Euclidean, tested with cosine), which would take just a few minutes (for a quick test you could download the st_lucia dataset), or you could train your own model (with our code, training a ResNet-18 on pitts30k takes just a few hours).
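As a side note, for L2-normalized descriptors the two metrics give the same ranking, since ||x − y||² = 2 − 2 x·y for unit vectors; the choice only matters when the final descriptor is not normalized. If you want to try a cosine-based triplet loss, here is a minimal sketch (cosine_distance is a helper defined here, not a PyTorch builtin, and the margin is arbitrary):

```python
import torch
import torch.nn.functional as F

def cosine_distance(x, y):
    # Cosine distance in [0, 2]
    return 1.0 - F.cosine_similarity(x, y, dim=1)

criterion = torch.nn.TripletMarginWithDistanceLoss(
    distance_function=cosine_distance, margin=0.1  # margin chosen for illustration
)

# Dummy descriptors standing in for anchor/positive/negative features
anchor = torch.randn(8, 256, requires_grad=True)
positive = torch.randn(8, 256)
negative = torch.randn(8, 256)
loss = criterion(anchor, positive, negative)
loss.backward()
```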
OK, got it. I have no further questions. Thanks for your reply!