MAX-OTW opened this issue 1 year ago
Hey, the Recall@N (RecallRate@N) as implemented in VPR-Bench follows the definition used in the existing literature, e.g., https://ieeexplore.ieee.org/document/5540009, but also see the other references mentioned in the VPR-Bench paper.
Can you share some literature where you have seen PrecisionRate@N? I could then help with some example code.
Hi @MubarizZaffar, Recall@N and Precision@N are metrics widely applied in the field of information retrieval. They reveal how well a model performs in terms of FPs as well as FNs. I found them via a Google search, e.g.: https://towardsdatascience.com/unveiling-the-precision-n-and-recall-n-in-recommender-system-7a4c6b69d060 ; https://medium.com/@m_n_malaeb/recall-and-precision-at-k-for-recommender-systems-618483226c54 ; https://insidelearningmachines.com/precisionk_and_recallk/ ; and also in this paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9525035
I have two questions: 1. Is Recall@N in VPR-Bench the same as the Recall@N defined in the materials above? Can I evaluate the Recall@N from those materials using the Recall@N code in VPR-Bench? 2. Can the Precision@N from those materials be added to the VPR-Bench code? It seems close to the following code, so presumably only a small change is needed; how could this be done? https://github.com/MubarizZaffar/VPR-Bench/blob/77ad30f3684476717188c67d7f3f66be19f832af/performance_comparison.py#L71C1-L104C1
Hope to receive your reply soon :) Best Regards!
Hi all, For your interest, we continued this discussion here: https://github.com/stschubert/VPR_Tutorial/issues/9.
Best, Stefan
Thanks @MAX-OTW, and sorry for the delay in my response. I have been busy with deadlines, but for (delayed) completeness I write down my thought process on your question below.
The question you have posed is fundamental for VPR and not specific to VPR-Bench. There are parallels between recommender systems and VPR, in the sense that in VPR we are recommending candidates for loop-closure. A relevant candidate is one that leads to a correct loop-closure. Of course you can replace loop-closure here with other applications of VPR, but I choose this particular application for my explanation. I follow the definitions here for the remainder of this text.
The question being asked here is: what counts as a recommendation made by a VPR method, and what is then a relevant recommendation? Every query image for which we can confidently output a matching database image is a recommendation, and it is considered relevant if that recommendation was a correct match. Note that in VPR we have a ground-truth match for every query image, so the total number of relevant candidates equals the total number of query images. If we further assume that our VPR method is confident for all the query images, then the total number of recommended candidates also equals the total number of query images. This is what currently happens in VPR when we talk about Precision@K or Recall@K: the total number of recommended items and the total number of relevant items are the same (both equal the total number of query images), and hence equations 1 and 2 in the article coincide.
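Written out explicitly (my own notation, not necessarily the exact symbols of equations 1 and 2 in the linked article), the two definitions in this loop-closure framing are:

```latex
% Precision@K and Recall@K in the loop-closure framing used above.
% A "hit" = a recommended query whose top-K retrievals contain a correct match.
\begin{align}
  \text{Precision@}K &= \frac{\#\,\text{hits}}{\#\,\text{recommended queries}} \\
  \text{Recall@}K    &= \frac{\#\,\text{hits}}{\#\,\text{queries with a ground-truth match}}
\end{align}
% When every query is recommended, both denominators equal the total number
% of query images and the two metrics coincide.
```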
Generally, what I have found confuses quite a few people (and confused me too some years back) is that in VPR we also have a matching score (a confidence in the correctness of a VPR match) which is used in the background, but because it is marginalized out, its role is not evident. This matching score could just as well be replaced with any other form of uncertainty estimate, for example geometric verification or epistemic/aleatoric uncertainty, but for now let's stick with the matching score as our confidence estimate. If we were to recommend candidates only when our confidence is above a particular threshold T, then the total number of recommended items would be smaller than the total number of query images (recommending every query is the current practice in VPR). In such a scenario, Precision@K and Recall@K would report different values.
Let's say we have 10 query images and for all of them there is a correct match in the database. Assume we have a VPR model which correctly matched 8 out of these 10 query images, i.e., there are 8 recommendations which were relevant. Let us choose K=1 in Precision@K and Recall@K, i.e., we only consider the relevance of our top-most (best-matched) retrieved image. Let's say that for each query image we have a confidence score in [0, 1]. If we choose the threshold T = 0, then the total number of recommended candidates is 10, the number of relevant items among these 10 recommended items is 8, and hence Precision@1 = Recall@1 = 0.8.
Now let's say we choose T = 0.5, and there are only 6 query images for which the best-matching candidate has a matching score above T, i.e., the total number of recommended items is 6. Suppose that out of these 6 recommended items only 4 were relevant (i.e., the best-matching reference was a correct match); that gives us a Precision@1 of 4/6. But the Recall@1 will be 4/10, since the number of recommended items that were relevant is 4 while the total number of relevant items is 10.
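To make the arithmetic concrete, here is a minimal, self-contained sketch (not VPR-Bench code; the toy arrays are made up purely to reproduce the numbers in this example) of how Precision@1 and Recall@1 diverge once a confidence threshold is applied:

```python
import numpy as np

# Toy example matching the numbers above: 10 queries, each with a
# ground-truth match in the database (so 10 relevant items in total).
# 'confidence' is the matching score of the best-retrieved candidate and
# 'correct' flags whether that best candidate is a correct match.
confidence = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.52, 0.4, 0.3, 0.2, 0.1])
correct = np.array([1, 1, 0, 0, 1, 1, 1, 1, 1, 1], dtype=bool)

def precision_recall_at_1(confidence, correct, T):
    recommended = confidence >= T          # queries we dare to recommend
    n_recommended = recommended.sum()      # denominator of Precision@1
    n_relevant = len(correct)              # every query has a ground-truth match
    tp = np.logical_and(recommended, correct).sum()
    precision_at_1 = tp / n_recommended if n_recommended else 0.0
    recall_at_1 = tp / n_relevant
    return precision_at_1, recall_at_1

print(precision_recall_at_1(confidence, correct, T=0.0))  # -> (0.8, 0.8)
print(precision_recall_at_1(confidence, correct, T=0.5))  # -> (0.666..., 0.4)
```

With T = 0 every query is recommended and the two values coincide at 0.8; with T = 0.5 only 6 queries are recommended, giving 4/6 and 4/10 as described above.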
I hope this helps you. Many thanks.
Hi @MubarizZaffar @oravus, thank you very much for your work. https://github.com/MubarizZaffar/VPR-Bench/blob/77ad30f3684476717188c67d7f3f66be19f832af/performance_comparison.py#L71C1-L104C1 is the code for calculating RecallRateAtN; how can I calculate PrecisionRateAtN? I tried several times but failed. Can you provide the relevant code? Note: Recall@N = TP@N / (TP@N + FN@N); Precision@N = TP@N / (TP@N + FP@N). Hope to receive your reply soon :) Thanks & Best Regards!
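For reference, a minimal sketch of those two formulas (not the VPR-Bench function itself; the input layout `top_n_correct` / `confidences` and the `threshold` argument are assumptions that would still need to be mapped onto the data structures used in performance_comparison.py) could look like this:

```python
import numpy as np

def precision_and_recall_at_n(top_n_correct, confidences, threshold=0.0):
    """Sketch of Precision@N and Recall@N with an optional confidence threshold.

    Assumed (hypothetical) inputs, not VPR-Bench's internal variables:
      top_n_correct: (num_queries, N) boolean array, True where the i-th
                     retrieved candidate of a query is a correct match.
      confidences:   (num_queries,) matching score of the best candidate.
    """
    top_n_correct = np.asarray(top_n_correct, dtype=bool)
    confidences = np.asarray(confidences)

    recommended = confidences >= threshold        # queries we recommend at all
    hit = top_n_correct.any(axis=1)               # correct match within the top N

    tp = np.logical_and(recommended, hit).sum()   # recommended and correct
    fp = np.logical_and(recommended, ~hit).sum()  # recommended but wrong
    fn = len(confidences) - tp                    # every query has a GT match

    precision_at_n = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall_at_n = tp / (tp + fn)
    return precision_at_n, recall_at_n
```

With threshold=0.0 (i.e., every query is recommended) the two values coincide, which matches the discussion above; with a stricter threshold they diverge.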