Closed osf9018 closed 2 years ago
Hi Oliver,
Thanks for your interest in rank_eval and the kind words.
Yesterday, when I made the last commit, I noticed something was off in the code there! I am going to address the problem in the next few days and get back to you.
Thanks for your feedback.
Have a good one,
Elias
@osf9018 the issue is now fixed.
Thanks again for your feedback!
Closing.
Hi Elias,
Thanks for the fix.
Until recently, I focused mainly on recall, MAP, R-precision and @.***, but since one of my students is using hits@k in the context of question answering, I am also considering including it in the measures I use. If I am not mistaken, you define hits@k as "the number of relevant documents retrieved". It is not so easy to find a reference definition for this measure, but I have the feeling that it is most likely defined as the fraction of queries for which at least one relevant document is found among the top-k retrieved documents. At the query level, it is a binary measure: 0 if no relevant document is found among the top-k retrieved documents and 1 if at least one relevant document is found.
I agree that in Information Retrieval we can speak about the number of hits to refer to the number of relevant documents among the documents retrieved for a query, but I have the feeling that hits@k generally refers to a measure with values in the range [0, 1]. Did you define your hits@k measure with a specific reference in mind?
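To make that definition concrete, here is a rough sketch (the function name and data layout here are only illustrative, not your library's API):

```python
def hit_at_k(relevant_doc_ids, retrieved_doc_ids, k):
    # Binary per-query value: 1 if at least one relevant document appears
    # among the top-k retrieved documents, 0 otherwise.
    return int(any(doc_id in relevant_doc_ids for doc_id in retrieved_doc_ids[:k]))
```

Averaged over queries, this gives the fraction of queries with at least one relevant document in the top k, hence a value in [0, 1].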
Best regards,
Olivier
Sorry Oliver, I do not understand what ***@***.*** means.
Could you clarify, please?
Hi Elias,
I am not sure I understand what you don't understand, since I don't see any ***@***.*** in my message :-) You mean hits@k? More globally, my message was just about the definition of the hits measure.
Olivier
Sorry Oliver, I am confused: I see ***@***.*** in your messages on four different browsers on two devices.
I think I probably called that measure Hits because it is the (mean) number of relevant documents retrieved for each query.
It is an integer value for each query, not a boolean one.
I am sure I saw it in some paper, but I do not recall which one at the moment.
It is a sub-function of other metrics, so I decided to expose it in case someone wants to use it, but maybe I should hide it, as it is not that useful anyway.
For example, precision is something like:

```python
def precision(qrels, run, k):
    # If k is 0, use the number of retrieved documents
    k = k if k != 0 else len(run)
    return hits(qrels, run, k) / k
```
while recall is something like:

```python
def recall(qrels, run, k):
    # If k is 0, use the number of retrieved documents.
    # In this case k is used just to avoid useless computations,
    # as we divide the number of retrieved relevant documents (hits)
    # by the number of relevant documents later.
    k = k if k != 0 else len(run)
    return hits(qrels, run, k) / len(qrels)
```
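To be explicit about the sub-function itself, here is a rough sketch of what hits computes, assuming (just for illustration, not the actual data structures) that qrels holds the relevant document ids for a query and run is the ranked list of retrieved document ids:

```python
def hits(qrels, run, k):
    # Number of relevant documents among the top-k retrieved ones:
    # an integer per query, not a value bounded by 1.
    return sum(1 for doc_id in run[:k] if doc_id in qrels)
```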
You can use it for analysis purposes if you want, but I suggest you stick with the other metrics for scientific evaluation / comparison.
Hi Elias,
Funny. I guess the message passed through a GitHub email address and the @ is interpreted as something special in that context.
I see your point, and it is already what I understood from your code. Perhaps it would be less confusing to rename this metric in https://github.com/AmenRa/rank_eval/blob/76b6e241b4c8a860e72c305d95204e5bc04d20bf/rank_eval/meta_functions.py#L29 to something like n_hits:
```
if metric == "hits": return hits
--> if metric == "n_hits": return hits
```
This is only a suggestion to avoid misunderstandings from people who know "hits at k" as a metric with the definition I mentioned, but it is not essential.
Best regards,
Olivier
Hi Oliver,
Thanks again for your feedback. I will take your suggestion into consideration.
I would also like to let you know that my tool is going to change name soon because of naming similarities with other tools. The new name is ranx. You can already install the library with pip install ranx.
Best regards,
Elias
Hi,
I tested your code and found it easy to use and integrate. Moreover, the results I got are fully consistent with those I previously obtained with a personal implementation of trec_eval, and the computation of the measures is fast. This is clearly an interesting piece of software, and its presentation at the demo session of ECIR 2022 is a good thing.
I only had a problem with the R-precision measure. The main issue is that if you replace "ndcg@5" with "r-precision" in the 4th cell of the overview.ipynb notebook, you get:
```
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_28676/2318072837.py in <module>
      1 # Compute NDCG@5
----> 2 evaluate(qrels, run, "r-precision")

/vol/data/ferret/tools-distrib/_research_code/rank_eval/rank_eval/meta_functions.py in evaluate(qrels, run, metrics, return_mean, threads, save_results_in_run)
    149     for m, scores in metric_scores_dict.items():
    150         for i, q_id in enumerate(run.get_query_ids()):
--> 151             run.scores[m][q_id] = scores[i]
    152     # Prepare output -----------------------------------------------------------
    153     if return_mean:

TypeError: 'numpy.float64' object does not support item assignment
```
I first detected the problem through the integration of your code and obtained the same error there. Looking at the file meta_functions.py where the problem arises, I saw your recent update of this part of the code, but there is still a problem: for R-precision, the mean of the scores is stored in run.scores and not in run.mean_scores. As a consequence, using run.scores to store the score of each query raises a problem if both the return_mean and save_results_in_run flags are set to True. More globally, I am not sure I understand why you differentiate R-precision from the other measures concerning the computation of the mean score.
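For what it is worth, the error itself can be reproduced outside your code; it is simply what happens when a scalar mean sits where a per-query mapping is expected:

```python
import numpy as np

# A scalar mean stored where a per-query mapping is expected...
scores = {"r-precision": np.float64(0.42)}

# ...cannot receive per-query item assignments:
scores["r-precision"]["q_1"] = 0.5
# TypeError: 'numpy.float64' object does not support item assignment
```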
Thank you in advance for your efforts in fixing the issue.
Olivier