Closed: PaulLerner closed this 2 years ago
Hi, I can add it to the pool of the provided metrics for sure! :)
I'm just not sure what I should call it. Would `success_rate` or `hit_rate` be appropriate? I could even call it `hits` and rename or hide the current `hits` metric.
What do you think?
`hit_rate` seems fine, whatever you prefer really :)
Hi,
I will need this feature pretty soon; do you plan to implement it shortly? Otherwise, could you provide instructions so that I can implement it myself?
Best,
Paul
Hi,
Added in 0.1.9 as `hit_rate`. It supports @k as usual.
Closing.
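For reference, usage would look roughly like this (a minimal sketch assuming the current `Qrels`/`Run` API; the exact call may differ slightly between versions):

```python
from ranx import Qrels, Run, evaluate

# Toy example: each query has a single known relevant document.
qrels = Qrels({
    "q_1": {"doc_a": 1},
    "q_2": {"doc_b": 1},
})
run = Run({
    "q_1": {"doc_a": 0.9, "doc_c": 0.8},  # relevant doc in the top 2
    "q_2": {"doc_d": 0.7, "doc_e": 0.6},  # relevant doc missed
})

# Proportion of queries with at least one relevant document in the top 2.
print(evaluate(qrels, run, "hit_rate@2"))  # -> 0.5
```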
Should probably update `report`:

```
Traceback (most recent call last):
  File "/gpfswork/rech/fih/usl47jg/miniconda3/envs/meerqat/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/gpfswork/rech/fih/usl47jg/miniconda3/envs/meerqat/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfsdswork/projects/rech/fih/usl47jg/meerqat/meerqat/ir/metrics.py", line 165, in <module>
    compare(args['--qrels'], args['<run>'], output_path=args['--output'], **kwargs)
  File "/gpfsdswork/projects/rech/fih/usl47jg/meerqat/meerqat/ir/metrics.py", line 136, in compare
    print(report)
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 202, in __str__
    return self.to_table()
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 75, in to_table
    for x in list(list(self.results.values())[0].keys())
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 75, in <listcomp>
    for x in list(list(self.results.values())[0].keys())
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 59, in get_metric_label
    return f"{metric_labels[m_splitted[0]]}@{m_splitted[1]}"
KeyError: 'hit_rate'
```
This commit fixes it; I can open a PR: https://github.com/PaulLerner/ranx/commit/f9a67510488ab43af6c3dfa539614a76e4149c91
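For anyone hitting the same error before upgrading: the traceback shows that `get_metric_label` fails on a `metric_labels` lookup, so the fix presumably amounts to registering the new metric there. A sketch of the shape of the change (the display label is an assumption; see the linked commit for the actual one):

```python
# ranx/report.py: get_metric_label looks up metric_labels[m_splitted[0]],
# so every supported metric name needs an entry in this dict.
metric_labels = {
    # ... existing entries ("map", "mrr", "ndcg", ...) ...
    "hit_rate": "Hit Rate",  # assumed label; see the commit linked above
}
```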
Fixed in 0.1.10. Sorry for the inconvenience.
Hi,
@osf9018 mentioned it in #2, but I guess it’s better to create a specific issue.
### Motivation
It is often difficult to estimate the total number of relevant documents for a query. For example, in Question Answering, if you have a large enough Knowledge Base, you can find the answer to your question in a surprisingly large number of documents, more than one could annotate in advance. Because of this, the relevance of a document is often estimated on the fly, by checking whether the answer string occurs in the documents retrieved by the system.

Recall is therefore not an appropriate metric. However, one way to circumvent this is to compute recall "as if" there were only a single relevant document. Averaged over the whole dataset, this corresponds to the proportion of questions for which the system retrieved at least one relevant document in the top K. This is what @osf9018 and I call "hits@K" (I can’t remember where, but I’ve seen it in a paper); others, such as Karpukhin et al., call it "accuracy", which IMO is a confusing term.
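To make the definition concrete, here is a small library-independent sketch (all names are illustrative):

```python
def hit_rate_at_k(qrels, run, k):
    """Proportion of queries with at least one relevant document in the top k.

    qrels: {query_id: {doc_id: relevance}}, with only relevant docs listed.
    run:   {query_id: {doc_id: score}}, higher scores rank first.
    """
    hits = 0
    for query_id, relevant in qrels.items():
        scores = run.get(query_id, {})
        # Rank retrieved documents by descending score and keep the top k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += any(doc_id in relevant for doc_id in top_k)
    return hits / len(qrels)

# hit_rate_at_k({"q": {"d1": 1}}, {"q": {"d2": 0.9, "d1": 0.4}}, k=1)  # -> 0.0
# hit_rate_at_k({"q": {"d1": 1}}, {"q": {"d2": 0.9, "d1": 0.4}}, k=2)  # -> 1.0
```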
### The request
Would you be interested in implementing or integrating this feature in your library? It might take some renaming, but it could be implemented very easily using the `_hits` function: it is simply `min(1, _hits(qrels, run, k))`.
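In terms of ranx internals, the wrapper might look like the sketch below, assuming `_hits(qrels, run, k)` returns the per-query number of relevant documents retrieved in the top k (as the formula above suggests):

```python
def _hit_rate(qrels, run, k):
    # Per-query indicator: 1 if at least one relevant document
    # appears in the top k results, 0 otherwise.
    return min(1, _hits(qrels, run, k))

# Averaging this indicator over all queries yields the proportion of
# queries with at least one relevant document in the top k (hits@K).
```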