Motivation

It is often difficult to estimate the total number of relevant documents for a query. For example, in Question Answering, if the Knowledge Base is large enough, the answer to a question can appear in a surprisingly large number of documents, too many to annotate in advance. Because of this, the relevance of a document is often estimated on the fly, by checking whether the answer string appears in the document retrieved by the system.

As a result, recall is not an appropriate metric, since its denominator (the total number of relevant documents) is unknown. However, one way to circumvent this is to compute recall "as if" there were only a single relevant document. After averaging over the whole dataset, it corresponds to the proportion of questions for which the system retrieved at least one relevant document in the top-K. This is what @osf9018 and I call "hits@K" (I can't remember where, but I've seen it in a paper) and what others, such as Karpukhin et al., call "accuracy". Accuracy is a confusing term, IMO.

The request

Would you be interested in implementing or integrating this feature in your library? It might take some renaming, but it could be implemented very easily using the _hits function: for each query, it is simply min(1, _hits(qrels, run, k)).
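To make the definition concrete, here is a minimal pure-Python sketch of the idea. It does not use ranx: the function names `hits_at_k` and `hit_rate_at_k` and the nested-dict layout of `qrels` and `run` are assumptions chosen for illustration, not the library's internals.

```python
from typing import Dict, List

# Assumed layout (hypothetical, for illustration only):
#   qrels: {query_id: {doc_id: relevance_label}}
#   run:   {query_id: {doc_id: retrieval_score}}
Qrels = Dict[str, Dict[str, int]]
Run = Dict[str, Dict[str, float]]


def hits_at_k(relevant: Dict[str, int], ranked: List[str], k: int) -> int:
    """Count the relevant documents among the top-k ranked documents."""
    return sum(1 for doc_id in ranked[:k] if relevant.get(doc_id, 0) > 0)


def hit_rate_at_k(qrels: Qrels, run: Run, k: int) -> float:
    """Proportion of queries with at least one relevant document in the top-k."""
    if not qrels:
        return 0.0
    total = 0
    for q_id, relevant in qrels.items():
        scores = run.get(q_id, {})
        # Rank documents by decreasing retrieval score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Clip the hit count to 1: "as if" there were a single relevant document.
        total += min(1, hits_at_k(relevant, ranked, k))
    return total / len(qrels)


qrels = {"q1": {"d1": 1, "d2": 1}, "q2": {"d3": 1}}
run = {
    "q1": {"d1": 0.9, "d4": 0.8, "d2": 0.7},
    "q2": {"d5": 0.9, "d3": 0.4},
}
print(hit_rate_at_k(qrels, run, 1))  # 0.5: only q1 has a relevant document at rank 1
print(hit_rate_at_k(qrels, run, 2))  # 1.0: both queries have a hit in the top 2
```

Note that q1 has two relevant documents in its top 3, yet it contributes 1 rather than 2 to the average; that clipping is the only difference from a plain hit count.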

AmenRa commented 2 years ago

Hi, I can add it to the pool of the provided metrics for sure! :)

I'm just not sure what I should call it. Would success_rate or hit_rate be appropriate? I could even call it hits and rename or hide the current hits metric.

What do you think?

PaulLerner commented 2 years ago

hit_rate seems fine, but whatever you prefer, really :)

PaulLerner commented 2 years ago

Hi,

I will need this feature pretty soon. Do you plan to implement it soon? Otherwise, could you provide instructions so that I can implement it myself?

Best,

Paul

AmenRa commented 2 years ago

Hi,

Added in 0.1.9 as hit_rate. It supports the @k cutoff as usual.

Closing.

PaulLerner commented 2 years ago

report.py should probably be updated too, since printing a report with hit_rate fails:

Traceback (most recent call last):
  File "/gpfswork/rech/fih/usl47jg/miniconda3/envs/meerqat/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/gpfswork/rech/fih/usl47jg/miniconda3/envs/meerqat/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfsdswork/projects/rech/fih/usl47jg/meerqat/meerqat/ir/metrics.py", line 165, in <module>
    compare(args['--qrels'], args['<run>'], output_path=args['--output'], **kwargs)
  File "/gpfsdswork/projects/rech/fih/usl47jg/meerqat/meerqat/ir/metrics.py", line 136, in compare
    print(report)
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 202, in __str__
    return self.to_table()
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 75, in to_table
    for x in list(list(self.results.values())[0].keys())
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 75, in <listcomp>
    for x in list(list(self.results.values())[0].keys())
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 59, in get_metric_label
    return f"{metric_labels[m_splitted[0]]}@{m_splitted[1]}"
KeyError: 'hit_rate'
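
The last frame of the traceback suggests that the metric name is looked up in a label table that has no entry for hit_rate. The following is a minimal sketch of that failure mode and of a more forgiving lookup. It is not the actual ranx code: the contents of `metric_labels` and the fallback behaviour are assumptions made for illustration.

```python
# Hypothetical label table; the real one in ranx/report.py may differ.
metric_labels = {
    "hits": "Hits",
    "recall": "Recall",
}


def get_metric_label(m: str) -> str:
    """Turn a metric name such as 'hits@5' into a display label."""
    name, _, k = m.partition("@")
    # Fall back to the raw metric name instead of raising KeyError
    # when a newly added metric has no registered label.
    label = metric_labels.get(name, name)
    return f"{label}@{k}" if k else label


print(get_metric_label("hits@5"))      # Hits@5
print(get_metric_label("hit_rate@5"))  # hit_rate@5 (no label registered)
```

Registering a label for hit_rate in the table would give a nicer display name; the fallback only keeps the report printable when a metric is added without one.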

PaulLerner commented 2 years ago

This commit fixes it; I can open a PR: https://github.com/PaulLerner/ranx/commit/f9a67510488ab43af6c3dfa539614a76e4149c91

AmenRa commented 2 years ago

Fixed in 0.1.10. Sorry for the inconvenience.

AmenRa / ranx

feature request: hits (or accuracy?) #7
