AmitPoonia closed this issue 9 months ago.
Hi, and thanks for your question.
`qrels` scores can be any integer, positive or negative.
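For illustration, here is a qrels mapping in the nested-dict layout from the ranx README (query id → document id → integer relevance), with negative and large scores. The `check_qrels` helper is hypothetical and not part of ranx. It only checks the constraint stated above, namely that scores are integers with no upper or lower bound.

```python
from typing import Dict

# Nested-dict layout: query id -> document id -> relevance judgment.
qrels_dict: Dict[str, Dict[str, int]] = {
    "q_1": {"d_12": 5, "d_25": 3, "d_7": -1},
    "q_2": {"d_11": 0, "d_22": 100},
}


def check_qrels(qrels: Dict[str, Dict[str, int]]) -> bool:
    """Hypothetical helper: True if every judgment is an int (any sign, any size)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return all(
        isinstance(score, int) and not isinstance(score, bool)
        for docs in qrels.values()
        for score in docs.values()
    )


print(check_qrels(qrels_dict))  # True
print(check_qrels({"q_1": {"d_1": 0.5}}))  # False: float judgment
```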
All the implemented metrics except `bpref` consider only relevant documents, i.e., they discard all qrels with a score < 1. You can change this threshold by appending `-l<n>` (where `n` is the minimum relevance level) to a metric name when using the `evaluate` and `compare` methods. For example, `recall-l2` means that all qrels with a score < 2 are considered non-relevant when computing the recall score.
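The thresholding can be sketched in plain Python. This is an illustrative reimplementation, not ranx's code: `relevant_docs` and `recall` are hypothetical helpers, and `min_rel=2` plays the role of the `-l2` suffix in `recall-l2`. In ranx itself you would pass the suffixed metric name (e.g. `"recall-l2"`) to `evaluate`.

```python
from typing import Dict, List, Set


def relevant_docs(judgments: Dict[str, int], min_rel: int = 1) -> Set[str]:
    """Documents judged at or above min_rel; everything else counts as non-relevant."""
    return {doc for doc, rel in judgments.items() if rel >= min_rel}


def recall(judgments: Dict[str, int], ranking: List[str], min_rel: int = 1) -> float:
    """Fraction of the relevant documents that appear in the ranking."""
    relevant = relevant_docs(judgments, min_rel)
    if not relevant:
        # Choice made for this sketch; ranx may handle this case differently.
        return 0.0
    return len(relevant & set(ranking)) / len(relevant)


judgments = {"d_1": 3, "d_2": 1, "d_3": 2, "d_4": 0, "d_5": -1}
ranking = ["d_2", "d_3", "d_4"]

print(recall(judgments, ranking))             # level 1: relevant = {d_1, d_2, d_3} -> 2/3
print(recall(judgments, ranking, min_rel=2))  # like recall-l2: relevant = {d_1, d_3} -> 0.5
```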
All the implemented metrics but `DCG` and `nDCG` consider relevance to be binary, i.e., graded scores are not taken into account.
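To see why the distinction matters, here is a plain-Python sketch with hypothetical helpers (not ranx's implementation) comparing a binary metric, precision, with DCG computed using linear gains rel / log2(rank + 1). ranx's exact DCG formulation may differ. The point is only that graded scores change DCG but not precision.

```python
import math
from typing import Dict, List


def precision(judgments: Dict[str, int], ranking: List[str], min_rel: int = 1) -> float:
    """Binary: each retrieved document counts 1 if its judgment is >= min_rel, else 0."""
    if not ranking:
        return 0.0
    hits = sum(1 for doc in ranking if judgments.get(doc, 0) >= min_rel)
    return hits / len(ranking)


def dcg(judgments: Dict[str, int], ranking: List[str]) -> float:
    """Graded: each document adds rel / log2(rank + 1), with ranks starting at 1."""
    # Unjudged documents get relevance 0 in this sketch.
    return sum(
        judgments.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranking, start=1)
    )


judgments = {"d_1": 3, "d_2": 1}
best_first = ["d_1", "d_2"]
worst_first = ["d_2", "d_1"]

# Precision cannot tell the two orderings apart: both documents are relevant.
print(precision(judgments, best_first), precision(judgments, worst_first))  # 1.0 1.0
# DCG rewards placing the higher-graded document first.
print(dcg(judgments, best_first) > dcg(judgments, worst_first))  # True
```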
`run` scores can be any float, positive or negative. They are used only to sort the documents, since their scale is model-dependent.
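Because only the order matters, any strictly increasing transformation of the run scores yields the same ranking and therefore the same metric values. The `to_ranking` helper below is a hypothetical sketch, not ranx's API, and how ranx breaks ties between equal scores is not covered here.

```python
import math
from typing import Dict, List


def to_ranking(scores: Dict[str, float]) -> List[str]:
    """Order documents by descending score; only the relative order matters."""
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


run_scores = {"d_1": -2.5, "d_2": 7.1, "d_3": 0.3}
# A strictly increasing transformation (here exp) changes the values but not the order.
rescaled = {doc: math.exp(score) for doc, score in run_scores.items()}

print(to_ranking(run_scores))                          # ['d_2', 'd_3', 'd_1']
print(to_ranking(rescaled) == to_ranking(run_scores))  # True
```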
Let me know if you have further questions.
Please consider giving `ranx` a star if you like it.
Thank you very much.
Hi,
In the current API documentation I see some examples of how to structure the `qrel` and `run` dicts, but there is no detail about what the range of `qrel` relevance scores should be. I assume it's 0 for non-relevant cases and 1 or more for graded relevance, but is there an upper limit on the integer value? Similarly, for `run` scores, which are floats, I assume the range is 0.0 to 1.0?
Thank you.