Couldn't find any documentation about Qrel and run score range

AmitPoonia commented 9 months ago

Hi,

In the current API documentation I see some examples of how to structure qrel and run dicts, but there is no detail about what the range of the qrel relevance score should be. I assume it's 0 for non-relevant cases and 1 or more for graded relevance, but is there an upper limit to the integer value?

Similarly, for the run score, which is a float, I assume the range is 0.0 to 1.0?

Thank you.

AmenRa commented 9 months ago

Hi and thanks for your question.

qrels scores can be any integer, positive or negative. All the implemented metrics except bpref consider only relevant documents, i.e., they discard all qrels < 1. You can change this threshold by appending -ln (where n is a parameter) to a metric name when using the evaluate and compare methods (e.g., recall-l2 means that all qrels < 2 are considered non-relevant when computing the recall score). All the implemented metrics except DCG and nDCG treat relevance as binary, i.e., graded scores are not taken into account.
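
To make the threshold concrete, here is a minimal pure-Python sketch of recall with a relevance cutoff. It is only an illustration of the behaviour described above, not ranx's actual implementation; the function name `recall`, the `min_rel` parameter, and the sample data are assumptions made up for this example.

```python
def recall(qrels, run, min_rel=1):
    """Recall for a single query with a relevance threshold.

    qrels:   dict mapping doc_id -> integer relevance judgment
    run:     dict mapping doc_id -> float score
    min_rel: judgments below this value count as non-relevant
             (hypothetical parameter mirroring the "-ln" suffix)
    """
    # Keep only documents judged at or above the threshold.
    relevant = {doc for doc, rel in qrels.items() if rel >= min_rel}
    if not relevant:
        return 0.0
    # A relevant document counts if it appears in the run at all.
    retrieved = relevant & set(run)
    return len(retrieved) / len(relevant)


# Judgments may be any integer, including negative values.
qrels = {"d1": 2, "d2": 1, "d3": 0, "d4": -1}
run = {"d1": 0.9, "d3": 0.8, "d4": 0.1}

recall_default = recall(qrels, run)        # d1 and d2 are relevant, only d1 is retrieved
recall_l2 = recall(qrels, run, min_rel=2)  # only d1 is relevant, and it is retrieved
```

With the default threshold, `d3` and `d4` are ignored because their judgments are below 1; raising the threshold to 2 also drops `d2` from the relevant set.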

run scores can be any float, positive or negative. They are used only for sorting as they are model-dependent.
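
As a small illustration of why the scale of run scores does not matter, the sketch below sorts two hypothetical runs with very different score ranges and gets the same ranking. The helper name `ranking` and the sample scores are assumptions for this example, not part of ranx.

```python
def ranking(run):
    """Return doc ids ordered by descending score."""
    return sorted(run, key=run.get, reverse=True)


# Same ordering of documents, very different score scales.
run_a = {"d1": 12.5, "d2": -3.0, "d3": 0.4}    # unbounded, includes a negative score
run_b = {"d1": 0.99, "d2": 0.01, "d3": 0.50}   # scores squeezed into [0, 1]

ranking_a = ranking(run_a)
ranking_b = ranking(run_b)
```

Since only the order is used, a rank-based metric computed on either run would see the same list of documents.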

Let me know if you have further questions. Please consider giving ranx a star if you like it.

AmitPoonia commented 9 months ago

Thank you very much.

AmenRa / ranx

Couldn't find any documentation about Qrel and run score range #58