cvangysel / pytrec_eval

pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
http://ilps.science.uva.nl/
MIT License

use complete set of queries from relevance judgments (-c) #30

Closed — seanmacavaney closed this issue 2 years ago

seanmacavaney commented 3 years ago

The -c option in trec_eval does the following:

 --complete_rel_info_wanted:
 -c: Average over the complete set of queries in the relevance judgements  
     instead of the queries in the intersection of relevance judgements 
     and results.  Missing queries will contribute a value of 0 to all 
     evaluation measures (which may or may not be reasonable for a  
     particular evaluation measure, but is reasonable for standard TREC 
     measures.) Default is off.

Although the default in trec_eval is off, I think it would be prudent to default this option to on (and perhaps give the user an option to turn it off). Without it, a user may accidentally average over an incomplete set of queries, e.g., if their engine doesn't return any results for a given query.
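To illustrate why this matters, here is a small numeric sketch (the scores and query IDs are hypothetical) showing how the two averaging behaviors diverge when a run is missing a judged query:

```python
# Hypothetical per-query scores: the run returned results for only
# 2 of the 3 judged queries ('q3' is missing from the run).
per_query_ap = {'q1': 0.50, 'q2': 0.30}
judged_queries = ['q1', 'q2', 'q3']

# Average over the intersection of run and judgments (trec_eval default, -c off):
mean_intersection = sum(per_query_ap.values()) / len(per_query_ap)

# Average over all judged queries, missing ones contributing 0 (-c on):
mean_complete = sum(per_query_ap.get(q, 0.0) for q in judged_queries) / len(judged_queries)

print(mean_intersection)  # 0.40
print(mean_complete)      # 0.2666...
```

The silently inflated 0.40 is exactly the failure mode described above.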

It doesn't look like this is as simple as setting:

self->epi_.average_complete_flag = 1;

because that setting only affects trec_eval's averages, not the individual per-query scores. A fix could be to modify the run dict before sending it to the relevance assessor, adding any missing queries as entries pointing to empty dicts.
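The padding approach could be sketched as follows (the helper name is hypothetical; `pytrec_eval.RelevanceEvaluator` is the library's actual entry point):

```python
def pad_run_with_judged_queries(qrels, run):
    """Return a copy of `run` containing every query in `qrels`.

    Queries with no retrieved results map to empty dicts, so they
    contribute 0 to every measure, mimicking trec_eval's -c flag.
    """
    padded = dict(run)
    for qid in qrels:
        padded.setdefault(qid, {})
    return padded
```

The padded run would then be passed to `pytrec_eval.RelevanceEvaluator(qrels, measures).evaluate(padded_run)` in place of the original run.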

seanmacavaney commented 2 years ago

This functionality (and others) is now available via the ir-measures wrapper.