elastic / rally-tracks

Track specifications for the Elasticsearch benchmarking tool Rally

Add recall and NDCG operations in msmarco-v2-vector #610

Closed jimczi closed 4 months ago

jimczi commented 5 months ago

This change adds an operation called knn-recall that computes the following metrics:

The new queries-recall.json file contains all 76 queries from the testing set, along with their embeddings and the top 1000 ids computed by brute force over the entire corpus. For the relevance metrics, the qrels.tsv file contains annotations for every query listed in queries-recall.json. This file is generated from the original training data available at ir_datasets/msmarco_passage_v2.
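
For illustration only, here is a minimal sketch of how recall and NDCG could be computed from the brute-force ids and the qrels annotations described above. It is not the code in this PR: the function names `recall_at_k` and `ndcg_at_k` are hypothetical, relevance judgments are assumed to be a plain mapping from document id to an integer grade, and NDCG uses a linear gain with a log2 discount, which may differ from what the operation actually does.

```python
import math


def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact (brute-force) top-k ids found in the approximate top k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    return len(exact_top.intersection(approx_ids[:k])) / len(exact_top)


def dcg(gains):
    """Discounted cumulative gain with a log2 discount (rank 1 is undiscounted)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k for one query; `relevance` maps doc id -> graded judgment (assumed shape)."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy data, invented for this example (not taken from queries-recall.json or qrels.tsv).
exact = ["d1", "d2", "d3", "d4"]      # brute-force top ids
approx = ["d1", "d3", "d9", "d2"]     # ids returned by an approximate kNN search
judgments = {"d1": 3, "d2": 1}        # graded relevance for this query

print(recall_at_k(approx, exact, k=4))               # 0.75
print(round(ndcg_at_k(approx, judgments, k=4), 4))   # 0.9448
```

In this sketch recall compares the approximate results against the brute-force ids, while NDCG compares them against the human annotations, so the two metrics answer different questions: how faithful the approximate search is, and how relevant its results are.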