capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

[wip] adding nfcorpus #69

Closed crystina-z closed 4 years ago

crystina-z commented 4 years ago

issue #68

Automatically downloads NFCorpus; currently only the title and description fields are added.

todo:

  1. differentiate the video title and description from the "nontopic" title?
  2. add train/dev/test.all.queries

current score: map=0.15, ndcg@20=0.3053 (bm25grid, k1=1.0, b=0.4, hits=1000)
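As a reference point, here is a minimal sketch of how the title and description query variants could be read from the NFCorpus release. The tab-separated `qid<TAB>text` format and the file names (`train.titles.queries`, `train.vid-desc.queries`, etc.) are taken from the public dump and are assumptions; this is not the collection/benchmark code in this PR.

```python
from pathlib import Path

def read_queries(path):
    """Parse a tab-separated NFCorpus query file (qid<TAB>text) into {qid: text}."""
    queries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            qid, text = line.rstrip("\n").split("\t", 1)
            queries[qid] = text
    return queries

corpus_dir = Path("nfcorpus")  # hypothetical extraction directory
for split in ("train", "dev", "test"):
    titles = read_queries(corpus_dir / f"{split}.titles.queries")
    descs = read_queries(corpus_dir / f"{split}.vid-desc.queries")
    print(split, len(titles), "title queries,", len(descs), "video-description queries")
```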

andrewyates commented 4 years ago

I'm also getting an error when running this with a reranker

$ capreolus rerank.traineval with benchmark.name=nf rank.searcher.name=BM25 rank.searcher.b=0.4 rank.searcher.k1=1.5 reranker.name=KNRM
...
capreolus/extractor/__init__.py", line 167, in <dictcomp>
    self.qid2toks = {qid: tokenize(topics[qid]) for qid in qids}
KeyError: '1'
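The KeyError suggests the qids handed to the extractor (from the benchmark's folds/qrels) and the qids parsed from the topic file are out of sync: qid '1' is judged or appears in a fold but has no topic text. A quick, hypothetical check along these lines shows which side is missing entries:

```python
def report_qid_mismatch(topics, qids):
    """Print qids requested by the extractor that have no topic text, and vice versa.
    `topics` is {qid: text} from the topic file; `qids` comes from the folds/qrels."""
    missing_topics = sorted(set(qids) - set(topics))
    unused_topics = sorted(set(topics) - set(qids))
    print("qids without a topic:", missing_topics[:10])
    print("topics never requested:", unused_topics[:10])

# toy call with made-up ids; with the real benchmark objects, '1' should show up as missing
report_qid_mismatch({"q1": "example topic text"}, ["q1", "1"])
```

If the two sets genuinely differ (rather than, say, differing only in id formatting), the topic file is probably missing queries for some judged qids.
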
crystina-z commented 4 years ago
  • add info on which version of the collection we're using to the docstrings. I think this is all queries with marginally relevant as 0? ("2-1-0")
  • add a citation to the docstrings
  • I'm also getting an error when running this with a reranker

yes these have been added/fixed

  • include the test qrel file separately, so it's easy to run trec_eval against for debugging? e.g. qrels.nf_test.txt

I'm slightly confused: does that mean moving the test qrels into a separate file? But in that case we would also need to change the code in rank (i.e. change eval_runs(test_runs, benchmark.qrels, metrics) to eval_runs(test_runs, benchmark.test_qrels, metrics)) to get the eval result?

andrewyates commented 4 years ago

I mean leaving the existing qrels file as it currently is, but also creating an additional test qrels file like I did with qrels.antique_test.txt. The reason for this is that it's difficult to debug by running trec_eval on the full qrels file, because this computes metrics over both training and testing qids. It's a lot more convenient to run trec_eval qrels_test.txt .../searcher. This shortcut has been coming up a lot as I've been checking results for new collections.
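For what it's worth, the split-out file could be generated from the existing qrels with something like the sketch below (the qrels.nf_test.txt name and the test-fold qid set are placeholders, not code in this PR); trec_eval can then be pointed at the smaller file directly.

```python
def write_test_qrels(qrels_fn, test_qids, out_fn="qrels.nf_test.txt"):
    """Copy only the lines of a TREC-format qrels file whose qid is in the test fold."""
    with open(qrels_fn, encoding="utf-8") as fin, open(out_fn, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.split()[0] in test_qids:
                fout.write(line)

# example usage with hypothetical paths/qids:
# write_test_qrels("capreolus/data/qrels.nf.txt", {"q101", "q102"})
```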

crystina-z commented 4 years ago

Scores obtained by running:

python run.py rank.traineval with
  metrics=map,ndcg_cut_1000 benchmark.name=nf benchmark.collection.name=nf searcher.name=BM25
  benchmark.labelrange=1-3    # or 0-2
  benchmark.fields=vid_desc   # or "all_titles", "nontopics", "all_fields", "vid_title"
  searcher.hits=1000 searcher.k1=1.2 searcher.b=0.75
qrel: 0-1-2

| metric | all_fields | all_titles | nontopics | vid_title | vid_desc |
|---|---|---|---|---|---|
| map | 0.2139 | 0.1343 | 0.1343 | 0.1326 | 0.1505 |
| ndcg_cut_1000 | 0.4419 | 0.3274 | 0.327 | 0.3274 | 0.3867 |

qrel: 1-2-3

| metric | all_fields | all_titles | nontopics | vid_title | vid_desc |
|---|---|---|---|---|---|
| map | 0.1974 | 0.1197 | 0.1197 | 0.1147 | 0.1279 |
| ndcg_cut_1000 | 0.4244 | 0.3088 | 0.3088 | 0.3064 | 0.3549 |
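To double-check these numbers outside of Capreolus, the same metrics can be computed over a TREC-format run with pytrec_eval. This is only a sketch: the qrels/run paths are placeholders (qrels.nf_test.txt is the separate test qrels file discussed above), and trec_eval on the command line should give the same values.

```python
import pytrec_eval

def load_qrels(path):
    """Read a TREC qrels file (qid iter docid rel) into {qid: {docid: rel}}."""
    qrels = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            qid, _, docid, rel = line.split()
            qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels

def load_run(path):
    """Read a TREC run file (qid Q0 docid rank score tag) into {qid: {docid: score}}."""
    run = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            run.setdefault(qid, {})[docid] = float(score)
    return run

qrels = load_qrels("qrels.nf_test.txt")          # hypothetical test qrels file
run = load_run("results/searcher/bm25_run.txt")  # hypothetical run file path
evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg_cut"})
per_query = evaluator.evaluate(run)
print("map:", sum(m["map"] for m in per_query.values()) / len(per_query))
print("ndcg_cut_1000:", sum(m["ndcg_cut_1000"] for m in per_query.values()) / len(per_query))
```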