castorini / tct_colbert


About training with TREC 2019 #2

Open mhd0528 opened 2 years ago

mhd0528 commented 2 years ago

Hi, I'm trying to reproduce your results on the TREC 2019 task, and I noticed that TREC 2019 actually includes several tasks based on the MS MARCO dataset. May I ask which part of TREC 2019 you used in your experiments? Thanks!

jacklin64 commented 2 years ago

Hi @mhd0528, thanks for your question. To clarify, TREC 2019 here means the TREC 2019 DL track (deep learning track: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2019). In addition, we did not train TCT specifically for TREC 2019, since its query set is relatively small for training. The most common approach is to train models on the MS MARCO passage retrieval dataset and then run retrieval directly on the TREC DL queries.

mhd0528 commented 2 years ago

Hi @jacklin64, thanks for the reply. Could you point me to the query file(s) you used for TREC DL? I found three test files on the TREC 2019 page for the passage retrieval task. Thanks again.

jacklin64 commented 2 years ago

This is the query file we use: https://msmarco.blob.core.windows.net/msmarcoranking/msmarco-test2019-queries.tsv.gz and the corresponding qrels are here: https://trec.nist.gov/data/deep/2019qrels-pass.txt
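For anyone pairing these two files up, here is a minimal sketch of loading them in Python. It is not code from this repository. It assumes that both files have already been downloaded, that the queries file is tab-separated `qid<TAB>query text`, and that the qrels file uses the usual four-column TREC layout (`qid Q0 pid relevance`). The function names are hypothetical, chosen only for illustration.

```python
import gzip
from collections import defaultdict


def parse_queries(lines):
    """Parse 'qid<TAB>query text' lines into a {qid: text} dict.

    Assumption: one query per line, tab-separated.
    """
    queries = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        qid, text = parts
        queries[qid] = text
    return queries


def parse_qrels(lines):
    """Parse TREC-style qrels lines into {qid: {pid: relevance}}.

    Assumption: four whitespace-separated columns, 'qid Q0 pid relevance'.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _, pid, rel = parts
        qrels[qid][pid] = int(rel)
    return dict(qrels)


def load_judged_queries(queries_path, qrels_path):
    """Load both files and keep only the queries that have judgments."""
    with gzip.open(queries_path, "rt", encoding="utf-8") as f:
        queries = parse_queries(f)
    with open(qrels_path, encoding="utf-8") as f:
        qrels = parse_qrels(f)
    judged = {qid: text for qid, text in queries.items() if qid in qrels}
    return judged, qrels
```

Calling `load_judged_queries("msmarco-test2019-queries.tsv.gz", "2019qrels-pass.txt")` on local copies of the two files should return the judged queries and their relevance labels. Filtering by the qrels matters if some queries in the test file have no judgments, because those queries cannot be scored.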

mhd0528 commented 2 years ago

Ok, thanks so much for helping!