Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.

How were the train pairs for Robust made? #42

Closed · nirmal2k closed this issue 2 years ago

nirmal2k commented 2 years ago

I couldn't find some of the training pairs in the qrels. Also, is the eval data for Robust used as training data?

seanmacavaney commented 2 years ago

Training pairs and qrels are found here: https://github.com/Georgetown-IR-Lab/cedr/tree/master/data

We used the standard Robust04 5-folds from Huston & Croft (2014).

I recommend using OpenNIR instead of this repository.
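For illustration, here is a minimal sketch of how training pairs can be built from a TREC-style qrels file: positives are documents judged relevant, and negatives are sampled from the other judged documents for the same query. The function names, the `pairs_per_query` parameter, and the negative-sampling strategy are assumptions for this sketch; the released pairs in the data directory linked above may have been generated differently (e.g., with negatives drawn from a retrieval run).

```python
# Illustrative sketch (not the repository's actual generation script): build
# (query, positive doc, negative doc) training pairs from a TREC-style qrels
# file with lines of the form "qid 0 docid label".
import random
from collections import defaultdict

def read_qrels(path):
    pos, neg = defaultdict(list), defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _, docid, label = line.split()
            (pos if int(label) > 0 else neg)[qid].append(docid)
    return pos, neg

def make_pairs(qrels_path, pairs_per_query=10, seed=42):
    random.seed(seed)
    pos, neg = read_qrels(qrels_path)
    pairs = []
    for qid in pos:
        if not neg[qid]:
            continue  # skip queries with no judged non-relevant documents
        for _ in range(pairs_per_query):
            pairs.append((qid, random.choice(pos[qid]), random.choice(neg[qid])))
    return pairs
```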

nirmal2k commented 2 years ago

Can you please confirm whether the training is done on the eval qrels? Thanks

seanmacavaney commented 2 years ago

We used 5-fold cross-validation over Robust04. In other words, 5 models were trained, each using three folds for training, one fold for tuning (e.g., selecting the training epoch), and the final fold held out for evaluation.
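A minimal sketch of that rotation, assuming the five folds are available as lists of query IDs; which fold plays the tuning role versus the evaluation role in each split is an assumption here, not necessarily the exact assignment used.

```python
# Illustrative sketch of the 5-fold rotation described above. `folds` is a
# list of five lists of query IDs (e.g., from the Huston & Croft splits).
def cross_validation_splits(folds):
    assert len(folds) == 5
    for i in range(5):
        test_qids = folds[i]              # held out for evaluation
        valid_qids = folds[(i + 1) % 5]   # used for tuning (e.g., picking the epoch)
        train_qids = [q for j in range(5)
                      if j not in (i, (i + 1) % 5)
                      for q in folds[j]]  # remaining three folds for training
        yield train_qids, valid_qids, test_qids
```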