Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

Where is the documents.tsv file? #20

Closed: WenTingTseng closed this issue 4 years ago

WenTingTseng commented 4 years ago

When I try to train a Vanilla BERT model using a command like this:

```shell
python3 train.py \
  --model vanilla_bert \
  --datafiles data/wt/queries.tsv data/wt/document.tsv \
  --qrels data/wt/qrels \
  --train_pairs data/wt/train.wt12.pairs \
  --valid_run data/wt/valid.wt12.run \
  --model_out_dir models/vbert
```

But there is no document.tsv file. Also, for --train_pairs data/ws/train_pairs there are many pairs files, such as train.wt12.pairs and train.wt13.pairs; which one do I need to set? The same question applies to --valid_run data/valid_run.

Thanks a lot for your help.

seanmacavaney commented 4 years ago

Regarding documents, see #9.

Regarding pairs/runs: you choose the training pairs and the validation run based on which datasets you want to train and validate on. So if you want to train on WT 2012 and validate on WT 2013, you would use train.wt12.pairs and test.wt13.run.
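As a rough sketch of the advice above, the shell function below assembles the train.py command from a training year and a validation year. It assumes the file naming used in this thread (train.wtYY.pairs for training pairs, test.wtYY.run for the validation run, as in the WT 2012 / WT 2013 example) and a documents.tsv that you build yourself as described in #9. The function name build_train_cmd is hypothetical, and it only prints the command so you can check the file choices before running anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a train.py command for a given data directory,
# training year (e.g. 12 for WT 2012) and validation year (e.g. 13 for WT 2013).
# File names follow the pattern mentioned in this thread; adjust if yours differ.
build_train_cmd() {
  local data_dir=$1 train_year=$2 valid_year=$3
  local train_pairs="$data_dir/train.wt${train_year}.pairs"
  local valid_run="$data_dir/test.wt${valid_year}.run"
  # documents.tsv is NOT shipped with the repo; this assumes you created it (see #9).
  local docs="$data_dir/documents.tsv"
  echo python3 train.py \
    --model vanilla_bert \
    --datafiles "$data_dir/queries.tsv" "$docs" \
    --qrels "$data_dir/qrels" \
    --train_pairs "$train_pairs" \
    --valid_run "$valid_run" \
    --model_out_dir models/vbert
}

# Train on WT 2012, validate on WT 2013.
build_train_cmd data/wt 12 13
```

Swapping the two year arguments is all it takes to train on one collection and validate on another; once the printed command looks right, you can run it directly.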

Does this answer your question?

WenTingTseng commented 4 years ago

OK, I understand. Thanks a lot!