Closed jihyukkim-nlp closed 2 years ago
Hi @jihyukkim-nlp,
For training the ColBERT retriever, we used the same training configuration as the default training command in the original repository (https://github.com/stanford-futuredata/ColBERT#training), with a single change: `--doc_maxlen 300` instead of `180`.
Our training triplets were official MSMARCO train triplets.
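For reference, the default training command from the original ColBERT README looks roughly like the sketch below, with the `--doc_maxlen` change applied; the GPU count, paths, and experiment/run names are placeholders you would need to adapt (the `msmarco.psg.l2` run name matches the checkpoint shared in this thread, but treat the rest as an assumption):

```shell
# Sketch of the ColBERT v1 training invocation (4 GPUs assumed);
# --doc_maxlen is set to 300 instead of the default 180.
CUDA_VISIBLE_DEVICES="0,1,2,3" \
python -m torch.distributed.launch --nproc_per_node=4 -m \
  colbert.train --amp --doc_maxlen 300 --mask-punctuation \
  --bsize 32 --accum 1 \
  --triples /path/to/MSMARCO/triples.train.small.tsv \
  --root /path/to/experiments/ --experiment MSMARCO-psg \
  --similarity l2 --run msmarco.psg.l2
```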
Kind Regards, Nandan Thakur
Thanks for the heads-up :)
If it helps, you can find my ColBERT model checkpoint here: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/models/ColBERT/msmarco.psg.l2.zip
Kind Regards, Nandan Thakur
Thank you for sharing. I found the URL in the paper; it helped me a lot.
I also wanted to further analyze the training process of ColBERT under the same training configuration, and now I can. Thanks!
Best regards, Jihyuk Kim
Hi @thakur-nandan, is there a reference for how many partitions (NUM_PARTITIONS) were used in the ColBERT FAISS search for each BEIR dataset? The default is 32768 in the original repo, but 96 is given in your evaluation script (https://github.com/thakur-nandan/beir-ColBERT/blob/91190882deac1792c78b3c33d51be9edaa9c6805/evaluate_beir.sh#L26). I wonder whether it was changed per dataset.
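For context, the number of FAISS partitions corresponds to `nlist` of an IVF index, and it is usually scaled with the number of indexed embeddings rather than fixed: a common rule of thumb is a partition count on the order of sqrt(N). A minimal sketch of such a heuristic (the `4 * sqrt(N)` factor, the power-of-two rounding, and the example corpus sizes are illustrative assumptions, not the values used in BEIR):

```python
import math

def suggest_num_partitions(num_embeddings: int) -> int:
    """Rule-of-thumb IVF partition count: ~4*sqrt(N), rounded to a power of two.

    This is a common FAISS sizing heuristic, not the exact rule used by
    ColBERT or BEIR.
    """
    target = 4 * math.sqrt(num_embeddings)
    return 1 << round(math.log2(target))

# A small BEIR corpus yields a small nlist; a corpus with hundreds of
# millions of token-level embeddings (like full MSMARCO) yields a large one.
print(suggest_num_partitions(25_000))
print(suggest_num_partitions(600_000_000))
```

This would explain why a small partition count like 96 can be reasonable for the smaller BEIR corpora even though 32768 is the default tuned for MSMARCO-scale indexes.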
Thank you for sharing this work!
Could you share the training configuration for the ColBERT retriever?
Thanks in advance.