jordane95 closed this issue 2 years ago
Hi @jordane95, I just released the ColBERT evaluation code for the BEIR benchmark today. You can find more details in this repository: https://github.com/NThakur20/beir-ColBERT.
I simply concatenate the title and text with a single whitespace and treat them as passages. More details can be found within the repository!
Kind Regards, Nandan Thakur
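For readers who want to see the concatenation step concretely, here is a minimal sketch. It is not taken from the beir-ColBERT repository. It assumes each line of `corpus.jsonl` is a JSON object with `_id`, `title`, and `text` keys, and that the target is a tab-separated `id<TAB>passage` file. The helper names `to_passage` and `corpus_to_tsv_lines`, the handling of empty titles, and the whitespace flattening are illustrative assumptions.

```python
import json


def to_passage(doc):
    """Join a document's title and text with a single space."""
    title = doc.get("title", "") or ""
    text = doc.get("text", "") or ""
    # Assumption: if the title is empty, drop the stray leading space.
    return (title + " " + text).strip()


def corpus_to_tsv_lines(jsonl_lines):
    """Yield 'id<TAB>passage' rows from corpus.jsonl-style lines."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Assumption: tabs and newlines inside a passage would break a
        # TSV row, so runs of whitespace are collapsed to one space.
        passage = " ".join(to_passage(doc).split())
        yield f"{doc['_id']}\t{passage}"
```

Usage would look roughly like `open("corpus.jsonl")` passed to `corpus_to_tsv_lines`, with each yielded row written to the output file followed by a newline. Whether ids must be remapped to integers for a given ColBERT setup is not covered here.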
Hello,
I have a small question regarding the ColBERT evaluation. ColBERT is trained on MS MARCO, where each passage in `collection.tsv` contains only one field. They prepend a special [D] token to it before feeding it into the model to get token-level dense representations. However, for the datasets in BEIR, each passage in `corpus.jsonl` also contains a `title` field besides the `text` field.

May I know how you evaluated ColBERT on your BEIR benchmark? In other words, how did you handle the title and text fields in your BEIR datasets to fit the ColBERT input?