beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

How did you evaluate ColBERT on BEIR benchmark? #77

Closed jordane95 closed 2 years ago

jordane95 commented 2 years ago

Hello,

I have a quick question regarding the ColBERT evaluation. ColBERT is trained on MS MARCO, where each passage in collection.tsv contains only one field. A special [D] token is prepended to the passage before it is fed into the model to obtain token-level dense representations.

However, for the datasets in BEIR, each passage in corpus.jsonl contains a title field in addition to the text field.

May I know how you evaluated ColBERT on the BEIR benchmark? In other words, how did you handle the title and text fields of the BEIR datasets so that they fit the ColBERT input?

thakur-nandan commented 2 years ago

Hi @jordane95, I released the ColBERT evaluation code for the BEIR benchmark today. You can find more details in this repository: https://github.com/NThakur20/beir-ColBERT.

I simply concatenate the title and text with a single whitespace and treat the result as the passage. More details can be found in the repository!
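As a rough illustration of that concatenation step, here is a minimal sketch and not the actual code from the repository. It assumes the standard BEIR corpus.jsonl layout with `_id`, `title` and `text` keys; the helper names `to_passage` and `corpus_to_passages` are hypothetical, and stripping the result when the title is empty is my own assumption.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def to_passage(doc: Dict[str, str]) -> str:
    """Join title and text with a single space (hypothetical helper)."""
    title = (doc.get("title") or "").strip()
    text = (doc.get("text") or "").strip()
    # Assumption: when the title is empty, avoid a leading space.
    return f"{title} {text}".strip()


def corpus_to_passages(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, passage) pairs from corpus.jsonl-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        yield doc["_id"], to_passage(doc)


if __name__ == "__main__":
    sample = ['{"_id": "d1", "title": "Some title", "text": "Some body text."}']
    for doc_id, passage in corpus_to_passages(sample):
        print(doc_id, passage, sep="\t")
```

Each resulting passage is a single string, so it can be handled like a single-field MS MARCO passage.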

Kind Regards, Nandan Thakur