How did you evaluate ColBERT on the BEIR benchmark?

beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

Apache License 2.0

Hello,

I have a quick question regarding the ColBERT evaluation. ColBERT is trained on MS MARCO, where each passage in collection.tsv contains only one field. The passage is prepended with a special [D] token before being fed into the model to get token-level dense representations.

However, for the datasets in BEIR, each passage in corpus.jsonl contains a title field in addition to the text field.

May I ask how you evaluated ColBERT on the BEIR benchmark? In other words, how did you handle the title and text fields in the BEIR datasets to fit the ColBERT input?
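To make the question concrete, here is a minimal sketch of one option I can think of: joining title and text with a single space and writing the result as the single passage field of a collection.tsv-style file. This is only my assumption, not necessarily what was done for the reported BEIR numbers. The function names, the `sep` parameter, the `_id` key, and the use of running integer passage ids are my own choices for illustration.

```python
import json


def beir_doc_to_passage(doc, sep=" "):
    """Merge a BEIR document's title and text into one passage string.

    Assumption: title and text are joined with `sep`; the title is
    skipped when it is empty or missing.
    """
    title = (doc.get("title") or "").strip()
    text = (doc.get("text") or "").strip()
    passage = f"{title}{sep}{text}" if title else text
    # A TSV row must stay on one line, so collapse tabs and newlines.
    return " ".join(passage.split())


def jsonl_to_tsv_lines(jsonl_lines):
    """Convert corpus.jsonl-style lines into 'pid<TAB>passage' rows.

    Returns the rows plus a list mapping each integer pid back to the
    original document id (assumed to be stored under the "_id" key).
    The [D] marker is not written here, since it is added later when
    the passage is fed into the model.
    """
    rows, pid_to_id = [], []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        pid = len(pid_to_id)
        pid_to_id.append(doc["_id"])
        rows.append(f"{pid}\t{beir_doc_to_passage(doc)}")
    return rows, pid_to_id
```

With this sketch, a document with title "Title A" and text "Body text" would become the passage "Title A Body text". Is this roughly what you did, or did you drop the title or use a different separator?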

How did you evaluate ColBERT on BEIR benchmark? #77