beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Can't explain low results for some models #131

Open NebelAI opened 1 year ago

NebelAI commented 1 year ago

Hey,

it is the second time I have encountered low results for specific models. In short, I once trained deepset/gbert-base with train_msmarco_v3_margin_MSE.py and it worked like a charm. Then I tried the large version (deepset/gbert-large), and all results produced by evaluate_sbert.py were almost zero (NDCG@1/5/10/100/1000 = 0.001...). Again, the base model gave good results.

Now I did the same with xlm-roberta-base, which again gave good results. Using microsoft/xlm-align gives bad results again. What am I missing here? Are some models technically not feasible?

NouamaneTazi commented 11 months ago

Are you suspecting a problem in training or evaluation?
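
One rough way to tell the two apart is to check whether the trained model's embeddings have collapsed, i.e. whether every text maps to nearly the same vector. That would point to training rather than evaluation. The sketch below is only an illustration: the helper `mean_pairwise_cosine` is a hypothetical name, the synthetic arrays stand in for real embeddings (in practice you would pass the output of `model.encode(...)` from sentence-transformers on a few dozen unrelated sentences), and embedding collapse is just one possible cause, not something confirmed in this thread.

```python
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    # L2-normalize each row; the clip guards against zero-length vectors.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T
    n = sims.shape[0]
    # Drop the diagonal, since self-similarity is always 1.
    off_diag_sum = sims.sum() - np.trace(sims)
    return float(off_diag_sum / (n * (n - 1)))


# Synthetic stand-ins (assumption): replace with real model embeddings.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(50, 768))
collapsed = np.ones((50, 768)) + 1e-3 * rng.normal(size=(50, 768))

healthy_score = mean_pairwise_cosine(healthy)
collapsed_score = mean_pairwise_cosine(collapsed)
print(f"healthy: {healthy_score:.3f}, collapsed: {collapsed_score:.3f}")
```

If the score for unrelated sentences sits very close to 1, the model is probably producing near-identical vectors and ranking is effectively random, so the training run is the place to look. If the score looks reasonable, the evaluation setup is the more likely suspect. There is no fixed threshold here; comparing against the base model that works is more informative than any absolute number.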