Can't explain low results for some models

Hey,

This is the second time I have encountered low results for specific models. In short, I once trained deepset/gbert-base with train_msmarco_v3_margin_MSE.py and it worked like a charm. Then I tried the large version (deepset/gbert-large), and all results produced with evaluate_sbert.py were almost zero (NDCG@1/5/10/100/1000 = 0.001...). Again, the base model produced good results.

Now I did the same with xlm-roberta-base, which again produced good results. Using microsoft/xlm-align gives bad results again. What am I missing here? Are some models technically not feasible?
