beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Fine-tuned cross-encoder scores are lower than ms-marco zero-shot scores? #156

Open cramraj8 opened 1 year ago

cramraj8 commented 1 year ago

Hi @thakur-nandan, @nreimers

I am fine-tuning either the cross-encoder/ms-marco-electra-base or the cross-encoder/ms-marco-MiniLM-L-12-v2 model on other IR collections (trec-covid or NQ). However, the fine-tuned model's scores are lower than the zero-shot scores. I wonder whether there is a domain shift in the custom datasets, or whether I am doing the training wrong. I am using the sentence-transformers CrossEncoder API for training.

Since these pre-trained models were trained with particular settings (hyperparameters, model architecture, and loss), are they sensitive to those settings during fine-tuning as well?

root-goksenin commented 6 months ago

Hey! Could you share the code, so we can help you diagnose the issue further?