Hi @thakur-nandan, @nreimers
I am fine-tuning either the cross-encoder/ms-marco-electra-base or the cross-encoder/ms-marco-MiniLM-L-12-v2 model on other IR collections (TREC-COVID or NQ), but the fine-tuned model scores are lower than the zero-shot scores. I wonder whether this is a domain shift in the custom datasets or whether I am doing the training wrong. I am using the sentence-transformers CrossEncoder API for training.

Since these pre-trained models were trained with particular settings (hyperparameters, model architecture, and loss), are they also sensitive to those settings during fine-tuning?
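For context, the training roughly follows the standard sentence-transformers CrossEncoder recipe, along the lines of the sketch below (the query/passage pairs, labels, and hyperparameters here are placeholders, not my actual data-loading code):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Load the pre-trained MS MARCO re-ranker; num_labels=1 keeps the single-score head
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2", num_labels=1)

# Placeholder (query, passage) pairs with relevance labels in [0, 1]
train_samples = [
    InputExample(texts=["what is covid-19", "COVID-19 is a disease caused by ..."], label=1.0),
    InputExample(texts=["what is covid-19", "The 2008 financial crisis began ..."], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=32)

# Fine-tune with the default loss for num_labels=1 (binary cross-entropy on pair scores)
model.fit(
    train_dataloader=train_dataloader,
    epochs=1,
    warmup_steps=100,
    optimizer_params={"lr": 2e-5},
    output_path="output/ce-msmarco-finetuned",
)
```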