Need to fine-tune pretained models?

finegan-dollak commented 2 years ago

Hi there, I find this work very interesting, and I was trying to replicate your results using the models you've shared on Huggingface. The bi-encoder models are behaving as expected; however, the cross-encoders are getting much lower scores than I expect on STS (results in the 30s-40s rather than 70s to 80s), which makes me think I'm missing a step.

Should the Huggingface pretrained models for STS work out of the box, or do I need to fine-tune them on the train set for each STS dataset?

The models at issue are:

trans-encoder-cross-simcse-roberta-base
trans-encoder-cross-simcse-roberta-large
trans-encoder-cross-simcse-bert-large
trans-encoder-cross-simcse-bert-base

Thanks for any advice you can give!

hardyqr commented 1 year ago

Hi, thanks for your interest.

I wonder how you were evaluating the cross-encoders? Since it is of a different formulation, one needs to concatenate a sentence pair into one string and input it into the model. Specifically, you can use our script:

>> python src/eval.py \
--model_name_or_path "cambridgeltl/trans-encoder-cross-simcse-roberta-large"  \
--mode cross \
--task sts_sickr

as mentioned in the readme (where mode specifies whether to evaluate in bi-encoder or cross-encode formulation).

Hope this is helpful.

finegan-dollak commented 1 year ago

That was absolutely the problem; thank you!

amzn / trans-encoder

Need to fine-tune pretained models? #8