UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

How to set up an asymmetric model from a pretrained sbert model #1807


emilysilcock commented 1 year ago

Hello,

Thanks for the great and easy-to-use repo!

I am trying to set up an asymmetric model, with both sides warm-started from a pretrained SBERT model. From reading through other questions on here, there seem to be two different ways of doing this.

I can either use the Asym model by itself:

import sentence_transformers
from sentence_transformers import SentenceTransformer

context_model = SentenceTransformer('all-MiniLM-L12-v2')
first_para_model = SentenceTransformer('all-MiniLM-L12-v2')

asym_model = sentence_transformers.models.Asym({'CTX': [context_model], 'FP': [first_para_model]})
model = SentenceTransformer(modules=[asym_model])

Or I can set this up with a word embedding model, a pooling layer, etc.:

import sentence_transformers
from sentence_transformers import SentenceTransformer
from torch import nn

word_embedding_model = sentence_transformers.models.Transformer("sentence-transformers/all-MiniLM-L12-v2")

pooling_model = sentence_transformers.models.Pooling(word_embedding_model.get_word_embedding_dimension())

context_model = sentence_transformers.models.Dense(word_embedding_model.get_word_embedding_dimension(), 256, bias=False, activation_function=nn.Identity())
first_para_model = sentence_transformers.models.Dense(word_embedding_model.get_word_embedding_dimension(), 256, bias=False, activation_function=nn.Identity())

asym_model = sentence_transformers.models.Asym({'CTX': [context_model], 'FP': [first_para_model]})
model = SentenceTransformer(modules=[word_embedding_model, pooling_model, asym_model])

I don't quite understand how, or whether, these two are different. A lot of the second version seems redundant.

Also, do any of the inbuilt evaluators work with the asymmetric setup?

And I've noted that you've mentioned elsewhere that asymmetric models generally don't work as well as symmetric ones!

Thanks

creatorrr commented 11 months ago

Any thoughts on this @nreimers ?

maichh commented 8 months ago

Thanks for the incredible library! @nreimers I have exactly the same questions as above. Also, why don't asymmetric models work as well as symmetric ones? Does that hold even for very asymmetric text pairs, or for different input modalities? Thanks.

maichh commented 8 months ago

@emilysilcock, after some exploration on my own, I think I can answer your first question. The first method you posted trains an asymmetric model whose two branches have separate transformer, pooling, and dense layers all the way through. In the second method, however, the transformer and pooling layers are shared between the branches, and only the dense layers differ.
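To make the sharing distinction concrete, here is a minimal sketch using plain Python stand-ins. These are not the real sentence-transformers classes; `Layer`, `forward`, and the module names are hypothetical illustrations of the Asym-style routing, where each input key selects one branch's module list.

```python
class Layer:
    """Stand-in for a trainable module; object identity marks weight sharing."""
    def __init__(self, name):
        self.name = name

def forward(modules, x):
    # Asym-style routing: pass the input through one branch's module list.
    for m in modules:
        x = f"{m.name}({x})"
    return x

# Method 1: each branch wraps its own full SentenceTransformer, so the
# transformer and pooling weights are duplicated per branch.
method1 = {
    'CTX': [Layer('ctx_transformer'), Layer('ctx_pooling')],
    'FP':  [Layer('fp_transformer'),  Layer('fp_pooling')],
}

# Method 2: transformer + pooling sit *before* the Asym layer, so both
# branches share them; only the Dense heads differ.
shared = [Layer('transformer'), Layer('pooling')]
method2_heads = {'CTX': [Layer('ctx_dense')], 'FP': [Layer('fp_dense')]}

def encode_method2(key, x):
    return forward(shared + method2_heads[key], x)

# In method 1 the two branches share no modules at all:
shared_in_1 = {id(m) for m in method1['CTX']} & {id(m) for m in method1['FP']}
print(len(shared_in_1))            # 0
print(encode_method2('CTX', 'x'))  # ctx_dense(pooling(transformer(x)))
print(encode_method2('FP', 'x'))   # fp_dense(pooling(transformer(x)))
```

So method 1 costs roughly twice the parameters and memory of method 2, while method 2 can only specialize the branches through the small Dense heads on top of a shared encoder.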