UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Asymmetric (two model) learning of pairs #3024

Open turian opened 2 days ago

turian commented 2 days ago

My understanding is that fine-tuning using sentence-transformers over pairs assumes that the pairs are symmetric. For example, using ContrastiveTensionLossInBatchNegatives.

This means that the same embed(·) function is applied to both anchor1 and anchor2, and that d(anchor1, anchor2) = d(anchor2, anchor1).

However, there are many use-cases where the order of the sentences is important. For example: anchor1 is a SUMMARY of anchor2.

It would be great to optionally allow fine-tuning to produce TWO fine-tuned models: one for embedding anchor1 and a second for embedding anchor2. This would make it easy to fine-tune on asymmetric tasks.

tomaarsen commented 1 day ago

Hello!

There are indeed symmetric losses in Sentence Transformers, but also a few asymmetric ones. For example MultipleNegativesRankingLoss, which is arguably the most common loss. This loss essentially trains:

"Given this anchor, the paired positive should be closer to it than any other text in the batch."

If you have asymmetric data like summaries, then you might be able to train:

"Given this summary, find the corresponding full text among the in-batch candidates."

or

"Given this full text, find the corresponding summary among the in-batch candidates."

depending on what you choose as your first column. The resulting model becomes adept at the asymmetric task (e.g. information retrieval).
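For concreteness, a minimal sketch of what that could look like with the v3 SentenceTransformerTrainer, assuming tiny in-memory (summary, full text) pairs as the two dataset columns (swap the columns to train the other direction); the base model and example texts are placeholders:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Hypothetical (summary, full text) pairs; replace with your real data.
# The first column is treated as the anchor, the second as the positive.
train_dataset = Dataset.from_dict({
    "anchor": ["Short summary A", "Short summary B"],
    "positive": ["Long full text A ...", "Long full text B ..."],
})

model = SentenceTransformer("microsoft/mpnet-base")  # any base model
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```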


Some model authors use prefixes/prompts for each type of data, so that the model can distinguish e.g. a query of "what are pandas" from a document of "what are pandas". There are some docs on them here, but note that this only refers to inference. For training you have to manually prepend the prompts. A rough sketch of that pattern is below.
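The sketch uses made-up prefix strings and a hypothetical model path; the only requirement is that training and inference use the same prefixes:

```python
from sentence_transformers import SentenceTransformer

# Hypothetical prefix strings; any consistent choice works.
SUMMARY_PROMPT = "summary: "
DOCUMENT_PROMPT = "document: "

# During training, prepend the prefixes to the raw text yourself,
# e.g. with datasets.Dataset.map over the (anchor, positive) columns.

# At inference, recent sentence-transformers versions accept the prefix
# via the `prompt` argument of encode():
model = SentenceTransformer("path/to/your-finetuned-model")  # hypothetical path
summary_embeddings = model.encode(["a short summary"], prompt=SUMMARY_PROMPT)
document_embeddings = model.encode(["the full document text"], prompt=DOCUMENT_PROMPT)
```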

Sentence Transformers does not support dual model setups right now, and I don't think I'll be moving in that direction soon.