turian opened this issue 2 days ago
Hello!
There are indeed symmetric losses in Sentence Transformers, but also a few asymmetric ones. For example `MultipleNegativesRankingLoss`, which is arguably the most common loss. This loss essentially trains on `(anchor, positive)` pairs: `embed(anchor)` should be close to `embed(positive)` and far from the embeddings of the other positives in the batch.
If you have asymmetric data like summaries, then you might be able to train on `(summary, document)` pairs or `(document, summary)` pairs, depending on what you choose as your first column. The resulting model becomes adept at the asymmetric task (e.g. information retrieval).
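As a rough sketch, training on summary/document pairs with `MultipleNegativesRankingLoss` might look like this (the model name and texts are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model

# Each InputExample is an (anchor, positive) pair; with summaries as the
# first column, the model learns summary -> document retrieval.
train_examples = [
    InputExample(texts=["A short summary of document 1", "The full text of document 1 ..."]),
    InputExample(texts=["A short summary of document 2", "The full text of document 2 ..."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)

# The other positives in each batch act as in-batch negatives for each anchor.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```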
Some model authors use prefixes/prompts for each type of data, so that the model can distinguish e.g. a query of "what are pandas" from a document of "what are pandas". There are some docs on them here, but note that this only refers to inference. For training you have to manually prepend the prompts.
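For example, something along these lines; the model name and the `query: `/`passage: ` prefixes are just illustrative, use whatever prefixes your model was trained with:

```python
from sentence_transformers import SentenceTransformer, InputExample

# Training: manually prepend the role-specific prefix to each text.
train_examples = [
    InputExample(texts=[
        "query: what are pandas",
        "passage: The giant panda is a bear native to China.",
    ]),
]

# Inference: recent versions of the library accept the prefix via `prompt`.
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
query_emb = model.encode("what are pandas", prompt="query: ")
doc_emb = model.encode("what are pandas", prompt="passage: ")
# Same text, but the differing prompts yield different embeddings.
```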
Sentence Transformers does not support dual model setups right now, and I don't think I'll be moving in that direction soon.
My understanding is that fine-tuning using sentence-transformers over pairs assumes that the pairs are symmetric. For example, using `ContrastiveTensionLossInBatchNegatives`. This means that `embed(anchor1) = embed(anchor2)` and that `d(anchor1, anchor2) = d(anchor2, anchor1)`.
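A quick illustration of that symmetry (the model name here is just a placeholder):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb1, emb2 = model.encode(["a short summary", "the full document text"], convert_to_tensor=True)

# Cosine similarity is symmetric, so the order of the pair carries no information:
print(util.cos_sim(emb1, emb2))  # identical to ...
print(util.cos_sim(emb2, emb1))
```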
However, there are many use-cases where the order of the sentences is important. For example: `anchor1` is a SUMMARY of `anchor2`.

It would be great, optionally, to allow fine-tuning to create TWO fine-tuned models: one for embedding `anchor1` and the second for embedding `anchor2`. This would make it easy to fine-tune on asymmetric tasks.
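For concreteness, a hypothetical two-tower sketch of the requested setup, written as a plain PyTorch loop since Sentence Transformers has no built-in support for it; all names, texts, and the scale factor below are made up:

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Two towers, fine-tuned jointly but kept as separate models.
summary_model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder
document_model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder

def embed(model: SentenceTransformer, texts: list[str]) -> torch.Tensor:
    # model.encode() runs under no_grad, so call the module directly to keep gradients.
    features = model.tokenize(texts)
    return model(features)["sentence_embedding"]

def in_batch_loss(summaries: list[str], documents: list[str]) -> torch.Tensor:
    s = F.normalize(embed(summary_model, summaries), dim=-1)
    d = F.normalize(embed(document_model, documents), dim=-1)
    # scores[i, j]: similarity of summary i with document j; the diagonal
    # holds the true pairs, every off-diagonal entry is an in-batch negative.
    scores = s @ d.T * 20.0  # 20.0 is an arbitrary similarity scale
    labels = torch.arange(len(summaries), device=scores.device)
    return F.cross_entropy(scores, labels)

optimizer = torch.optim.AdamW(
    list(summary_model.parameters()) + list(document_model.parameters()), lr=2e-5
)
loss = in_batch_loss(["summary 1", "summary 2"], ["document 1 ...", "document 2 ..."])
loss.backward()
optimizer.step()
```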