UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

which loss to use when a query has one positive and multiple negative examples #264

Open luomancs opened 4 years ago

luomancs commented 4 years ago

Hi,

In my data, each query has one positive example and multiple negative examples. An example looks like this:

Query1: positive s0, negative s1, negative s2, ...; labels are 1, 0, 0, ...
Query2: negative s0, positive s1, negative s2, ...; labels are 0, 1, 0, ...

SoftmaxLoss seems to be the best one to use, but it takes only two sentences as input, whereas each of my examples has 1 query and 10 sentences.

Could you give some suggestions? Thank you.

nreimers commented 4 years ago

Hi @luomancs, I'm not sure softmax loss would make sense here.

It sounds like a case for triplet loss. Currently, it accepts only triplets, but you can change the dataset so that it looks like this:

query1, pos s0, neg s1
query1, pos s0, neg s2
...
query2, pos s1, neg s0
query2, pos s1, neg s2
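The dataset reshaping described above can be sketched in plain Python. This is only an illustration, not code from the library: the function name `expand_to_triplets` and the (query, candidates, labels) input layout are assumptions based on the example in the question.

```python
from typing import List, Sequence, Tuple


def expand_to_triplets(
    query: str,
    candidates: Sequence[str],
    labels: Sequence[int],
) -> List[Tuple[str, str, str]]:
    """Turn one query with 1 positive and N negatives into N triplets.

    Hypothetical helper: labels[i] is 1 for the positive candidate, 0 otherwise.
    """
    if len(candidates) != len(labels):
        raise ValueError("candidates and labels must have the same length")

    positives = [c for c, y in zip(candidates, labels) if y == 1]
    negatives = [c for c, y in zip(candidates, labels) if y == 0]
    if len(positives) != 1:
        raise ValueError("expected exactly one positive per query")

    positive = positives[0]
    # One (anchor, positive, negative) triplet per negative candidate.
    return [(query, positive, negative) for negative in negatives]


# Query2 from the question: s1 is the positive, s0 and s2 are negatives.
triplets = expand_to_triplets("query2", ["s0", "s1", "s2"], [0, 1, 0])
print(triplets)
```

Each resulting tuple could then be wrapped as `InputExample(texts=[anchor, positive, negative])` for training with the triplet loss.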

You could also update the triplet loss function to expect an input like: anchor, positive, negative1, negative2, negative3, ...

Then you could check which negative is closest to the anchor and compare its distance with the distance between the positive and the anchor.
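As a rough sketch of that idea, here is the per-example computation on plain Python lists rather than on model embeddings. The function name `hardest_negative_triplet_loss`, the Euclidean distance, and the margin value are all assumptions for illustration; this is not the library's triplet loss implementation.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_negative_triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    margin: float = 1.0,  # assumed value, tune for your data
) -> float:
    """Triplet loss against the negative that is closest to the anchor."""
    if not negatives:
        raise ValueError("at least one negative is required")

    dist_pos = euclidean(anchor, positive)
    # The "hardest" negative is the one closest to the anchor.
    dist_hardest_neg = min(euclidean(anchor, n) for n in negatives)
    # Standard hinge form: the positive should be closer than the
    # hardest negative by at least `margin`.
    return max(0.0, dist_pos - dist_hardest_neg + margin)


# Toy 2-d embeddings (made up for illustration).
loss = hardest_negative_triplet_loss(
    anchor=[0.0, 0.0],
    positive=[1.0, 0.0],
    negatives=[[3.0, 0.0], [0.0, 1.5]],
)
print(loss)
```

Only the closest negative contributes to the loss here, so a query with many easy negatives and one hard negative is penalized for the hard one alone.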

Best,
Nils Reimers