UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.21k stars 2.47k forks

How to prepare labels for a dataset that has text pairs but no labels? #2254

Open Yarmohamadshr opened 1 year ago

Yarmohamadshr commented 1 year ago

Hi,

Thank you for the great information. I have a question. My data has two columns of text: one is the description of a request, and the other is an answer to that request. I want to use ContrastiveLoss to pull each request close to its own answer and push it away from unrelated answers. However, I do not know how to provide labels for my positive and negative pairs, because the dataset expects each InputExample to be a triple like this:

(a1,b1,1) (a1,bi,0)

I appreciate your help.
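
For illustration, the (a1, b1, 1) / (a1, bi, 0) format described above could be produced along these lines. This is only a sketch: it assumes that an answer taken from a different row is unrelated to the request, which may not hold for every dataset. `build_labeled_pairs`, `negatives_per_row`, and the sample rows are hypothetical names and data, not part of sentence-transformers.

```python
import random


def build_labeled_pairs(rows, negatives_per_row=1, seed=0):
    """Return (request, answer, label) tuples.

    label 1: the answer from the same row (positive pair).
    label 0: an answer sampled from a different row (ASSUMED to be unrelated).
    """
    rng = random.Random(seed)
    pairs = []
    for i, (request, answer) in enumerate(rows):
        pairs.append((request, answer, 1))
        # Candidate negatives: answers from other rows whose text differs.
        candidates = [a for j, (_, a) in enumerate(rows) if j != i and a != answer]
        k = min(negatives_per_row, len(candidates))
        for negative in rng.sample(candidates, k):
            pairs.append((request, negative, 0))
    return pairs


# Hypothetical sample data: (request description, answer).
rows = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("Where can I download my invoice?", "Invoices are listed under Billing in your account."),
    ("How do I cancel my subscription?", "Open Settings, then Subscription, then choose Cancel."),
]
pairs = build_labeled_pairs(rows)

# With sentence-transformers installed, each tuple maps onto an InputExample:
#   from sentence_transformers import InputExample
#   examples = [InputExample(texts=[a, b], label=label) for a, b, label in pairs]
```

The sampled negatives are only as reliable as the assumption behind them, so it may be worth spot-checking a few label-0 pairs by hand.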

carlesoctav commented 1 year ago

Maybe just use triplet loss? I don't think it's possible to use contrastive loss here without doing some manual labeling first.

With triplet loss, you can use texts from other rows as negatives, or you can mine hard negatives using BM25.
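
A rough sketch of both options, in case it helps. It builds (anchor, positive, negative) triplets where the negative is either a random answer from another row or the other-row answer that scores highest under BM25 for the request. The BM25 scorer here is a minimal hand-written version rather than a library call, and `tokenize`, `bm25_scores`, `build_triplets`, and the sample rows are hypothetical names and data made up for this example.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a minimal BM25 formula."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    doc_freq = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def build_triplets(rows, hard=True, seed=0):
    """Return (request, own_answer, negative_answer) triplets.

    hard=True: negative is the other-row answer with the highest BM25 score.
    hard=False: negative is a random answer from another row.
    """
    rng = random.Random(seed)
    answers_tokens = [tokenize(answer) for _, answer in rows]
    triplets = []
    for i, (request, answer) in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i and rows[j][1] != answer]
        if not others:
            continue  # no usable negative for this row
        if hard:
            scores = bm25_scores(tokenize(request), answers_tokens)
            j = max(others, key=lambda idx: scores[idx])
        else:
            j = rng.choice(others)
        triplets.append((request, answer, rows[j][1]))
    return triplets


# Hypothetical sample data: (request description, answer).
rows = [
    ("How do I reset my password?",
     "Click the reset link on the login page and choose a new password."),
    ("How do I change my email address?",
     "Open account settings and edit the email address field."),
    ("Why did the password reset email never arrive?",
     "Check the spam folder, then ask support to resend the reset email."),
]
triplets = build_triplets(rows, hard=True)

# With sentence-transformers installed, each triplet maps onto an InputExample
# that TripletLoss can consume:
#   from sentence_transformers import InputExample
#   examples = [InputExample(texts=[a, p, n]) for a, p, n in triplets]
```

One caveat with hard mining: a lexically similar answer may actually be a valid answer to the request, so the mined negatives can include false negatives.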