UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

How to train Semantic Textual Similarity on a dataset without a score entry? #399

Closed abozanona closed 4 years ago

abozanona commented 4 years ago

In the Semantic Textual Similarity training example, you're using the STSbenchmark dataset, in which each entry has two narratives and a score from 0 to 5 indicating how similar the two narratives are.

I have a large dataset in which each entry contains only two narratives and no score. The two narratives in an entry are considered to be about the same idea (they would all score 5 out of 5).

How can I train the model on a dataset where all entries have the score 5?

nreimers commented 4 years ago

See:
https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss
https://www.sbert.net/examples/training/quora_duplicate_questions/README.html
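
For a dataset that contains only positive pairs, the approach behind those links could be sketched roughly as below. This is a minimal sketch, not the library's official example. The helper names `clean_pairs` and `train_on_pairs`, the base model name, and the batch size, epoch count, and warmup steps are all assumptions chosen for illustration. With MultipleNegativesRankingLoss, each pair is a positive and the other pairs in the same batch act as negatives, so no score column is needed. Exact duplicate pairs are dropped first, because a duplicate landing in the same batch would be treated as a negative.

```python
from typing import Iterable, List, Tuple


def clean_pairs(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Strip whitespace, drop incomplete rows and exact duplicate pairs."""
    seen = set()
    pairs = []
    for first, second in rows:
        pair = (first.strip(), second.strip())
        if not all(pair) or pair in seen:
            continue
        seen.add(pair)
        pairs.append(pair)
    return pairs


def train_on_pairs(pairs, model_name="all-MiniLM-L6-v2", batch_size=32, epochs=1):
    """Hypothetical helper: fine-tune on positive-only pairs."""
    # Heavy dependencies are imported here so clean_pairs works without them.
    from torch.utils.data import DataLoader
    from sentence_transformers import InputExample, SentenceTransformer, losses

    # model_name is an assumed starting checkpoint; any sentence embedding model works.
    model = SentenceTransformer(model_name)

    # No label is passed: every pair is treated as a positive.
    examples = [InputExample(texts=[a, b]) for a, b in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)

    # The other pairs in each batch serve as the negatives.
    loss = losses.MultipleNegativesRankingLoss(model)

    # warmup_steps=100 is an arbitrary illustrative value.
    model.fit(train_objectives=[(loader, loss)], epochs=epochs, warmup_steps=100)
    return model
```

Usage would look like `model = train_on_pairs(clean_pairs(rows))`, where `rows` is an iterable of `(narrative_a, narrative_b)` tuples read from the dataset. Larger batch sizes generally help with this loss, since each batch then supplies more in-batch negatives.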