UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Any ideas for labeling a Duplicate Questions dataset? #1064

Open svjack opened 2 years ago

svjack commented 2 years ago

Labeling with an unsupervised method, such as a simple threshold-based approach (for example fuzzywuzzy or another edit-distance measure), is straightforward. Do you have other ideas for extracting duplicates from a collection of sentences?
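For reference, the threshold-based baseline described above might look roughly like the sketch below. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for fuzzywuzzy; the function name `label_duplicates`, the 0.8 threshold, and the sample questions are illustrative assumptions, not something taken from a real dataset.

```python
from difflib import SequenceMatcher
from itertools import combinations


def label_duplicates(sentences, threshold=0.8):
    """Return (i, j, score) for each sentence pair whose character-level
    similarity ratio is at least `threshold` (hypothetical helper)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(sentences), 2):
        # ratio() is in [0, 1]; 1.0 means the two strings are identical.
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Made-up sample questions, for illustration only.
questions = [
    "How do I learn Python quickly?",
    "How do I learn Python quickly",
    "What is the capital of France?",
]
duplicate_pairs = label_duplicates(questions)
for i, j, score in duplicate_pairs:
    print(f"{score:.2f}  {questions[i]!r} <-> {questions[j]!r}")
```

This compares every pair, so it scales quadratically with the number of sentences, and it only catches near-identical wording rather than paraphrases.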

nreimers commented 2 years ago

Have a look here:
https://www.sbert.net/examples/applications/clustering/README.html
https://www.sbert.net/examples/applications/paraphrase-mining/README.html
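As a rough illustration of the idea behind embedding-based duplicate mining: encode each sentence as a vector, score every pair by cosine similarity, and keep the pairs above a threshold. The sketch below is plain Python and is not the library's implementation. The helper names `cosine` and `mine_pairs`, the 0.9 threshold, and the toy vectors are all assumptions; in practice the vectors would come from a sentence-embedding model, and the library ships `util.paraphrase_mining` for this task.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mine_pairs(embeddings, threshold=0.9):
    """Return (score, i, j) for pairs at or above `threshold`,
    highest score first (hypothetical helper)."""
    scored = [
        (cosine(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(len(embeddings)), 2)
    ]
    return sorted((p for p in scored if p[0] >= threshold), reverse=True)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
toy_embeddings = [
    [0.9, 0.1, 0.0],  # sentence 0
    [0.8, 0.2, 0.0],  # sentence 1, close to sentence 0
    [0.0, 0.1, 0.9],  # sentence 2, unrelated
]
mined = mine_pairs(toy_embeddings)
for score, i, j in mined:
    print(f"{score:.3f}  sentence {i} <-> sentence {j}")
```

Unlike an edit-distance threshold, this can match sentences that share meaning but not wording, provided the embeddings capture that meaning.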