UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

What is the difference between training (https://www.sbert.net/docs/training/overview.html#training-data) and unsupervised learning #1070

Open SAIVENKATARAJU opened 3 years ago

SAIVENKATARAJU commented 3 years ago

Hi,

I have a bunch of PDFs and I am building a QnA system from them. Currently, I am using the deepset/haystack repo for this task.

My question is: if I want to generate embeddings for my text, which kind of training should I do? What is the difference, given that both approaches mostly take sentences as input?

nreimers commented 3 years ago

Unsupervised training does not yet work well. Unsupervised learning is still in an active research phase, and its performance is not yet as good as that of the pre-trained models.

There are several pre-trained models that work quite well for most use cases. If you need better performance, I recommend creating training data and doing supervised training on it.
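
A minimal sketch of what using a pre-trained model for this kind of PDF QnA setup could look like. The model name, passages, and query below are illustrative assumptions, not recommendations from this thread:

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical passages extracted from the PDFs
passages = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days of purchase.",
]

# A pre-trained model tuned for question answering / semantic search (example choice)
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

# Embed the corpus once, then embed each incoming question
passage_embeddings = model.encode(passages, convert_to_tensor=True)
query_embedding = model.encode("How long is the warranty?", convert_to_tensor=True)

# Retrieve the most similar passages by cosine similarity
hits = util.semantic_search(query_embedding, passage_embeddings, top_k=2)
for hit in hits[0]:
    print(passages[hit["corpus_id"]], hit["score"])
```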

SAIVENKATARAJU commented 3 years ago

But when I check the documentation, the way the data is prepared seems to be the same for both approaches, especially for semantic similarity. I am just confused about what makes the difference between these two approaches.

nreimers commented 3 years ago

For supervised training you have some labeled or structured data that you can exploit, like (question, answer) pairs.

For unsupervised approaches, you just have text without any labels or structure.
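
To make the contrast concrete, here is a rough sketch of supervised training on (question, answer) pairs with the sentence-transformers fit API; the pairs, base model, and hyperparameters are made-up assumptions for illustration:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Structured data: each example pairs a question with the passage that answers it
train_examples = [
    InputExample(texts=["How long is the warranty?",
                        "The warranty covers manufacturing defects for two years."]),
    InputExample(texts=["Can I return a product?",
                        "Returns are accepted within 30 days of purchase."]),
]

model = SentenceTransformer("distilbert-base-uncased")  # example base model

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# MultipleNegativesRankingLoss treats the paired texts as positives and
# the other in-batch texts as negatives, which suits (question, answer) data
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
```

With unsupervised data you would only have the raw passages themselves, without knowing which question each one answers, so there is nothing for a loss like this to contrast against.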