UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.19k stars 2.47k forks

Retraining the model #118

Open puttapraneeth opened 4 years ago

puttapraneeth commented 4 years ago

Hi,

I would like to retrain this model on text about a programming language such as Java. I don't have a dataset of Java material, only a PDF. The model should read the text from that PDF file and train on it. Could you provide a sample showing how to achieve this? Thanks in advance.

Regards, Praneeth

nreimers commented 4 years ago

Hi, I'm afraid that is not possible.

Training is a bit more complex. You need some structure, such as sentence pairs with labels that indicate how similar the two sentences are. Without this structure, you cannot train the model.
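To illustrate the kind of structure meant here, below is a minimal sketch of labeled sentence pairs in plain Python. The `SentencePair` class, the 0.0 to 1.0 similarity scale, and the example sentences are all assumptions made up for illustration; they are not part of the library's API.

```python
from dataclasses import dataclass


@dataclass
class SentencePair:
    # Hypothetical container for illustration, not a sentence-transformers class.
    first: str
    second: str
    similarity: float  # assumed scale: 0.0 (unrelated) to 1.0 (same meaning)


# Made-up example pairs; a real training set would need many more of these.
training_pairs = [
    SentencePair(
        "A Java class can implement several interfaces.",
        "In Java, one class may implement multiple interfaces.",
        0.9,
    ),
    SentencePair(
        "A Java class can implement several interfaces.",
        "The garbage collector reclaims unused memory.",
        0.1,
    ),
]
```

A raw PDF contains only the sentences, not the similarity labels, which is why it cannot be used for training directly.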

The easiest way is the "Skip-Thought" idea: sentences that appear directly after each other in a document are treated as similar, while sentences appearing in different documents or at different positions are treated as dissimilar. However, the results of this approach can be mixed.
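A rough sketch of how such pairs could be generated, assuming the text has already been extracted from the documents and split into sentences (neither step is shown). The function name `build_pairs`, the 1.0/0.0 labels, and the choice of one random negative per positive are assumptions for illustration, not a method prescribed by the library.

```python
import random


def build_pairs(documents, seed=0):
    """Build (sentence_a, sentence_b, label) tuples from pre-split documents.

    documents: a list of documents, each given as a list of sentence strings.
    Adjacent sentences get label 1.0; a randomly chosen sentence from another
    document or a non-adjacent position gets label 0.0.
    """
    rng = random.Random(seed)
    pairs = []
    for doc_idx, sentences in enumerate(documents):
        for pos in range(len(sentences) - 1):
            anchor = sentences[pos]
            # Positive pair: the sentence that directly follows the anchor.
            pairs.append((anchor, sentences[pos + 1], 1.0))
            # Negative candidates: any sentence in another document, or a
            # sentence in the same document more than one position away.
            candidates = [
                sentence
                for other_idx, other in enumerate(documents)
                for other_pos, sentence in enumerate(other)
                if other_idx != doc_idx or abs(other_pos - pos) > 1
            ]
            if candidates:
                pairs.append((anchor, rng.choice(candidates), 0.0))
    return pairs
```

The labels produced this way are noisy: two adjacent sentences can be about different things, and two distant sentences can mean nearly the same, which is one reason the results can be mixed.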

Training in general is quite a complex thing.

Best, Nils Reimers