UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

print similarity score between two sentences #295

Open · Mahmedturk opened this issue 4 years ago

Mahmedturk commented 4 years ago

How can I use SBERT to compute and print a semantic similarity score between a pair of sentences? I have a list of sentence pairs, and for each pair I want to measure the semantic similarity and then choose the pair with the highest score.

nreimers commented 4 years ago

Have a look here: https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/semantic_search.py
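
For printing a single pair's score directly, the usual pattern is to encode both sentences and take the cosine similarity of the embeddings. A minimal sketch (the model name is just an example; any pretrained SBERT model from https://www.sbert.net works):

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained SBERT model (example name; pick any from the docs)
model = SentenceTransformer('all-MiniLM-L6-v2')

sentence1 = 'A man is eating food.'
sentence2 = 'A man is eating a piece of bread.'

# Encode each sentence to a fixed-size embedding, then compare with cosine similarity
emb1 = model.encode(sentence1, convert_to_tensor=True)
emb2 = model.encode(sentence2, convert_to_tensor=True)

score = util.cos_sim(emb1, emb2)
print(f"Similarity: {score.item():.4f}")
```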

Mahmedturk commented 4 years ago

@nreimers

Can this be fine-tuned with BioBERT as the base model?

nreimers commented 4 years ago

Hi @Mahmedturk, any model from HuggingFace Transformers can be used as a basis. I think this also includes BioBERT.

Best,
Nils Reimers

Mahmedturk commented 3 years ago

@nreimers Can you please tell me how I can use BioBERT as a basis? Where do I start after downloading the BioBERT model?

nreimers commented 3 years ago

See: https://www.sbert.net/docs/training/overview.html#creating-networks-from-scratch
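
Following that page, a SentenceTransformer can be assembled from any HuggingFace checkpoint by stacking a Transformer module and a pooling layer. A minimal sketch (the BioBERT model identifier and max_seq_length are assumptions, not from this thread):

```python
from sentence_transformers import SentenceTransformer, models

# Load BioBERT from the HuggingFace hub as the word-embedding layer
# (assumed identifier; any BERT-style checkpoint works here)
word_embedding_model = models.Transformer('dmis-lab/biobert-base-cased-v1.1', max_seq_length=256)

# Mean pooling over the token embeddings yields one fixed-size sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```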

Mahmedturk commented 3 years ago

@nreimers Thanks. Does the training data need to be labeled like this: "texts=['My first sentence', 'My second sentence'], label=0.8"? I mean, do the sentence pairs need to be given their corresponding labels manually?

nreimers commented 3 years ago

Yes. BERT out of the box produces rather bad sentence representations, and I don't expect BioBERT to produce any better representations without fine-tuning.
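
For reference, this labeled-pair format corresponds to InputExample objects trained with CosineSimilarityLoss, as described in the training overview linked above. A minimal sketch (model name, labels, and hyperparameters are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('all-MiniLM-L6-v2')  # or the BioBERT-based model built above

# Each training example is a sentence pair with a gold similarity label in [0, 1]
train_examples = [
    InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
    InputExample(texts=['Another pair', 'A quite different sentence'], label=0.3),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```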

Mahmedturk commented 3 years ago

Sorry, I think I was not clear enough. What I mean is: how do I label a pair of sentences with a similarity score of 0.3 or 0.8? Does that need to be done manually?

nreimers commented 3 years ago

You can either label it by hand or you can exploit some pattern in your data that tells you which sentence pairs are similar.

For example, assume you have publications and you know which publications refer to each other. You can assume that the title of a publication is more similar to the title of a publication it refers to than to a random publication title.

You can use this information and fine-tune your model with MultipleNegativesRankingLoss.
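
A minimal sketch of that setup (titles and model name are placeholders; MultipleNegativesRankingLoss needs only positive pairs and treats the other pairs in a batch as negatives):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('all-MiniLM-L6-v2')  # or a BioBERT-based model as above

# Positive pairs only: (publication title, title of a publication it refers to).
# Within each batch, all other pairs act as in-batch negatives.
train_examples = [
    InputExample(texts=['Title of publication A', 'Title of a publication A refers to']),
    InputExample(texts=['Title of publication B', 'Title of a publication B refers to']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```

Larger batch sizes usually help this loss, since each pair then sees more in-batch negatives.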