UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Are pre-trained models still somehow based on BERT / SBERT? #1130

Open mlfutbol opened 3 years ago

mlfutbol commented 3 years ago

Hey,

I have a question regarding the pre-trained models listed on https://www.sbert.net/docs/pretrained_models.html: Are these models still based on your paper "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks"?

So, for example, is "paraphrase-mpnet-base-v2" still based on BERT/SBERT?

I thought the base model needed to be BERT for a model to count as SBERT, and in this example the base model is mpnet-base.

I am asking because I would like to use a tool that calculates sentence similarity based on BERT, and now I am not sure whether a model like "paraphrase-mpnet-base-v2" still has anything to do with BERT.

Many thanks!

nreimers commented 3 years ago

Hi @mlfutbol

The paraphrase-mpnet-base-v2 model is based on the MPNet model: https://arxiv.org/abs/2004.09297

It is quite similar to BERT: it is based on the transformer architecture and was pre-trained on a large corpus.

However, its pre-training objective is a bit different from (and more advanced than) BERT's objective.

That model works really well for computing sentence similarities. Whether you still want to call it BERT/SBERT is up to you. You can still call it SBERT, or, to be a bit more general, call it 'sentence transformers'.
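For context, a minimal sketch of how such a model is typically used for sentence similarity: `encode` produces one embedding vector per sentence, and similarity is usually measured as cosine similarity between those vectors. The library calls are shown as comments (they download the model); toy vectors stand in for the real 768-dimensional embeddings.

```python
import numpy as np

# With sentence-transformers installed, embeddings would come from:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("paraphrase-mpnet-base-v2")
#   emb = model.encode(["A cat sits on the mat.", "A feline rests on the rug."])
# Here, toy 3-dim vectors stand in for the real embeddings.
emb = np.array([[0.20, 0.80, 0.10],
                [0.25, 0.75, 0.05]])

def cos_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar sentences yield embeddings with cosine similarity close to 1.
print(round(cos_sim(emb[0], emb[1]), 3))  # → 0.995
```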

krtin commented 3 years ago

@nreimers On which corpus was "paraphrase-mpnet-base-v2" fine-tuned from the pretrained mpnet-base? Is it MRPC or something else?

nreimers commented 3 years ago

@krtin You can find the info here: https://github.com/UKPLab/sentence-transformers/blob/94be4996fa8a55824ec1a37eb64e03e4ee2c8cd5/examples/training/paraphrases/README.md#pre-trained-models

krtin commented 3 years ago

> @krtin You can find the info here: https://github.com/UKPLab/sentence-transformers/blob/94be4996fa8a55824ec1a37eb64e03e4ee2c8cd5/examples/training/paraphrases/README.md#pre-trained-models

Thanks, that works!