UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Avg. fine-tuned BERT Embeddings v/s Sentence transformers? #256

Closed cabhijith closed 4 years ago

cabhijith commented 4 years ago

Hi, I have been using sentence-transformers for quite some time and love it! It works really well for standard English tasks. Now I wanted to use it for something much more domain-specific, but using sentence-transformers did not yield any good results. Unfortunately, we do not have the resources to label sentences for SBERT.

Do you think that the averaged embeddings of a BERT model fine-tuned on our data will outperform SBERT for semantic similarity?

Will report back with whatever results I get. Just wanted to open a discussion.
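For concreteness, by "averaged BERT embeddings" I mean mean-pooling the token embeddings and comparing sentences with cosine similarity, roughly as in the sketch below (the model name stands in for our domain fine-tuned checkpoint, and the mean pooling over the last hidden state with attention-mask weighting is just one reasonable choice):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Placeholder: swap in a domain fine-tuned BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    # Tokenize with padding and run BERT
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # Mean-pool the token embeddings, ignoring padding tokens
    mask = batch["attention_mask"].unsqueeze(-1).float()
    summed = (out.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

emb = embed(["A man is eating food.", "A man is eating a meal."])
cos = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(cos.item())
```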

nreimers commented 4 years ago

Hi @cabhijith Yes, please report back your results.

I think it will not work that well. To be usable in an unsupervised fashion, the embedding dimensions must fulfill certain properties: every dimension counts equally and has the same weight when you compute, e.g., cosine similarity. Here, BERT sadly does not produce embeddings that work well.

What could be interesting for you are unsupervised sentence embedding methods, like Sent2Vec: https://github.com/epfml/sent2vec

They train the sentence embeddings on raw text alone. If you have a really specific domain, this might result in better-quality sentence embeddings.
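Roughly, once a Sent2Vec model has been trained on your raw domain text (with the fastText-style CLI from that repo), inference looks something like the sketch below. This assumes the repo's Python bindings are installed; the model path is a placeholder:

```python
import numpy as np
import sent2vec

# Load a model trained on raw domain sentences (placeholder path)
model = sent2vec.Sent2vecModel()
model.load_model("domain_model.bin")

# Sentences should be preprocessed (lowercased, tokenized) like the training data
embs = model.embed_sentences(["first domain sentence .", "second domain sentence ."])

# Compare with cosine similarity
a, b = embs[0], embs[1]
cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
print(cos)
```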

cabhijith commented 4 years ago

Will check that out. I might also try fine-tuning BERT on our dataset first and then feeding it into SBERT, as in training_nli.py. Maybe that could work? Something along these lines (a rough sketch; the checkpoint path and example pairs are placeholders):
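```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample

# Point the word embedding module at the domain fine-tuned BERT checkpoint (placeholder path)
word_embedding_model = models.Transformer("path/to/domain-finetuned-bert")
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# NLI-style labeled pairs (e.g. contradiction=0, entailment=1, neutral=2), as in training_nli.py
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating."], label=1),
    InputExample(texts=["A man is eating food.", "A man is sleeping."], label=0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

# Train SBERT on top of the domain-adapted BERT
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```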

nreimers commented 4 years ago

Yes, it might help.

cabhijith commented 4 years ago

Thanks!