UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Is there a benefit of fine-tuning, rather than directly using distilbert-base-nli-mean-tokens for sentence classification? #493

Open shaktisd opened 3 years ago

shaktisd commented 3 years ago

Is there a benefit of fine-tuning a BERT model on a news dataset, rather than directly using distilbert-base-nli-mean-tokens for news sentiment classification? Can someone share a research paper / blog post that discusses fine-tuning vs. directly using the embeddings for classification, if the underlying data is very generic English, like that used in news?

nreimers commented 3 years ago

Hi @shaktisd You usually get much better results if you use Transformers directly and fine-tune it on your sentiment classification task.

I don't know who brought this idea up in the community, but it was never a good idea to first map a sentence to an embedding and then use this embedding as the (only) feature for a classifier like logistic regression. Classifiers working directly on the text data have always outperformed these sentence embedding -> classifier constructions.

So for your case I recommend fine-tuning directly for classification and not using a sentence embedding in between.
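
For illustration, a minimal sketch of what this direct fine-tuning could look like with the Hugging Face transformers library. The model name, toy data, and hyperparameters are placeholders, not part of the original discussion:

```python
# Sketch: fine-tune DistilBERT directly for sentiment classification,
# instead of feeding fixed sentence embeddings into a separate classifier.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["Stocks rally after earnings beat.", "Company misses revenue targets."]
labels = [1, 0]  # toy example: 1 = positive, 0 = negative

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

class NewsDataset(Dataset):
    """Wraps tokenized texts and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=NewsDataset(texts, labels),
)
trainer.train()
```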

shaktisd commented 3 years ago

@nreimers makes sense. Can you please share an example or a link to a document on your recommended approach using the sentence-transformers library?

datistiquo commented 3 years ago

@nreimers maybe this comes from this nice blog post: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb

This was also the approach with fine-tuned embeddings: train a logistic regression or XGBoost classifier on top of them. But indeed, this was not very good!
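
For contrast, a rough sketch of the "sentence embedding -> classifier" pipeline the comments above refer to, assuming the sentence-transformers and scikit-learn APIs; the news sentences and labels are toy placeholders:

```python
# Sketch: encode sentences with distilbert-base-nli-mean-tokens, then fit a
# logistic regression on the fixed embeddings (the setup the thread argues
# is outperformed by direct fine-tuning).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

embedder = SentenceTransformer("distilbert-base-nli-mean-tokens")

train_texts = ["Stocks rally after earnings beat.", "Company misses revenue targets."]
train_labels = [1, 0]  # toy example: 1 = positive, 0 = negative

X_train = embedder.encode(train_texts)   # fixed sentence embeddings as features
clf = LogisticRegression().fit(X_train, train_labels)

X_test = embedder.encode(["Shares slump on weak guidance."])
print(clf.predict(X_test))
```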