TheophileBlard / french-sentiment-analysis-with-bert

How good is BERT? Comparing BERT to other state-of-the-art approaches on a French sentiment analysis dataset
MIT License
146 stars, 35 forks

applied to tweets #2

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Hi again,

Hope you are all well again!

I wanted to know whether this sentiment analysis could be applied to tweets, and if so, how?

Have a great weekend!

Cheers, Luc

TheophileBlard commented 4 years ago

Sure, you can do sentiment analysis on tweets!

You will need a large dataset of French-language tweets labeled with their polarity. This is the most difficult part. As you can see here, I automated the annotation process for film reviews; maybe you can do the same for tweets. You might want to take a look at this project, where they use emojis to annotate tweets.
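Emoji-based annotation is a form of distant supervision: an emoji in the tweet stands in for a human label. The sketch below is a minimal illustration of that idea, not the code of the project mentioned above. The emoji sets, the function names (`label_tweet`, `strip_label_emojis`, `build_dataset`) and the rule "discard tweets with both or neither polarity" are all hypothetical choices.

```python
from typing import List, Optional, Tuple

# Hypothetical emoji sets, chosen for illustration only; they are not taken
# from the project mentioned above. Single-code-point emojis keep matching simple.
POSITIVE_EMOJIS = {"😀", "😊", "😍", "👍"}
NEGATIVE_EMOJIS = {"😠", "😡", "😢", "😭", "👎"}


def label_tweet(text: str) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None if the tweet is ambiguous."""
    has_pos = any(e in text for e in POSITIVE_EMOJIS)
    has_neg = any(e in text for e in NEGATIVE_EMOJIS)
    if has_pos == has_neg:
        # Both kinds or neither: no reliable weak label, so skip the tweet.
        return None
    return 1 if has_pos else 0


def strip_label_emojis(text: str) -> str:
    """Remove the labeling emojis so a model cannot simply learn to spot them."""
    for e in POSITIVE_EMOJIS | NEGATIVE_EMOJIS:
        text = text.replace(e, "")
    return " ".join(text.split())


def build_dataset(tweets: List[str]) -> List[Tuple[str, int]]:
    """Keep only tweets with an unambiguous emoji label, as (text, label) pairs."""
    dataset = []
    for tweet in tweets:
        label = label_tweet(tweet)
        if label is not None:
            dataset.append((strip_label_emojis(tweet), label))
    return dataset
```

For example, `build_dataset(["Super film ce soir 😍", "Encore en retard 😡", "Il pleut 😊😢"])` keeps the first two tweets (labeled 1 and 0, with the emojis removed) and drops the third, which mixes both polarities. Removing the labeling emojis matters: if they stayed in the text, a classifier could reach high accuracy just by detecting them.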

You can also try to use my models directly, even though they were trained on movie reviews. As I show here, they can achieve state-of-the-art performance in other domains, such as book or music reviews. But if you want to evaluate the performance of your models, you will still need labeled data...
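Tweets contain artifacts that movie reviews rarely do: leading "RT" retweet markers, @mentions, URLs and hashtags. One possible precaution when feeding tweets to a review-trained model, an assumption not discussed in this thread, is to normalize those artifacts first. The patterns and the `normalize_tweet` helper below are a hypothetical minimal sketch using Python's standard `re` module.

```python
import re

# Patterns for Twitter-specific artifacts that do not occur in movie reviews.
RETWEET_RE = re.compile(r"^RT\s+")        # leading "RT " retweet marker
URL_RE = re.compile(r"https?://\S+")      # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+:?")        # @username, with an optional trailing colon
HASHTAG_RE = re.compile(r"#(\w+)")        # "#mot" becomes "mot"


def normalize_tweet(text: str) -> str:
    """Strip retweet markers, URLs and mentions; keep hashtag words without '#'."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub(r"\1", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

With these rules, `normalize_tweet("RT @user: Ce film https://t.co/abc est #génial")` gives `"Ce film est génial"`. Whether this helps, or whether mentions should instead be replaced by a placeholder token, is something to measure on labeled tweets rather than assume.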

I'm closing this issue for now, but feel free to re-open it if you run into problems using your own dataset with my code.

ghost commented 4 years ago

Just re-opening for a quick question about FlauBERT (https://github.com/getalp/Flaubert): have you tried it? Do you think it would work better than CamemBERT?

TheophileBlard commented 4 years ago

I didn't try FlauBERT because there is no pre-trained model available for TensorFlow in the transformers library (only for PyTorch). However, I evaluated my CamemBERT model on the FLUE benchmark, which was released alongside FlauBERT.

I advise you to read the CamemBERT and FlauBERT papers; they are both short and easy to understand. Table 3 of the FlauBERT paper compares the performance of FlauBERT and CamemBERT on the FLUE text classification task (which is sentiment analysis). As you can see, FlauBERT (Base) is not better than CamemBERT. FlauBERT (Large) stands out, but it is very large (it can't be loaded on a single GPU), and it's not available (yet) in the transformers library.

My only advice is to try both models on your data and compare their performance.
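Comparing two models on your own data comes down to scoring each model's predictions against the same gold labels. Below is a minimal, dependency-free sketch, assuming binary labels (0 = negative, 1 = positive) and that each model's predictions have already been collected. The `evaluate` helper is hypothetical, and the label and prediction lists in the `__main__` block are made-up placeholders, not real CamemBERT or FlauBERT outputs.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and macro-averaged F1 for binary labels (0 = negative, 1 = positive)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")

    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

    # Per-class F1, then the unweighted mean over both classes (macro-F1).
    f1_scores = []
    for label in (0, 1):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)

    return {"accuracy": accuracy, "macro_f1": sum(f1_scores) / len(f1_scores)}


if __name__ == "__main__":
    # Placeholder labels and predictions, for illustration only.
    gold = [1, 0, 1, 1, 0, 0]
    predictions = {
        "CamemBERT": [1, 0, 1, 0, 0, 0],
        "FlauBERT": [1, 1, 1, 0, 0, 1],
    }
    for name, pred in predictions.items():
        print(name, evaluate(gold, pred))
```

Macro-F1 is reported next to accuracy because tweet datasets are often imbalanced between positive and negative classes, and accuracy alone can hide a model that mostly predicts the majority class.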