google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Sentiment analysis on emoji data. #748

Open PrashanthAadepu opened 5 years ago

PrashanthAadepu commented 5 years ago

I am using BERT for sentiment classification, with three classes: positive, negative, and neutral.

Some of my data contains emojis, and those examples are always classified as neutral. I think I am missing something here.

Could someone explain how to handle emoji data so that it is classified correctly?

Thanks in advance.

aditya-malte commented 5 years ago

Hello, could you elaborate on what you mean by "data with emojis"? Do you mean the emoji alone, or with some surrounding text? As far as I remember, the author of this repo has added emoji support. Thanks

PrashanthAadepu commented 5 years ago

Hi,

I have data like the sentence below.

😍 Love your service period! 😂😂😂😉🤗💕

When I classify sentences that contain only emojis, it always predicts them as neutral.

Thanks.

aditya-malte commented 5 years ago

Interesting. Does your training data consist of a mixture of text with emojis and text without emojis? Or do all of the examples have emojis?

PrashanthAadepu commented 5 years ago

The data has the following variants:
1. Sentence with no emoji. Ex: Very useful for customers
2. Sentence with text and emoji. Ex: 😍 Love your service period!
3. Sentence with only emoji. Ex: 😂😂😂😉🤗💕

Thanks.
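One way to check whether the three variants listed above are represented evenly across labels is to bucket the training examples by variant and count labels per bucket. The following is a minimal sketch, not code from this thread: the function names are hypothetical, and it assumes emoji can be approximated by the Unicode category "So" (Symbol, other), which also matches a few non-emoji symbols such as ©.

```python
import unicodedata
from collections import Counter, defaultdict


def is_emoji_like(ch):
    # Heuristic (an assumption): most emoji have Unicode category "So".
    return unicodedata.category(ch) == "So"


def variant(text):
    """Return 'text_only', 'text+emoji', or 'emoji_only' for one example."""
    chars = [c for c in text if not c.isspace()]
    has_emoji = any(is_emoji_like(c) for c in chars)
    has_text = any(c.isalnum() for c in chars)
    if has_emoji and has_text:
        return "text+emoji"
    if has_emoji:
        return "emoji_only"
    return "text_only"


def label_counts_by_variant(rows):
    """rows: iterable of (text, label) pairs; returns {variant: Counter}."""
    counts = defaultdict(Counter)
    for text, label in rows:
        counts[variant(text)][label] += 1
    return dict(counts)
```

If the `emoji_only` bucket turns out to be dominated by a single label, or is very small, the classifier has little signal to learn from for that variant regardless of how the tokenizer treats the characters.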

aditya-malte commented 5 years ago

That's very surprising. Could you share the hyperparameters you used, so that I can see if something is wrong?

PrashanthAadepu commented 5 years ago

Hello

I am using the notebook below. I tweaked it to classify neutral sentiment as well.

https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb

I am using the tokenizer below to tokenize the emojis properly. https://github.com/google-research/bert/blob/master/tokenization.py

I also added some emojis to vocab.txt and am passing that file to model training.

Thanks.
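A caveat on editing vocab.txt: appending new lines changes the vocabulary size, which then no longer matches the embedding table stored in a pre-trained checkpoint. The released BERT vocabularies contain placeholder entries of the form `[unusedN]`, and overwriting those keeps the size and all existing token ids unchanged. The sketch below illustrates that idea under stated assumptions: the helper name is hypothetical, and the vocabulary is treated as a plain list of lines. The embeddings behind the reused slots carry no learned meaning, so they only become useful if the fine-tuning data contains enough examples of each new token.

```python
def add_tokens_in_unused_slots(vocab_lines, new_tokens):
    """Return a copy of the vocab with new tokens written over [unusedN] rows.

    vocab_lines: iterable of vocab.txt lines (one token per line).
    new_tokens:  tokens to add, e.g. ["😍", "##😍"].
    """
    vocab = [line.rstrip("\n") for line in vocab_lines]
    existing = set(vocab)
    # Skip tokens already present and drop duplicates while keeping order.
    todo = [t for t in dict.fromkeys(new_tokens) if t not in existing]
    free = [i for i, tok in enumerate(vocab)
            if tok.startswith("[unused") and tok.endswith("]")]
    if len(todo) > len(free):
        raise ValueError("not enough [unusedN] slots for the new tokens")
    for i, tok in zip(free, todo):
        vocab[i] = tok  # same row index, so the vocabulary size is unchanged
    return vocab
```

Writing the returned list back out one token per line gives a vocabulary file with the same number of rows as the original.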

dataislife commented 5 years ago

Hey! Any improvement on this? It seems surprising, since the BERT tokenizer should now take emojis into account. An older version treated an emoji as an UNK token.
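A possible explanation for emoji still ending up as UNK even when they are in the vocabulary: a run such as 😂😂😂😉🤗💕 contains no whitespace, so it reaches the WordPiece step as a single token, and every piece after the first must exist in the vocabulary with a `##` prefix. The sketch below is a simplified stand-in for greedy longest-match-first WordPiece splitting, not the code in tokenization.py (it omits lowercasing, punctuation splitting, and length limits), and the tiny vocabularies are made up for illustration.

```python
def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one whitespace-delimited token."""
    pieces, start = [], 0
    while start < len(token):
        end = len(token)
        piece = None
        while start < end:
            sub = token[start:end]
            if start > 0:
                sub = "##" + sub  # non-initial pieces need the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            # One unmatched span turns the whole token into a single [UNK].
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical vocabularies: bare emoji only, then with ## variants added.
vocab_bare = {"😂", "😉"}
vocab_full = vocab_bare | {"##😂", "##😉"}

print(wordpiece("😂😂😉", vocab_bare))  # ['[UNK]']
print(wordpiece("😂😂😉", vocab_full))  # ['😂', '##😂', '##😉']
print([wordpiece(t, vocab_bare) for t in "😂 😂 😉".split()])  # [['😂'], ['😂'], ['😉']]
```

Under this behavior, a sentence made up only of unspaced emoji collapses to one UNK token, which leaves the classifier with no information to separate the classes.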

bobbyinfj commented 4 years ago

Hi Prashanth,

Could you share your code tweaks to classify neutral sentiments? I am starting with the same notebook and am actively struggling with making the same tweaks.

Thank you.

freeIsa commented 4 years ago

Hi, I am also trying to use BERT on data containing emojis, but they are always encoded as [UNK] by the tokenizer. Has there been any progress in getting emojis processed correctly?
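To narrow down why emoji come out as [UNK], it can help to separate them with spaces before tokenization and then list which ones are absent from the vocabulary. This is a minimal sketch under stated assumptions: the function names are hypothetical, emoji are approximated by the Unicode category "So", and `vocab` is assumed to be a set of strings read from vocab.txt.

```python
import unicodedata
from collections import Counter


def space_out_symbols(text):
    """Pad every 'So' character with spaces so it becomes its own token."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            out.append(f" {ch} ")
        else:
            out.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(out).split())


def unknown_symbols(texts, vocab):
    """Count 'So' characters in texts that are missing from vocab."""
    counts = Counter()
    for text in texts:
        for tok in space_out_symbols(text).split():
            if len(tok) == 1 and unicodedata.category(tok) == "So" and tok not in vocab:
                counts[tok] += 1
    return counts


print(space_out_symbols("😂😂😂😉🤗💕"))  # 😂 😂 😂 😉 🤗 💕
```

Any emoji reported by `unknown_symbols` would map to [UNK] even after spacing, so those are the ones that need a vocabulary entry or some other replacement before fine-tuning.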

Vithurshana commented 1 year ago

Hello, could you please share the dataset with emojis? That would be very helpful.