HSLCY / ABSA-BERT-pair

Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL 2019)
https://www.aclweb.org/anthology/N19-1035
MIT License

Question about imbalanced classes / labels in sentihood dataset #23

Open frankaging opened 4 years ago

frankaging commented 4 years ago

Hi, thanks for making your codebase public. I walked through your preprocessing steps and modeling code, and I could not find any place where you weight the different classes or labels. In the Sentihood dataset there are far more "none" cases for sentiment polarity. Did you down-weight "none"? Similarly for the binary models, there are far more "yes" cases, since the many "none" targets each produce a "yes" example.

As a follow-up on the binary models: because there are so many "yes" cases for "none", the positive score for "none" is biased and can end up consistently larger than the other two. Did this happen when you trained your model, i.e. does the model just output "none"? Thanks.
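For context, one common remedy for this kind of imbalance (not something the repo itself does) is to give each class an inverse-frequency weight in the loss. Below is a minimal sketch; the label counts are illustrative, not the actual Sentihood statistics, and the weights could be passed to e.g. PyTorch's `CrossEntropyLoss` via its `weight` argument.

```python
# Sketch: inverse-frequency class weights for an imbalanced label set.
# The counts below are illustrative, not the real Sentihood numbers.
label_counts = {"none": 9000, "positive": 600, "negative": 400}

total = sum(label_counts.values())
num_classes = len(label_counts)

# Weight each class by total / (num_classes * count): rare classes get
# a weight above 1, the dominant "none" class gets a weight below 1.
weights = {label: total / (num_classes * count)
           for label, count in label_counts.items()}

# These weights could then feed a weighted loss, e.g.
#   torch.nn.CrossEntropyLoss(weight=torch.tensor(list(weights.values())))
```

With these example counts, "none" gets roughly a 0.37 weight while "negative" gets roughly 8.3, so the rare classes dominate the gradient per example.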

sunny678 commented 4 years ago

Yes, the constructed labels are imbalanced. BERT's generalization ability is strong enough to learn the key information, so the model does not just output "none". However, when the number of aspects is larger, we may reduce the number of "none" cases when constructing the data.
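The suggestion above, reducing the number of "none" cases during data construction, amounts to downsampling the dominant class. A minimal sketch, with purely illustrative data and a hypothetical `keep_ratio` that would need tuning on dev data:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Illustrative (sentence, label) training pairs, heavily skewed to "none".
examples = ([("sent %d" % i, "none") for i in range(90)]
            + [("sent %d" % i, "positive") for i in range(6)]
            + [("sent %d" % i, "negative") for i in range(4)])

keep_ratio = 0.2  # keep roughly 20% of the "none" pairs (a tunable choice)

# Keep every non-"none" pair; keep each "none" pair with probability keep_ratio.
downsampled = [ex for ex in examples
               if ex[1] != "none" or random.random() < keep_ratio]
```

This only touches the training split; evaluation data should keep its natural distribution.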