ProsusAI / finBERT

Financial Sentiment Analysis with BERT
Apache License 2.0

Dot at the end of short sentences #45

Closed: metodj closed this issue 3 years ago

metodj commented 3 years ago

I was playing around with the finBERT model a bit and noticed that, for short sentences, having a period at the end makes a big difference to the model's predictions (see Figures 1-2 below).

Any idea why that is the case? Could it be that the model was fine-tuned on sentences with a dot at the end, and that's why it makes such a difference? Or does it have to do with the BERT embeddings, i.e. is there a special embedding for a dot?

Figure 1 (short sentence, no period at the end):

image

Figure 2 (short sentence, a period at the end):

image
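
For anyone who wants to reproduce the comparison without the screenshots, here is a minimal sketch of the check. It measures how much each label's probability changes when a trailing period is added. The names `period_sensitivity` and `toy_classifier` are hypothetical, and the toy classifier is only a stand-in so the snippet runs offline; it is not finBERT and its numbers are made up.

```python
from typing import Callable, Dict

# Hypothetical interface: any function mapping a sentence to {label: probability}.
Classifier = Callable[[str], Dict[str, float]]


def period_sensitivity(classify: Classifier, sentence: str) -> Dict[str, float]:
    """Return the per-label probability change caused by a trailing period."""
    # Strip any existing trailing period so both variants share the same base text.
    bare = sentence.rstrip().rstrip(".").rstrip()
    without = classify(bare)
    with_period = classify(bare + ".")
    return {label: with_period[label] - without[label] for label in without}


def toy_classifier(text: str) -> Dict[str, float]:
    """Stand-in with invented scores (NOT finBERT), used only to exercise the helper."""
    if text.endswith("."):
        return {"positive": 0.7, "negative": 0.1, "neutral": 0.2}
    return {"positive": 0.3, "negative": 0.1, "neutral": 0.6}


if __name__ == "__main__":
    # Prints the shift in each label's probability for one short sentence.
    print(period_sensitivity(toy_classifier, "Profits rose"))
```

To run this against the real model, `toy_classifier` would need to be replaced by a wrapper around whatever inference call you use (for example the prediction code in this repository), returning one probability per label. How that wrapper looks depends on your setup, so it is left out here.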

doguaraci commented 3 years ago

Thanks for raising this; I wasn't aware of this behaviour. My guess is that your first potential reason is correct. All the sentences in the data were properly written, so the lack of a dot might be causing some confusion. I will do more checks.
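
If that guess holds, one possible workaround in the meantime is to normalise inputs so that every sentence ends with terminal punctuation before it reaches the model. The snippet below is only a sketch under that assumption; `ensure_terminal_period` is a hypothetical helper and not part of the finBERT code, and whether it actually stabilises predictions has not been verified here.

```python
# Characters treated as valid sentence endings (an assumption; adjust as needed).
TERMINAL_PUNCTUATION = (".", "!", "?")


def ensure_terminal_period(sentence: str) -> str:
    """Append a period if the sentence does not already end with terminal punctuation."""
    stripped = sentence.rstrip()
    # Leave empty strings and already-terminated sentences unchanged.
    if not stripped or stripped.endswith(TERMINAL_PUNCTUATION):
        return stripped
    return stripped + "."


if __name__ == "__main__":
    for text in ["Profits rose", "Profits rose.", "Did profits rise?"]:
        print(repr(ensure_terminal_period(text)))
```

Applying the helper to each sentence before prediction keeps short inputs closer to the properly written sentences the model saw in the data.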

luke4u commented 2 years ago

I can confirm this behaviour as well, and it would be good to have an in-depth review!