Closed metodj closed 3 years ago
Thanks for raising this, I wasn't aware of this behaviour. My guess is that your first potential reason is correct: all the sentences in the training data were properly punctuated, so the missing period might be causing some confusion. Will do more checks.
I can confirm this as well, and it would be good to have an in-depth review!
I was playing around with the finBERT model a bit and I noticed that, for short sentences, having a period at the end makes a big difference to the model's predictions (see Figures 1-2 below).
Any idea why that is the case? Could it be that the model was fine-tuned on sentences that all ended with a period, and that's why it makes such a difference? Or does it have to do with BERT's tokenization/embeddings, i.e. does the period become its own token with its own embedding?
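For what it's worth, the period does become its own token in BERT-style tokenizers, so a short input with a trailing period is a genuinely different (and longer) token sequence. Here is a toy illustration of that pre-tokenization step (this is a simplified regex split for demonstration, not the real WordPiece tokenizer, and the example sentence is made up):

```python
import re

def toy_tokenize(text):
    # Roughly mimic how BERT's pre-tokenizer splits punctuation off
    # into separate tokens (toy illustration only, not real WordPiece).
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(toy_tokenize("Profits rose sharply"))   # ['profits', 'rose', 'sharply']
print(toy_tokenize("Profits rose sharply."))  # ['profits', 'rose', 'sharply', '.']
```

For a three-word sentence the period is a quarter of the input tokens, so if the fine-tuning data always ended with one, its absence could plausibly shift the prediction this much.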
Figure 1 (short sentence, no period at the end):
Figure 2 (short sentence, a period at the end):