Closed metodj closed 3 years ago
Thanks for raising this, I wasn't aware of this behaviour. My guess is that your first potential reason is correct: all the sentences in the training data were properly punctuated, so the missing period might be causing some confusion. Will do more checks.
I can confirm this as well, and it would be good to have an in-depth review!
I was playing around with the finBERT model a bit and I noticed that, for short sentences, having a period at the end makes a big difference to the model's predictions (see Figures 1-2 below).
Any idea why that is the case? Could it be that the model was fine-tuned on sentences that all ended with a period, and that's why it makes such a difference? Or does it have to do with BERT's tokenization/embeddings, i.e. does the period become its own token with its own embedding?
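For what it's worth, the period does become its own token in BERT-style tokenizers, so a short input with a trailing period is a genuinely different (and longer) token sequence. Here is a toy illustration of that pre-tokenization step (this is a simplified regex split for demonstration, not the real WordPiece tokenizer, and the example sentence is made up):

```python
import re

def toy_tokenize(text):
    # Roughly mimic how BERT's pre-tokenizer splits punctuation off
    # into separate tokens (toy illustration only, not real WordPiece).
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(toy_tokenize("Profits rose sharply"))   # ['profits', 'rose', 'sharply']
print(toy_tokenize("Profits rose sharply."))  # ['profits', 'rose', 'sharply', '.']
```

For a three-word sentence the period is a quarter of the input tokens, so if the fine-tuning data always ended with one, its absence could plausibly shift the prediction this much.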
Figure 1 (short sentence, no period at the end):
Figure 2 (short sentence, a period at the end):