TurkuNLP / FinBERT

BERT model trained from scratch on Finnish

Poor F1 results of FinBERT for NER #9

Open jakelin212 opened 9 months ago

jakelin212 commented 9 months ago

Hi, thanks for releasing FinBERT. We are using FinBERT (cased) for NER on unstructured Finnish medical records and have noticed poor results (F1 < 0.50) on a negative-sentiment entity label, for example 'not lonely', where the text contains 'ei '. The corresponding 'lonely' label has a good F1 (~0.80). Do you have any experience or advice? We do not think class imbalance is the cause, since we also tried labeling with only the negative entity.

jouniluoma commented 9 months ago

Hi, I have used FinBERT for NER with good results. Without knowing more about your dataset, it is really hard to give any advice.

jakelin212 commented 9 months ago

Hi, thank you for the response, and my apologies for the inaccurate issue title (I wish I could change it). You are right that, for the most part, FinBERT produces very good NER results. The problem appears when the negative category entries are very close to the positive ones. For example, I have the labels Lonely and NotLonely: yksinaisyys = lonely, but yksinaisyys ei ole ongelmia is then NotLonely, just like ei koe yksinaisyyttä. On these entries NotLonely performs badly, with recall and precision both ~0.50, while Lonely is at ~0.75-0.80. Lonely entries are indeed much more common, but I tried a project with only NotLonely (negative) entries and saw the same effect. I think the cause could be the tokenisation, where the non-0 values are assigned to individual words, and using 'strict' rather than 'partial' matching when computing metrics makes the scores worse. I think we will use BERT only for positive NER and then apply regular expressions to assign the negative categories.
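The regular-expression step mentioned above could look roughly like the sketch below. It is only an illustration under several assumptions: the model output is taken to be a list of `(start, end, label)` character spans, the cue list is just the Finnish negative verb forms (en, et, ei, emme, ette, eivät), and the helper names `is_negated` and `relabel_negated` and the 30-character window are made up for this example. It does not respect sentence boundaries, so a real rule set would need tuning on the actual records.

```python
import re

# Assumed cue list: Finnish negative verb forms, plus "eivat" for text
# written without diacritics. Not an exhaustive list of negation cues.
NEGATION = re.compile(r"\b(?:en|et|ei|emme|ette|eivät|eivat)\b", re.IGNORECASE)


def is_negated(text, start, end, window=30):
    """Return True if a negation cue lies within `window` characters
    before or after the entity span text[start:end]."""
    for m in NEGATION.finditer(text):
        # Cue before the entity, e.g. "ei koe yksinäisyyttä"
        if m.end() <= start and start - m.end() <= window:
            return True
        # Cue after the entity, e.g. "yksinaisyys ei ole ..."
        if m.start() >= end and m.start() - end <= window:
            return True
    return False


def relabel_negated(text, entities, pos="Lonely", neg="NotLonely", window=30):
    """Turn positive-label spans into the negative label when a cue is nearby.

    `entities` is a list of (start, end, label) character spans, assumed
    to come from a model trained on the positive label only.
    """
    result = []
    for start, end, label in entities:
        if label == pos and is_negated(text, start, end, window):
            label = neg
        result.append((start, end, label))
    return result


text = "Potilas ei koe yksinäisyyttä."
start = text.index("yksinäisyyttä")
spans = [(start, start + len("yksinäisyyttä"), "Lonely")]
print(relabel_negated(text, spans))
```

Matching cues against the full text and then comparing positions avoids cutting a word in half at the window edge, which a plain substring slice could do.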

jouniluoma commented 9 months ago

Named entities are usually nouns or noun phrases (something that has a name), or something that can be handled in a similar fashion. I have not really tested NER on adjectives, which is why I was asking about the dataset. Perhaps there is a way to solve your problem with FinBERT other than NER? Is there any evidence of this kind of approach working, e.g. in other languages?

jakelin212 commented 9 months ago

Thanks for your feedback. I have read that BERT does not handle negation well in English either. Feel free to close the ticket. Best!

https://aclanthology.org/2023.blackboxnlp-1.23.pdf

Allyson Ettinger. 2020. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics, 8:34–48.