deepset-ai / FARM

Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Checking for iob format when preparing ner results #828

Closed: felixvor closed this 3 years ago

felixvor commented 3 years ago

A simple check of whether the NER tags are actually in IOB format when presenting inference results. Previously, the results were simply empty if the data was not in IOB2. Now, if the tags are in IOB2, processing carries on as before and convert_iob_to_simple_tags is applied. If they are not, prediction results are presented for all of the input tokens. "O" labels are removed in either case.
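A minimal sketch of the logic described above, in Python (the language FARM is written in). All function names here (is_iob2, iob2_to_spans, prepare_ner_results) are hypothetical and do not reproduce FARM's actual code; iob2_to_spans only stands in for what convert_iob_to_simple_tags is described as doing. The sketch assumes tags are plain strings such as "B-PER", "I-PER" or "O".

```python
import warnings
from typing import Dict, List


def is_iob2(tags: List[str]) -> bool:
    """Hypothetical check: every tag is "O" or "B-X"/"I-X", and every
    "I-X" continues a span of the same type X (the IOB2 rule)."""
    prev = "O"
    for tag in tags:
        if tag == "O":
            prev = tag
            continue
        prefix, sep, label = tag.partition("-")
        if not sep or prefix not in ("B", "I") or not label:
            return False  # e.g. a bare "PER" tag is not IOB at all
        if prefix == "I" and prev[2:] != label:
            return False  # "I-X" must follow "B-X" or "I-X"
        prev = tag
    return True


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for merging IOB2 tags into entity spans.
    Assumes the tags already passed is_iob2; "O" tokens are dropped."""
    spans: List[Dict] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            spans.append({"label": tag[2:], "tokens": [token]})
        elif tag.startswith("I-"):
            spans[-1]["tokens"].append(token)
    return [{"label": s["label"], "text": " ".join(s["tokens"])} for s in spans]


def prepare_ner_results(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Merge spans for IOB2 tags; otherwise warn and return one result per
    non-"O" token instead of an empty list."""
    if is_iob2(tags):
        return iob2_to_spans(tokens, tags)
    warnings.warn(
        "Predicted tags are not in IOB2 format; returning per-token "
        "predictions instead of merged entity spans."
    )
    return [{"label": tag, "text": token}
            for token, tag in zip(tokens, tags) if tag != "O"]
```

For example, with tokens ["Angela", "Merkel", "visited", "Paris"], IOB2 tags ["B-PER", "I-PER", "O", "B-LOC"] yield two merged spans ("Angela Merkel" as PER, "Paris" as LOC), while plain tags ["PER", "PER", "O", "LOC"] trigger the warning and yield three per-token results.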

This is related to issue https://github.com/deepset-ai/FARM/issues/822 and should help any user who is unaware of the IOB conventions, since a warning is shown whenever the dataset is not IOB formatted.

Tbh, I am not sure whether raising an exception in the NER processor (plus some additional documentation for NER) would be a better idea than this approach. That way the user would be forced to always use IOB2 formatting and would get some information about it. Also, if a user has a dataset in IOB1 instead of IOB2, convert_iob_to_simple_tags might return wrong results. However, if FARM should be flexible and accept any NER dataset, then these warnings might be the better solution.
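The IOB1 concern comes from how spans start: in IOB1 an entity usually begins with "I-X", and "B-X" appears only when a span directly follows another span of the same type X, whereas IOB2 starts every span with "B-X". A merger that expects IOB2 therefore mis-handles IOB1 input. As a hedged illustration of one possible mitigation (normalizing tags instead of rejecting them), here is a hypothetical helper, not part of FARM, that converts IOB1 tags to IOB2, assuming well-formed "PREFIX-LABEL" tags.

```python
from typing import List, Optional


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Hypothetical helper: rewrite IOB1 tags as IOB2.

    An "I-X" tag that does not continue a span of type X (i.e. it follows
    "O", the start of the sequence, or a different type) opens a new span,
    so it becomes "B-X". All other tags are kept unchanged.
    """
    converted: List[str] = []
    prev_label: Optional[str] = None
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_label = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label != prev_label:
            converted.append("B-" + label)  # span start in IOB1 -> "B-" in IOB2
        else:
            converted.append(tag)
        prev_label = label
    return converted
```

For instance, the IOB1 sequence ["I-PER", "I-PER", "B-PER", "O", "I-LOC"] (two adjacent persons, then a location) becomes ["B-PER", "I-PER", "B-PER", "O", "B-LOC"]. Input that is already IOB2 passes through unchanged, so such a step would be safe to apply before span merging.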