flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Feature]: Add support for MobIE NER Dataset #3348

Closed stefan-it closed 1 year ago

stefan-it commented 1 year ago

Problem statement

Hey,

In my latest blog post, I used the MobIE NER Dataset to show how to fine-tune models with Flair.

I wrote a custom dataset loader for the MobIE NER Dataset:

The German MobIE Dataset was introduced in the MobIE paper by Hennig, Truong and Gabryszak (2021).

It's a German-language dataset that has been human-annotated with 20 coarse- and fine-grained entity types, and it includes entity linking information for geographically linkable entities. The dataset comprises 3,232 social media texts and traffic reports totaling 91K tokens, with 20.5K annotated entities, of which 13.1K are linked to a knowledge base.

Solution

Add MobIE support to Flair directly. Example class:

https://github.com/stefan-it/autotrain-flair-mobie/blob/main/mobie_dataset.py

It also has some unit tests:

https://github.com/stefan-it/autotrain-flair-mobie/blob/main/script.py#L11-L19
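For readers unfamiliar with what such a loader has to do, here is a minimal, standard-library-only sketch of the core step: reading column-formatted NER data and collecting BIO-tagged entity spans. It is not the code from the linked `mobie_dataset.py`. The two-column layout (token, tag), the sample sentences, and the label names (`location-route`, `location-city`) are assumptions made for illustration, and `read_column_sentences` and `bio_spans` are hypothetical helper names.

```python
from typing import Iterator, List, Tuple

# Illustrative sample only: the column layout and the label names are
# assumptions, not taken from the actual MobIE files.
SAMPLE = """\
Stau O
auf O
der O
A B-location-route
8 I-location-route
bei O
München B-location-city

Die O
S1 B-location-route
fällt O
aus O
"""


def read_column_sentences(
    text: str, token_col: int = 0, tag_col: int = 1
) -> Iterator[List[Tuple[str, str]]]:
    """Yield sentences as lists of (token, tag); blank lines separate sentences."""
    sentence: List[Tuple[str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append((fields[token_col], fields[tag_col]))
    if sentence:
        yield sentence


def bio_spans(sentence: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from BIO tags."""
    spans: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open span, then open a new one.
            if tokens:
                spans.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the currently open span.
            tokens.append(token)
        else:
            # An "O" tag closes any open span.
            if tokens:
                spans.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        spans.append((" ".join(tokens), label))
    return spans


sentences = list(read_column_sentences(SAMPLE))
for sent in sentences:
    print(bio_spans(sent))
```

In Flair itself, column-formatted corpora are usually loaded through `ColumnCorpus` with a `column_format` mapping rather than parsed by hand, so a real dataset class would mostly handle downloading the files and pointing `ColumnCorpus` at them.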

Additional Context

No response

alanakbik commented 1 year ago

Closed in #3351 - thanks @stefan-it!